From charlesr.harris at gmail.com Thu Oct 1 02:16:13 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Oct 2009 00:16:13 -0600 Subject: [Numpy-discussion] More questions on Chebyshev class. Message-ID: The Chebyshev class is now working pretty well, but I would like to settle some things up front. 1) Order in which coefficients are stored/passed/accessed. The current poly1d class ctor is called with the coefficients in high to low order, yet the __getitem__ and __setitem__ methods access them in reverse order. This seems confusing and I think both should go in the same order and my preference would be from low to high. The low to high order also works a bit better for implementation. 2) poly1d allows the size of the coefficient array to be dynamically extended. I have mixed feelings about that and would prefer not to, but there are arguments for that: students might find it easier to fool with. 3) The poly1d class prunes leading (high power) zeros. Because the Cheb class has a fit static method that returns a Cheb object, and because when fitting with Chebyshev polynomials the user often wants to see *all* of the coefficients, even if some of the leading ones are zero, the Cheb class does not automatically prune the zeros, instead there are methods for that. 4) All the attributes of the Cheb class are read/write. The poly1d class attempts to hide some, but the method used breaks the copy module. Python really doesn't have private attributes, so I left all the attributes exposed with the usual Python proviso: if you don't know what it does, don't fool with it. 5) Is Cheb the proper name for the class? Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Oct 1 02:23:45 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Oct 2009 00:23:45 -0600 Subject: [Numpy-discussion] repr and object arrays In-Reply-To: <3d375d730909301952u35fd0475pab12f21ca4baf720@mail.gmail.com> References: <3d375d730909301952u35fd0475pab12f21ca4baf720@mail.gmail.com> Message-ID: On Wed, Sep 30, 2009 at 8:52 PM, Robert Kern wrote: > On Wed, Sep 30, 2009 at 21:45, Charles R Harris > wrote: > > Hi All, > > > > It seems that repr applied to an object array does not provide the info > > needed to recreate it: > > > > In [22]: y = array([Decimal(1)]*2) > > > > In [23]: repr(y) > > Out[23]: 'array([1, 1], dtype=object)' > > > > And of course, there is going to be a problem with arrays of more than > one > > dimension anyway. But I wonder if this should be fixed? > > Using repr() instead of str() for the items would probably be wise. > > OK, I'll open a ticket for it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav+sp at iki.fi Thu Oct 1 02:55:17 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Thu, 1 Oct 2009 06:55:17 +0000 (UTC) Subject: [Numpy-discussion] ufunc and errors References: <4AC36C79.1030202@student.matnat.uio.no> <3d375d730909300833o7b2db961m45d9ad26575955cf@mail.gmail.com> Message-ID: Wed, 30 Sep 2009 10:33:46 -0500, Robert Kern wrote: [clip] >> Also, will the arguments always be named x1, x2, x3, ..., or can I >> somehow give them custom names? > > The only place where names appear is in the docstring. Write whatever > text you like. The first line of the docstring is generated by Numpy and cannot be modified.
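For example, here is a quick way to see the generated line, using frompyfunc to build a throwaway ufunc from a Python function (I'd expect C-level ufuncs to behave the same way):

>>> import numpy as np
>>> myadd = np.frompyfunc(lambda x1, x2: x1 + x2, 2, 1)
>>> print myadd.__doc__.splitlines()[0]   # the auto-generated signature line

Whatever docstring text you supply yourself ends up below that generated line.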
-- Pauli Virtanen From nadavh at visionsense.com Thu Oct 1 06:46:13 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 1 Oct 2009 12:46:13 +0200 Subject: [Numpy-discussion] Weird behaviour of scipy.signal.sepfir2d Message-ID: <710F2847B0018641891D9A21602763605AD196@ex3.envision.co.il> This function often results in incorrect output when the cpu is heavily loaded. I do not know how to trace the bug, since every "single shot" use or step-by-step trace gives the correct answer; also, running scripts under "ipython -pdb" makes the problem go away. System: >>> numpy.__version__ '1.4.0.dev7400' >>> scipy.__version__ '0.8.0.dev5922' Python 2.6.2 (64 bits) numpy/scipy are built with atlas support. OS: Gentoo linux (I use two independent machines, not clones of each other). Sorry for posting it on the numpy list, but this is not the first cross-list post anyway. Nadav From ralf.gommers at googlemail.com Thu Oct 1 12:19:01 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 1 Oct 2009 12:19:01 -0400 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Sun, Sep 20, 2009 at 8:59 PM, Ralf Gommers wrote: > > > On Sun, Sep 20, 2009 at 3:49 PM, Ralf Gommers > wrote: > >> Hi, >> >> I'm done reviewing all the improved docstrings for NumPy, they can be >> merged now from the doc editor Patch page. Maybe I'll get around to doing >> the SciPy ones as well this week, but I can't promise that. >> >> Actually, scipy was a lot less work. Please merge that too. > > Sorry to ask again, but it would really be very useful to get those docstrings merged for both scipy and numpy. The scipy docs merge cleanly, anyone with commit access can do it, like this: 1. Go to http://docs.scipy.org/scipy/patch/ and log in. 2. Click on "Select OK to apply" 3. Click on "Generate patch" 4. Select all the text in the browser and save it as a patch. 5. Apply the patch, commit For numpy it is in principle the same procedure, except there are some objects that need the add_newdocs treatment. There are two types of errors; my question (mainly to Pauli) is whether they both need the same treatment or a different one. Errors: 1. source location not known, like: ERROR: numpy.broadcast.next: source location for docstring is not known 2. source location known but failed to find a place to add docstrings, like: ERROR: Source location for numpy.lib.function_base.iterable known, but failed to find a place for the docstring Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Thu Oct 1 12:26:29 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 01 Oct 2009 12:26:29 -0400 Subject: [Numpy-discussion] [SciPy-dev] Deprecate chararray [was Plea for help] In-Reply-To: <45d1ab480909301433q5ff65872o1f2edc12047d0425@mail.gmail.com> References: <857977.74958.qm@web52106.mail.re2.yahoo.com> <4ABCC0A1.30402@stsci.edu> <45d1ab480909250923t2dd716a2s502967fb231dd889@mail.gmail.com> <4AC24A0C.4090005@stsci.edu> <45d1ab480909291248j107047fld5da82eb607d5614@mail.gmail.com> <4AC3571F.6020901@stsci.edu> <4AC36924.7020605@stsci.edu> <45d1ab480909301337s3fef8564uce77ebbbba8978da@mail.gmail.com> <45d1ab480909301433q5ff65872o1f2edc12047d0425@mail.gmail.com> Message-ID: <4AC4D835.4050303@stsci.edu> Thanks. There's a bit of a snag getting my SVN permissions, and the doc editor permissions I assume are pending. Once those things are in place, I can move forward on this and hopefully it will be clear what needs to be done.
Mike David Goldsmith wrote: > On Wed, Sep 30, 2009 at 2:09 PM, Ralf Gommers > > wrote: > > > On Wed, Sep 30, 2009 at 4:37 PM, David Goldsmith > > wrote: > > So, Ralf (or anyone), how, if at all, should we modify the > status of the existing chararray objects/methods in the wiki? > > > Nothing has to be done until *after* Mike has committed his > changes to svn. Please see my previous email for what has to > happen at that point. Since Mike wrote the new docstrings it would > be best if he updated the status of the wiki pages then. > > > OK; Mike: hopefully it will be clear what you have to do to update the > status (it's pretty trivial) but of course don't hesitate to email > (you can do so off-list if you prefer) w/ any questions; > unfortunately, AFAIK, there's no way to update the status of many > docstrings all at once - you'll have to do them each individually (if > you like, let me know when you've committed them and I can help - it > sounds like there will be a lot); the main "silly" thing to remember > is that the option to change the "Review status" only appears if > you're logged in. :-) > > > Assuming you have no problem sharing them with me, Michael, > I could add those docstrings you created for the existing methods, > > > They will show up in the wiki when they get committed to svn > (presumably within a few days), so this is needless effort for the > most part. If there are different changes in the wiki and svn, > that will show up in the "merge" page. > > The ony thing that requires manual effort is if there are changes > in the wiki and the object got moved in svn. > > > And, as above, updating the status in the Wiki. :-) > > DG > > > Cheers, > Ralf > > > > DG > > > On Wed, Sep 30, 2009 at 7:20 AM, Michael Droettboom > > wrote: > > Ralf Gommers wrote: > > > > > > On Wed, Sep 30, 2009 at 9:03 AM, Michael Droettboom > > > >> wrote: > > > > In the source in my working copy. Is that going to > cause problems? I > > wasn't sure if it was possible to document methods > that didn't yet > > exist > > in the code in the wiki. > > > > That is fine. New functions will automatically show up > in the wiki. It > > would be helpful though if you could mark them ready for > review in the > > wiki (if they are) after they show up. Could take up to > 24 hours for > > svn changes to propagate. > Thanks. Will do. > > > > Only if you moved functions around it would be useful if > you pinged > > Pauli after you committed them. This is a temporary > problem, right now > > the wiki creates a new page for a moved object, and the > old content > > (if any) has to be copied over to the new page. > All of the functions that were moved were previously > without docstrings > in SVN, though some had docstrings (that I just now > discovered) in the > wiki. This may cause some hiccups, I suppose, so I'll be > sure to > announce when these things get committed to SVN so I know > how to help > straighten these things out. > > Mike > > > > Cheers, > > Ralf > > > > > > Mike > > > > David Goldsmith wrote: > > > On Tue, Sep 29, 2009 at 10:55 AM, Michael Droettboom > > > > > > > > >>> wrote: > > > > > > 2) Improve documentation > > > > > > Every method now has a docstring, and a new > page of routines > > has been > > > added to the Sphinx tree. > > > > > > > > > Um, where did you do this, 'cause it's not showing > up in the doc > > wiki. 
> > > > > > DG > > > > > > ------------------------------------------------------------------------ > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > -- > > Michael Droettboom > > Science Software Branch > > Operations and Engineering Division > > Space Telescope Science Institute > > Operated by AURA for NASA > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Michael Droettboom > Science Software Branch > Operations and Engineering Division > Space Telescope Science Institute > Operated by AURA for NASA > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From millman at berkeley.edu Thu Oct 1 12:32:06 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 1 Oct 2009 09:32:06 -0700 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Thu, Oct 1, 2009 at 9:19 AM, Ralf Gommers wrote: > Sorry to ask again, but it would really be very useful to get those > docstrings merged for both scipy and numpy. I will do this now. Jarrod From ralf.gommers at googlemail.com Thu Oct 1 12:35:04 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 1 Oct 2009 12:35:04 -0400 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Thu, Oct 1, 2009 at 12:32 PM, Jarrod Millman wrote: > On Thu, Oct 1, 2009 at 9:19 AM, Ralf Gommers > wrote: > > Sorry to ask again, but it would really be very useful to get those > > docstrings merged for both scipy and numpy. > > I will do this now. > Jarrod > Thanks a lot! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pfeldman at verizon.net Thu Oct 1 12:55:50 2009 From: pfeldman at verizon.net (Dr. Phillip M. 
Feldman) Date: Thu, 1 Oct 2009 09:55:50 -0700 (PDT) Subject: [Numpy-discussion] difficulty with numpy.where Message-ID: <25702676.post@talk.nabble.com> I've defined the following one-line function that uses numpy.where: def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) When I try to use this function, I get an error message: In [4]: z=linspace(0,2*pi,9) In [5]: sin_half_period(z) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) The truth value of an array with more than one element is ambiguous. Use a.any () or a.all() Any suggestions will be appreciated. -- View this message in context: http://www.nabble.com/difficulty-with-numpy.where-tp25702676p25702676.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From kwgoodman at gmail.com Thu Oct 1 13:00:00 2009 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 1 Oct 2009 10:00:00 -0700 Subject: [Numpy-discussion] difficulty with numpy.where In-Reply-To: <25702676.post@talk.nabble.com> References: <25702676.post@talk.nabble.com> Message-ID: On Thu, Oct 1, 2009 at 9:55 AM, Dr. Phillip M. Feldman wrote: > > I've defined the following one-line function that uses numpy.where: > > def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) > > When I try to use this function, I get an error message: > > In [4]: z=linspace(0,2*pi,9) > In [5]: sin_half_period(z) > --------------------------------------------------------------------------- > ValueError ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) > > The truth value of an array with more than one element is ambiguous. Use > a.any > () or a.all() > > Any suggestions will be appreciated. Take a look at this thread: http://www.nabble.com/Compound-conditional-indexing-td25686443.html From zachary.pincus at yale.edu Thu Oct 1 13:10:08 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 1 Oct 2009 13:10:08 -0400 Subject: [Numpy-discussion] difficulty with numpy.where In-Reply-To: <25702676.post@talk.nabble.com> References: <25702676.post@talk.nabble.com> Message-ID: <06FE4761-5930-4160-A90A-88AD15FAE71D@yale.edu> Hello, a < b < c (or any equivalent expression) is python syntactic sugar for (a < b) and (b < c). Now, for numpy arrays, a < b gives an array with boolean True or False where the elements of a are less than those of b. So this gives us two arrays that python now wants to "and" together. To do this, python tries to convert the array "a < b" to a single True or False value, and the array "b < c" to a single True or False value, which it then knows how to "and" together. Except that "a < b" could contain many True or False elements, so how to convert them to a single one? There's no obvious way to guess -- typically, one uses "any" or "all" to convert a boolean array to a single true or false value, depending, obviously, on what one needs. So this explains the error you see, but has nothing to do with the results you desire... you need to and-together two boolean arrays *element-wise* -- which is something Python doesn't know how to do with the builtin "and" operator (which cannot be overridden). To do this, you need to use the bitwise logic operators: (a < b) & (b < c). So: def sin_half_period(x): return where((0.0 <= x) & (x <= pi), sin(x), 0.0) Zach On Oct 1, 2009, at 12:55 PM, Dr. Phillip M. 
Feldman wrote: > > I've defined the following one-line function that uses numpy.where: > > def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) > > When I try to use this function, I get an error message: > > In [4]: z=linspace(0,2*pi,9) > In [5]: sin_half_period(z) > --------------------------------------------------------------------------- > ValueError Traceback (most recent > call last) > > The truth value of an array with more than one element is ambiguous. > Use > a.any > () or a.all() > > Any suggestions will be appreciated. > -- > View this message in context: http://www.nabble.com/difficulty-with-numpy.where-tp25702676p25702676.html > Sent from the Numpy-discussion mailing list archive at Nabble.com. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From gokhansever at gmail.com Thu Oct 1 13:48:33 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 1 Oct 2009 12:48:33 -0500 Subject: [Numpy-discussion] difficulty with numpy.where In-Reply-To: <06FE4761-5930-4160-A90A-88AD15FAE71D@yale.edu> References: <25702676.post@talk.nabble.com> <06FE4761-5930-4160-A90A-88AD15FAE71D@yale.edu> Message-ID: <49d6b3500910011048o63a1b593qac1bdea0e69a6733@mail.gmail.com> On Thu, Oct 1, 2009 at 12:10 PM, Zachary Pincus wrote: > Hello, > > a < b < c (or any equivalent expression) is python syntactic sugar for > (a < b) and (b < c). > > Now, for numpy arrays, a < b gives an array with boolean True or False > where the elements of a are less than those of b. So this gives us two > arrays that python now wants to "and" together. To do this, python > tries to convert the array "a < b" to a single True or False value, > and the array "b < c" to a single True or False value, which it then > knows how to "and" together. Except that "a < b" could contain many > True or False elements, so how to convert them to a single one? > There's no obvious way to guess -- typically, one uses "any" or "all" > to convert a boolean array to a single true or false value, depending, > obviously, on what one needs. > > So this explains the error you see, but has nothing to do with the > results you desire... you need to and-together two boolean arrays > *element-wise* -- which is something Python doesn't know how to do > with the builtin "and" operator (which cannot be overridden). To do > this, you need to use the bitwise logic operators: > (a < b) & (b < c). > > So: > > def sin_half_period(x): return where((0.0 <= x) & (x <= pi), sin(x), > 0.0) > > Zach > > > Very well expressed Zach. The reason that I wanted to use this kind of conditional indexing is as follows: I have a dataset with a main time-variable and various other measurement results including some atmospheric data (cloud microphysics in particular). In one instance of this dataset I have 8000 something rows for each of the variables in the file. We wanted to segment the cloud droplet concentration data only for a certain time-window (only if a measurement was done at cloud base conditions.) We have a-priori knowledge of this time-window; the only other thing to do is to conditionally index our cloud drop concentration with this window. Putting it in more technical terms: time = 40000 to 48000 a numpy array, conc = 300 to 500 numpy array with 8000 elements. Say that cloud bases occur between 45000 and 45400, and I am only interested in analysing that portion of the data.
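A toy version of the selection, with made-up numbers just to show the element-wise & mask (hypothetical values, not our real data):

import numpy as np
time = np.linspace(40000, 48000, 8000)       # main time variable
conc = np.random.uniform(300, 500, 8000)     # droplet concentration
cloud_base = conc[(time > 45000) & (time < 45400)]  # keep only the cloud-base window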
Do a boxplot, or even be fancier and make violin plots out of this section :) So I do: conc[(time>45000) & (time<45400)] Voila! > > On Oct 1, 2009, at 12:55 PM, Dr. Phillip M. Feldman wrote: > > > > > I've defined the following one-line function that uses numpy.where: > > > > def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) > > > > When I try to use this function, I get an error message: > > > > In [4]: z=linspace(0,2*pi,9) > > In [5]: sin_half_period(z) > > > --------------------------------------------------------------------------- > > ValueError Traceback (most recent > > call last) > > > > The truth value of an array with more than one element is ambiguous. > > Use > > a.any > > () or a.all() > > > > Any suggestions will be appreciated. > > -- > > View this message in context: > http://www.nabble.com/difficulty-with-numpy.where-tp25702676p25702676.html > > Sent from the Numpy-discussion mailing list archive at Nabble.com. > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Thu Oct 1 14:26:56 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 1 Oct 2009 11:26:56 -0700 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: OK, I've checked in the scipy doc improvements: http://projects.scipy.org/scipy/changeset/5954 http://projects.scipy.org/scipy/changeset/5955 Thanks to everyone who contributed! I will merge the numpy docs later today. Is there anything else I need to do on the SciPy documentation editor to indicate that I've merged the changes, or will it update itself? Best, Jarrod From Klaus.Noekel at gmx.de Thu Oct 1 14:43:36 2009 From: Klaus.Noekel at gmx.de (Klaus Noekel) Date: Thu, 01 Oct 2009 20:43:36 +0200 Subject: [Numpy-discussion] Windows 64-bit Message-ID: <4AC4F858.3000108@gmx.de> Hi all, at the end of July David answered my question about future 64-bit Windows support as follows: "There were some discussion about pushing 1.4.0 'early', but instead, I think we let it slipped - one consequence is that there will be enough time for 1.4.0 to be released with proper AMD64 support on windows. The real issue is not numpy per-se, but making scipy work on top of numpy in 64 bits mode. It is hard to give an exact date as to when those issues will be fixed, but it is being worked on." As our project needs 64-bit numpy under Windows quite soon, I am curious about the state of the project: - Is a stable 64-bit Windows installer (with or without numpy 1.4.0) going to be released anytime SOON? - We need only numpy, not scipy. Does that imply that we have a good chance of producing an install ourselves with the current sources? I am a bit concerned, because earlier posts indicated that the issue is not trivial. Or do all the hard aspects have to do with scipy? Thanks for an update!
Cheers, Klaus Nökel From ralf.gommers at googlemail.com Thu Oct 1 15:20:51 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 1 Oct 2009 15:20:51 -0400 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Thu, Oct 1, 2009 at 2:26 PM, Jarrod Millman wrote: > OK, I've checked in the scipy doc improvements: > http://projects.scipy.org/scipy/changeset/5954 > http://projects.scipy.org/scipy/changeset/5955 > > Thanks again Jarrod! > Thanks to everyone who contributed! I will merge the numpy docs later > today. > > Is there anything else I need to do on the SciPy documentation editor > to indicate that I've merged the changes, or will it update itself? > That should be all I think. Changes should show up in the wiki within a day, at which point the "diff to svn" is empty and the updated docstrings should disappear from the patch page. Cheers, Ralf > Best, > Jarrod > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rowen at uw.edu Thu Oct 1 15:55:19 2009 From: rowen at uw.edu (Russell E. Owen) Date: Thu, 01 Oct 2009 12:55:19 -0700 Subject: [Numpy-discussion] More questions on Chebyshev class. References: Message-ID: In article , Charles R Harris wrote: > The Chebyshev class is now working pretty well, but I would like to settle > some things up front. > > 1) Order in which coefficients are stored/passed/accessed. > > The current poly1d class ctor is called with the coefficients in high to low > order, yet the __getitem__ and __setitem__ methods access them in reverse > order. This seems confusing and I think both should go in the same order and > my preference would be from low to high. The low to high order also works a > bit better for implementation. This sounds like a very useful change to me. > 2) poly1d allows the size of the coefficient array to be dynamically > extended. I have mixed feelings about that and would prefer not to, but there > are arguments for that: students might find it easier to fool with. If it's easy to make a new instance that copies the old coefficients and allows the user to add new ones or trim some high order terms, then surely that suffices and you need not support resizing the coefficient array? > 3) The poly1d class prunes leading (high power) zeros. Because the Cheb > class has a fit static method that returns a Cheb object, and because when > fitting with Chebyshev polynomials the user often wants to see *all* of the > coefficients, even if some of the leading ones are zero, the Cheb class does > not automatically prune the zeros, instead there are methods for that. Will this be affected if you list the coefficients low to high, as you recommend in (1), or make the coefficient list not resizable as per (2)? Certainly it seems much safer to elide trailing zeros, rather than leading zeros. In any case, I agree with you that manually trimming sounds safer than automatically trimming. > 4) All the attributes of the Cheb class are read/write. The poly1d class > attempts to hide some, but the method used breaks the copy module. Python > really doesn't have private attributes, so I left all the attributes exposed > with the usual Python proviso: if you don't know what it does, don't fool > with it. > > 5) Is Cheb the proper name for the class? I suggest spelling it out: Chebyshev.
Explicit is better than implicit and it doesn't save that much typing. (Failing that, I suggest at least including the Y -- I think Cheby is clearer than Cheb). -- Russell From cournape at gmail.com Thu Oct 1 20:33:32 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 2 Oct 2009 09:33:32 +0900 Subject: [Numpy-discussion] Windows 64-bit In-Reply-To: <4AC4F858.3000108@gmx.de> References: <4AC4F858.3000108@gmx.de> Message-ID: <5b8d13220910011733w221f65ata82381a0bedaaa9@mail.gmail.com> On Fri, Oct 2, 2009 at 3:43 AM, Klaus Noekel wrote: > - We need only numpy, not scipy. Does that imply that we have a good > chance of producing an install ourselves with the current sources? The current sources can be compiled by visual studio in 64 bits mode without problem and should be quite stable - you won't have a fast blas/lapack, though, David From bsouthey at gmail.com Fri Oct 2 10:34:39 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 02 Oct 2009 09:34:39 -0500 Subject: [Numpy-discussion] Question about improving genfromtxt errors In-Reply-To: References: <4AC0E740.60309@noaa.gov> <4AC237BA.9050104@noaa.gov> <4AC24A7F.4020904@gmail.com> <00CD2A47-A721-4A46-8D81-203BE795A6E4@gmail.com> <4AC26FEB.60401@gmail.com> <4AC38DD4.9000609@gmail.com> Message-ID: <4AC60F7F.1050806@gmail.com> On 09/30/2009 12:44 PM, Skipper Seabold wrote: > On Wed, Sep 30, 2009 at 12:56 PM, Bruce Southey wrote: > >> On 09/30/2009 10:22 AM, Skipper Seabold wrote: >> >>> On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey wrote: >>> >>> >>> >>>> Hi, >>>> The first case just has to handle a missing delimiter - actually I expect >>>> that most of my cases would relate to this. So here is simple Python code to >>>> generate an arbitrarily large list with the occasional missing delimiter. >>>> >>>> I set it so it reads the desired number of rows and frequency of bad rows >>>> from the linux command line. >>>> $time python tbig.py 1000000 100000 >>>> >>>> If I comment out the extra prints in io.py that I put in, it takes about 22 >>>> seconds to finish if the delimiters are correct. If I have the missing >>>> delimiter it takes 20.5 seconds to crash. >>>> >>>> >>>> Bruce >>>> >>>> >>>> >>> I think this would actually cover most of the problems I was running >>> into. The only other one I can think of is when I used a converter >>> that I thought would work, but it got unexpected data. For example, >>> >>> from StringIO import StringIO >>> import numpy as np >>> >>> strip_rand = lambda x : float(('r' in x.lower() and x.split()[-1]) or >>> (not 'r' in x.lower() and x.strip() or 0.0)) >>> >>> # Example usage >>> strip_rand('R 40') >>> strip_rand(' ') >>> strip_rand('') >>> strip_rand('40') >>> >>> strip_per = lambda x : float(('%' in x.lower() and x.split()[0]) or >>> (not '%' in x.lower() and x.strip() or 0.0)) >>> >>> # Example usage >>> strip_per('7 %') >>> strip_per('7') >>> strip_per(' ') >>> strip_per('') >>> >>> # Unexpected usage >>> strip_per('R 1') >>> >>> >> Does this work for you? >> I get an: >> ValueError: invalid literal for float(): R 1 >> >> > No, that's the idea. Sorry this was a bit opaque. > > >> >>> s = StringIO('D01N01,10/1/2003 ,1 %,R 75,400,600\r\nL24U05,12/5/2003\ >>> ,2 %,1,300, 150.5\r\nD02N03,10/10/2004 ,R 1,,7,145.55') >>> >>> >> Can you provide the correct line before the bad line? >> It just makes it easy to understand why a line is bad.
>> >> > The idea is that I have a column, which I expect to be percentages, > but these are coded in by different data collectors, so some code a 0 > for 0, some just leave it missing which could just as well be 0, some > use the %. What I didn't expect was that some put in a money amount, > hence the 'R 7', which my converter doesn't catch. > > >>> data = np.genfromtxt(s, converters = {2 : strip_per, 3 : strip_rand}, >>> delimiter=",", dtype=None) >>> >>> I don't have a clean install right now, but I think this returned a >>> converter is locked for upgrading error. I would just like to know >>> where the problem occured (line and column, preferably not >>> zero-indexed), so I can go and have a look at my data. >>> >>> >> I rather limited understanding here. I think the problem is that Python >> is raising a ValueError because your strip_per() is wrong. It is not >> informative to you because _iotools.py is not aware that an invalid >> converter will raise a ValueError. Therefore there needs to be some way >> to test that the converter is correct or not. >> >> > _iotools does catch this I believe, though I don't understand the > upgrading and locking properly. The kludgy fix that I provided in the > first post "I do not report the error from > _iotools.StringConverter...", catches that an error is raised from > _iotools and tells me exactly where the converter fails, so I can go > to, say line 750,000 column 250 (and converter with key 249) instead > of not knowing anything except that one of my ~500 converters failed > somewhere in a 1 million line data file. If you still want to keep > the error messages from _iotools.StringConverter, then they maybe they > could have a (%s, %s) added and then this can be filled in in > genfromtxt when you know (line, column) or something similar as was > kind of suggested in a post in this thread I believe. Then again, > this might not be possible. I haven't tried. > > I added another patch to ticket 1212 http://projects.scipy.org/numpy/ticket/1212 I tried to rework my first patch because I had forgotten that the header of the file that I was using was missing a delimiter. (Something I need to investigate more.) Hopefully it helps towards a better solution. I added a try/except block around the 'converter.upgrade(item)' line which appears to provide the results for your file. While not the best solution. In addition, I modified the loop to enumerate the converter list so I could find which one in the list fails. The output for your example: Row Number: 3 Failed Converter 2 in list of converters [('D01N01', '10/1/2003 ', 1.0, 75.0, 400, 600.0) ('L24U05', '12/5/2003', 2.0, 1.0, 300, 150.5) ('D02N03', '10/10/2004 ', 0.0, 0.0, 7, 145.55000000000001)] >> This this case I think it is the delimiter so checking the column >> numbers should occur before the application of the converter to that row. >> >> > Sometimes it was the case where I had an extra comma in a number 1,000 > say and then the converter tried to work on the wrong column, and > sometimes it was because my converter didn't cover every use case, > because I didn't know it yet. Either way, I just needed a gentle > nudge in the right direction. > > If that doesn't clear up what I was after, I can try to provide a more > detailed code sample. > > Skipper > _______________________________________________ > I do not see how to write code to determine when a delimiter has more than one meaning. 
While there are more columns than expected, it can be very hard to determine which column is incorrect without additional information. We might be able to do that if we associate a format with a column. But then you would have to split columns one by one and check each one as you do so. Probably not hard to do but a lot of work to validate it. For example, I have numerous problems with dates in SAS because you have 2 or 4 digit years, 1 or 2 digit days and months. But any variation from what is expected leads to errors if it expects 2 digit years and gets a 4 digit year. So I usually read dates as strings and then parse them as I want. Bruce From josef.pktd at gmail.com Fri Oct 2 13:08:46 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 2 Oct 2009 13:08:46 -0400 Subject: [Numpy-discussion] poly class question Message-ID: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Is there a way in numpy (or scipy) to get an infinite expansion for the inverse of a polynomial (for a finite number of terms) np.poly1d([ -0.8, 1])**(-1) application for example the MA representation of an AR(1) and fractional powers np.poly1d([ -1, 1])**0.5 this is useful for fractionally integrated time series, e.g. ARFIMA Until now I did this directly or using scipy.signal, but I thought maybe the polynomial class would handle some of it, both examples raise exceptions. Josef From charlesr.harris at gmail.com Fri Oct 2 13:30:10 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 11:30:10 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:08 AM, wrote: > Is there a way in numpy (or scipy) to get an infinite expansion for > the inverse of a polynomial (for a finite number of terms) > > np.poly1d([ -0.8, 1])**(-1) > > application for example the MA representation of an AR(1) > > Hmm, I've been working on a chebyshev class and division of a scalar by a chebyshev series is expressly forbidden, but it could be included if a good interface is proposed. Same would go for polynomials. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Oct 2 13:33:00 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 11:33:00 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:08 AM, wrote: > >> Is there a way in numpy (or scipy) to get an infinite expansion for >> the inverse of a polynomial (for a finite number of terms) >> >> np.poly1d([ -0.8, 1])**(-1) >> >> application for example the MA representation of an AR(1) >> >> > Hmm, I've been working on a chebyshev class and division of a scalar by a > chebyshev series is > expressly forbidden, but it could be included if a good interface is > proposed. Same would go for polynomials. > In fact it isn't hard to get, for poly1d you should be able to multiply the series by a power of x to shift it left, then divide. Chuck -------------- next part -------------- An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com Fri Oct 2 13:35:48 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 11:35:48 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >> >>> Is there a way in numpy (or scipy) to get an infinite expansion for >>> the inverse of a polynomial (for a finite number of terms) >>> >>> np.poly1d([ -0.8, 1])**(-1) >>> >>> application for example the MA representation of an AR(1) >>> >>> >> Hmm, I've been working on a chebyshev class and division of a scalar by a >> chebyshev series is >> expressly forbidden, but it could be included if a good interface is >> proposed. Same would go for polynomials. >> > > In fact is isn't hard to get, for poly1d you should be able to multiply the > series by a power of x to shift it left, then divide. > > That is, divide a power of x by the polynomial. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Oct 2 14:09:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 12:09:32 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>> >>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>> the inverse of a polynomial (for a finite number of terms) >>>> >>>> np.poly1d([ -0.8, 1])**(-1) >>>> >>>> application for example the MA representation of an AR(1) >>>> >>>> >>> Hmm, I've been working on a chebyshev class and division of a scalar by a >>> chebyshev series is >>> expressly forbidden, but it could be included if a good interface is >>> proposed. Same would go for polynomials. >>> >> >> In fact is isn't hard to get, for poly1d you should be able to multiply >> the series by a power of x to shift it left, then divide. >> >> > That is, divide a power of x by the polynomial. > > You will also need to reverse the denominator coefficients...Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Oct 2 14:30:57 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 2 Oct 2009 14:30:57 -0400 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris > wrote: >> >> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >> wrote: >>> >>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>>>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>>> the inverse of a polynomial (for a finite number of terms) >>>>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>>>> >>>>> application for example the MA representation of an AR(1) >>>>> >>>> >>>> Hmm, I've been working on a chebyshev class and division of a scalar by >>>> a chebyshev series is >>>> expressly forbidden, but it could be included if a good interface is >>>> proposed. Same would go for polynomials. >>> >>> In fact is isn't hard to get, for poly1d you should be able to multiply >>> the series by a power of x to shift it left, then divide. >>> >> >> That is, divide a power of x by the polynomial. >> > > You will also need to reverse the denominator coefficients...Chuck That's the hint I needed. However the polynomial coefficients are then reversed and not consistent with other polynomial operations, aren't they? >>> from scipy.signal import lfilter >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]), poly1d([ 0.10737418])) >>> lfilter([1], [1,-0.8], [1] + [0]*9) array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), poly1d([-0.00159539, 0.00061389])) >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) What I meant initally doesn't necessarily mean division of a scalar. >>> np.poly1d([1])/np.poly1d([-0.8, 1]) (poly1d([ 0.]), poly1d([ 1.])) I didn't find any polynomial division that does the expansion of the remainder. The same problem, I think is inherited, by the scipy.signal.lti, and it took me a while to find the usefulness of lfilter in this case. If it were possible to extend the methods for the polynomial class to do a longer expansions, it would make them more useful for arma and lti. (in some areas, I'm still trying to figure out whether some functionality is just hidden to me, or actually a limitation of the implementation or a missing feature.) Thanks, Josef > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mjanikas at esri.com Fri Oct 2 15:16:44 2009 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 2 Oct 2009 12:16:44 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure Message-ID: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> Hello All, I was hoping you could help me out with a simple little problem I am having: I am reading data from a database that contains NULL values. 
There is more than one field being read in with equal length, but if any of them are NULL in a row, then I do NOT want to include it in my numpy structure (I.e. no records for that row across fields). As the values from each field are of the same type, I can pre-allocate the space for the entire dataset (if all were not NULL), but there may be less observations after accounting for the NULLS. So, do I use lists and append then create the arrays... Or do I fill up the pre-allocated "empty" arrays and slice off the ends? Thoughts? Thanks much... MJ Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) mjanikas at esri.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Fri Oct 2 15:33:35 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 02 Oct 2009 12:33:35 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure In-Reply-To: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> References: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> Message-ID: <4AC6558F.8080400@noaa.gov> Mark Janikas wrote: > So, do I use lists and > append then create the arrays? Or do I fill up the pre-allocated ?empty? > arrays and slice off the ends? Thoughts? Thanks much? Either will work. I think the decision would be based on how many Null records you expect -- if it's a small fraction then go ahead and pre-allocate the array, if it's a large fraction, then you might want to go with a list. Note: you may be able to use arr.resize() to chop it off at the end. The list method has the downside of using more memory, and being a bit slower, which may be mitigated if there are lots of null records. See an upcoming email of mine for another option... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Fri Oct 2 15:38:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 13:38:53 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 12:30 PM, wrote: > On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris > wrote: > > > > > > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris > > wrote: > >> > >> > >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris > >> wrote: > >>> > >>> > >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris > >>> wrote: > >>>> > >>>> > >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: > >>>>> > >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for > >>>>> the inverse of a polynomial (for a finite number of terms) > >>>>> > >>>>> np.poly1d([ -0.8, 1])**(-1) > >>>>> > >>>>> application for example the MA representation of an AR(1) > >>>>> > >>>> > >>>> Hmm, I've been working on a chebyshev class and division of a scalar > by > >>>> a chebyshev series is > >>>> expressly forbidden, but it could be included if a good interface is > >>>> proposed. Same would go for polynomials. > >>> > >>> In fact is isn't hard to get, for poly1d you should be able to multiply > >>> the series by a power of x to shift it left, then divide. 
> >>> > >> > >> That is, divide a power of x by the polynomial. > >> > > > > You will also need to reverse the denominator coefficients...Chuck > > That's the hint I needed. However the polynomial coefficients are then > reversed and not consistent with other polynomial operations, aren't > they? > > >>> from scipy.signal import lfilter > > >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) > (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , > 0.32768 , 0.262144 , 0.2097152 , 0.16777216, > 0.13421773]), poly1d([ 0.10737418])) > > >>> lfilter([1], [1,-0.8], [1] + [0]*9) > array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , > 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) > > >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) > (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , > 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), > poly1d([-0.00159539, 0.00061389])) > >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) > array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , > 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) > > > What I meant initally doesn't necessarily mean division of a scalar. > > >>> np.poly1d([1])/np.poly1d([-0.8, 1]) > (poly1d([ 0.]), poly1d([ 1.])) > > I didn't find any polynomial division that does the expansion of the > remainder. The same problem, I think is inherited, by the > scipy.signal.lti, and it took me a while to find the usefulness of > lfilter in this case. > > If it were possible to extend the methods for the polynomial class to > do a longer expansions, it would make them more useful for arma and > lti. > > (in some areas, I'm still trying to figure out whether some > functionality is just hidden to me, or actually a limitation of the > implementation or a missing feature.) > > Could you describe the sort of problems you want to solve? There are lots of curious things out there we could maybe work with. Covariances, for instance, are closely related to Chebyshev series. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Oct 2 15:56:02 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 02 Oct 2009 22:56:02 +0300 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: <1254513362.5712.10.camel@idol> to, 2009-10-01 kello 12:19 -0400, Ralf Gommers kirjoitti: > Sorry to ask again, but it would really be very useful to get those > docstrings merged for both scipy and numpy. [clip] Numpy's new docstrings is are now in SVN too, for the most part. An amazing amount of work was done during the summer, thanks to all who participated! > > For numpy in principle the same procedure, except there are some > objects that need the add_newdocs treatment. There are two types of > errors, my question is (mainly to Pauli) if they both need the same > treatment or a different one. > > Errors: > 1. source location not known, like: > ERROR: numpy.broadcast.next: source location for docstring is not known > 2. source location known but failed to find a place to add docstrings, > like: > ERROR: Source location for numpy.lib.function_base.iterable known, > but failed to find a place for the docstring These I didn't commit yet. Mostly, they can be fixed by adding necessary entries to add_newdocs.py. However, some of these may be objects assigning docstrings to which may be technically difficult and requires larger changes. The second error may also indicate a bug in patch generation. 
-- Pauli Virtanen From d.l.goldsmith at gmail.com Fri Oct 2 16:21:02 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 2 Oct 2009 13:21:02 -0700 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: <1254513362.5712.10.camel@idol> References: <1254513362.5712.10.camel@idol> Message-ID: <45d1ab480910021321m4bd3346at25bc702f89d38c35@mail.gmail.com> Is there any way to move the existing parts of this thread (i.e., not just future posts, which of course is as simple as posting them there instead) over to scipy-dev, where it really belongs? DG On Fri, Oct 2, 2009 at 12:56 PM, Pauli Virtanen wrote: > to, 2009-10-01 kello 12:19 -0400, Ralf Gommers kirjoitti: > > Sorry to ask again, but it would really be very useful to get those > > docstrings merged for both scipy and numpy. > [clip] > > Numpy's new docstrings is are now in SVN too, for the most part. An > amazing amount of work was done during the summer, thanks to all who > participated! > > > > For numpy in principle the same procedure, except there are some > > objects that need the add_newdocs treatment. There are two types of > > errors, my question is (mainly to Pauli) if they both need the same > > treatment or a different one. > > > > Errors: > > 1. source location not known, like: > > ERROR: numpy.broadcast.next: source location for docstring is not known > > 2. source location known but failed to find a place to add docstrings, > > like: > > ERROR: Source location for numpy.lib.function_base.iterable known, > > but failed to find a place for the docstring > > These I didn't commit yet. Mostly, they can be fixed by adding necessary > entries to add_newdocs.py. However, some of these may be objects > assigning docstrings to which may be technically difficult and requires > larger changes. The second error may also indicate a bug in patch > generation. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Oct 2 16:40:03 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 2 Oct 2009 16:40:03 -0400 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> Message-ID: <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 12:30 PM, wrote: >> >> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >> wrote: >> > >> > >> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >> > wrote: >> >> >> >> >> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >> >> wrote: >> >>> >> >>> >> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >> >>> wrote: >> >>>> >> >>>> >> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >> >>>>> >> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >> >>>>> the inverse of a polynomial (for a finite number of terms) >> >>>>> >> >>>>> np.poly1d([ -0.8, 1])**(-1) >> >>>>> >> >>>>> application for example the MA representation of an AR(1) >> >>>>> >> >>>> >> >>>> Hmm, I've been working on a chebyshev class and division of a scalar >> >>>> by >> >>>> a chebyshev series is >> >>>> expressly forbidden, but it could be included if a good interface is >> >>>> proposed. 
Same would go for polynomials. >> >>> >> >>> In fact is isn't hard to get, for poly1d you should be able to >> >>> multiply >> >>> the series by a power of x to shift it left, then divide. >> >>> >> >> >> >> That is, divide a power of x by the polynomial. >> >> >> > >> > You will also need to reverse the denominator coefficients...Chuck >> >> That's the hint I needed. However the polynomial coefficients are then >> reversed and not consistent with other polynomial operations, aren't >> they? >> >> >>> from scipy.signal import lfilter >> >> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >> (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, >> 0.13421773]), poly1d([ 0.10737418])) >> >> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >> array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >> >> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >> (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >> 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), >> poly1d([-0.00159539, 0.00061389])) >> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >> array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >> 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) >> >> >> What I meant initally doesn't necessarily mean division of a scalar. >> >> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >> (poly1d([ 0.]), poly1d([ 1.])) >> >> I didn't find any polynomial division that does the expansion of the >> remainder. The same problem, I think is inherited, by the >> scipy.signal.lti, and it took me a while to find the usefulness of >> lfilter in this case. >> >> If it were possible to extend the methods for the polynomial class to >> do a longer expansions, it would make them more useful for arma and >> lti. >> >> (in some areas, I'm still trying to figure out whether some >> functionality is just hidden to me, or actually a limitation of the >> implementation or a missing feature.) >> > > Could you describe the sort of problems you want to solve? There are lots of > curious things out there we could maybe work with. Covariances, for > instance, are closely related to Chebyshev series. I am working on a discrete time arma process of the form a(L) x_t = b(L) u_t, where L is the lag operator L^k x_t = x_(t-k) what I just programmed using lfilter is x_t = b(L)/a(L) u_t where b(L)/a(L) is the impulse response function or moving average representation a(L)/b(L) is the autoregressive representation the extension a(L)(1-L)^d x_t = b(L) u_t, where d = 0,1,2,... (standard) or also continuous d < 1 (fractional integration) a(L)/b(L), b(L)/a(L), (1-L)^(-d) or (1-L)^d (0 < d < 1) >>> from scipy import signal >>> signal.impulse(([1, -0.8],[1]), N=10) raise ValueError, "Improper transfer function." ValueError: Improper transfer function. while this works >>> signal.impulse(([1],[1, -0.8]), N=10) (It's been a while since I looked inside scipy.signal.lti) A separate issue would be the multivariate version VARMA, or MIMO in system modeling. a(L), b(L) are matrix polynomials and x_t, u_t are 1d arrays evolving in time. But that is a different discussion. I'm not very familiar with Chebyshev polynomials, the last time I wanted to use them I didn't see anything about their use as a base for functions in several variables and gave up.
I've seen papers that use them as base for functions in one variable, but I'm not doing anything like this right now. Thanks, Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mjanikas at esri.com Fri Oct 2 19:31:33 2009 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 2 Oct 2009 16:31:33 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure In-Reply-To: <4AC6558F.8080400@noaa.gov> References: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> <4AC6558F.8080400@noaa.gov> Message-ID: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C3@redmx1.esri.com> Thanks for the input! I wonder if I can resize my own record array? I.e. one call to truncate... Ill give it a go. But the resize works great as it doesn't make a copy: In [12]: a = NUM.arange(10) In [13]: id(a) Out[13]: 190182896 In [14]: a.resize(5,) In [15]: a Out[15]: array([0, 1, 2, 3, 4]) In [16]: id(a) Out[16]: 190182896 Whereas the slice seems to make a copy/reassign: In [18]: a = a[0:2] In [19]: id(a) Out[19]: 189981184 Pretty Nice. Pre-allocate the full space and count number of good records... then resize. Doesn't seem that much faster than using the lists then creating arrays, but memory should be better. Thanks again, and anything further would be appreciated. MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Christopher Barker Sent: Friday, October 02, 2009 12:34 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Database with Nulls to Numpy Structure Mark Janikas wrote: > So, do I use lists and > append then create the arrays... Or do I fill up the pre-allocated "empty" > arrays and slice off the ends? Thoughts? Thanks much... Either will work. I think the decision would be based on how many Null records you expect -- if it's a small fraction then go ahead and pre-allocate the array, if it's a large fraction, then you might want to go with a list. Note: you may be able to use arr.resize() to chop it off at the end. The list method has the downside of using more memory, and being a bit slower, which may be mitigated if there are lots of null records. See an upcoming email of mine for another option... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From Chris.Barker at noaa.gov Fri Oct 2 23:38:55 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 02 Oct 2009 20:38:55 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure In-Reply-To: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C3@redmx1.esri.com> References: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> <4AC6558F.8080400@noaa.gov> <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C3@redmx1.esri.com> Message-ID: <4AC6C74F.1000203@noaa.gov> Mark Janikas wrote: > Thanks for the input! I wonder if I can resize my own record array? I.e. one call to truncate... Ill give it a go. you should be able too, yes. Be careful though, you can't call resize() if there are any other references to the array. 
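For instance, a rough, untested sketch (assuming numpy is imported as NUM,
as in your snippet -- the exact error text may differ by version):

In [20]: a = NUM.arange(10)

In [21]: b = a            # a second name now references the same array

In [22]: a.resize(5,)     # the reference check kicks in here
ValueError: cannot resize an array that has been referenced ...

In [23]: a.resize(5, refcheck=False)  # skips the check, but b may then point at stale memory

More on what's actually going on below.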
> But the resize works great as it doesn't make a copy: Actually, it's not that simple. With numpy arrays, there is the array object itself, and there is the data block that the array points to. Whn you call resize() it may make a copy of the data block (which is why it won't work if there are other references to it), while keeping the same python object. > In [12]: a = NUM.arange(10) > > In [13]: id(a) > Out[13]: 190182896 > > In [14]: a.resize(5,) > > In [15]: a > Out[15]: array([0, 1, 2, 3, 4]) > > In [16]: id(a) > Out[16]: 190182896 So this shows you have the same python object. I think there is a way to get the value of the pointer to the data block, but I dont' know off the top of my head how. > Whereas the slice seems to make a copy/reassign: > > In [18]: a = a[0:2] > > In [19]: id(a) > Out[19]: 189981184 slicing creates a new python object, but it doesn't copy the actual data: In [4]: b = a[2:5] In [5]: a is b Out[5]: False In [6]: a[2:5] = 10 In [7]: a Out[7]: array([ 0, 1, 10, 10, 10, 5, 6, 7, 8, 9]) In [8]: b Out[8]: array([10, 10, 10]) so you can see a and b are different python objects, but they share the same data block. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Sat Oct 3 03:26:49 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 00:26:49 -0700 Subject: [Numpy-discussion] A numpy accumulator... Message-ID: <4AC6FCB9.5040107@noaa.gov> Hasi all, This idea was inspired by a discussion at SciPY, in which we spent a LOT of time during the numpy tutorial talking about how to accumulate values in an array when you don't know how big the array needs to be when you start. The "standard practice" is to accumulate in a python list, then convert the final result into an array. This is a good idea because Python lists are standard, well tested, efficient, etc. However, as was pointed out in that lengthy discussion, if what you are doing is accumulating is a whole bunch of numbers (ints, floats, whatever), or particularly if you need to accumulate a data type that plain python doesn't support, there is a lot of overhead involved: a python float type is pretty heavyweight. If performance or memory use is important, it might create issues. You can use and array.array, but it doesn't support all numpy types, particularly custom dtypes. I talked about this on the cython list (as someone asked how to do accumulate in cython), and a few folks thought it would be useful, so I put together a prototype. What I have in mind is very simple. It would be: - Only 1-d - Support append() and extend() methods - support indexing and slicing - Support any valid numpy dtype - which could even get you pseudo n-d arrays... - maybe it would act like an array in other ways, I'm not so sure. - ufuncs, etc. It could take the place of using python lists/arrays when you really want a numpy array, but don't know how big it will be until you've filled it. The implementation I have now uses a regular numpy array as the "buffer". The buffer is re-sized as needed with ndarray.resize(). I've enclosed the class, a bunch of tests (This is the first time I've ever really done test-driven development, though I wouldn't say that this is a complete test suite). A few notes about this implementation: * the name of the class could be better, and so could some of the method names. 
* on further thought, I think it could handle n-d arrays, as long as you only accumulated along the first index. * It could use a bunch more methods - deleting part of eh array - math - probably anything supported by array.array would be good. * Robert pointed me to the array.array implimentation to see how it expands the buffer as you append. It did tricks to get it to grow fast when the array is very small, then eventually to add about 1/16 of the used array size to the buffer. I imagine that this would gets used because you were likely to have a big array, so I didn't bother and start with a buffer at 128 elements, then add 1/4 each time you need to expand -- these are both tweakable attributes. * I did a little simple profiling, and discovered that it's slower than a python list by a factor of more than 2 (for accumulating python ints, anyway). With a bit of experimentation, I think that's because of a couple factors: - an extra function call -- the append() method needs to then do an assignemt to the buffer - Object conversion -- python lists store python objects, so the python int can jsut go right in there. with numpy, it needs to be converted to a C int first -- a bit if extra overhead. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From robert.kern at gmail.com Sat Oct 3 03:32:13 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 3 Oct 2009 02:32:13 -0500 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC6FCB9.5040107@noaa.gov> References: <4AC6FCB9.5040107@noaa.gov> Message-ID: <3d375d730910030032w2f5639c4p34c6de292d063335@mail.gmail.com> On Sat, Oct 3, 2009 at 02:26, Christopher Barker wrote: > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). Forgot the attachment? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Sat Oct 3 03:38:26 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 00:38:26 -0700 Subject: [Numpy-discussion] A numpy accumulator... Message-ID: <4AC6FF72.1030308@noaa.gov> (I clicked send too early the last time -- sorry about that!) Hi all, This idea was inspired by a discussion at the SciPy conference, in which we spent a LOT of time during the numpy tutorial talking about how to accumulate values in an array when you don't know how big the array needs to be when you start. The "standard practice" is to accumulate in a python list, then convert the final result into an array. This is a good idea because Python lists are standard, well tested, efficient, etc. However, as was pointed out in that lengthy discussion, if what you are doing is accumulating is a whole bunch of numbers (ints, floats, whatever), or particularly if you need to accumulate a data type that plain python doesn't support, there is a lot of overhead involved: a python float type is pretty heavyweight. If performance or memory use is important, it might create issues. 
You can use and array.array, but it doesn't support all numpy types, particularly custom dtypes. I talked about this on the cython list (as someone asked how to do accumulate in cython), and a few folks thought it would be useful, so I put together a prototype. What I have in mind is very simple. It would be: - Only 1-d - Support append() and extend() methods - support indexing and slicing - Support any valid numpy dtype - which could even get you pseudo n-d arrays... - maybe it would act like an array in other ways, I'm not so sure. - ufuncs, etc. It could take the place of using python lists/arrays when you really want a numpy array, but don't know how big it will be until you've filled it. The implementation I have now uses a regular numpy array as the "buffer". The buffer is re-sized as needed with ndarray.resize(). I've enclosed the class, a bunch of tests (This is the first time I've ever really done test-driven development, though I wouldn't say that this is a complete test suite). A few notes about this implementation: * the name of the class could be better, and so could some of the method names. * on further thought, I think it could handle n-d arrays, as long as you only accumulated along the first index. * It could use a bunch more methods - deleting part of the array - math - probably anything supported by array.array would be good. * Robert pointed me to the array.array implimentation to see how it expands the buffer as you append. It did tricks to get it to grow fast when the array is very small, then eventually to add about 1/16 of the used array size to the buffer. I imagine that this would gets used because you were likely to have a big array, so I didn't bother and start with a buffer at 128 elements, then add 1/4 each time you need to expand -- these are both tweakable attributes. * I'm keeping the buffer a hidden variable, and slicing and __array__ return copies - this is so that it won't get multiple references, and then not be expandable. * I did a little simple profiling, and discovered that it's slower than a python list by a factor of more than 2 (for accumulating python ints, anyway). With a bit of experimentation, I think that's because of a couple factors: - an extra function call -- the append() method needs to then do an assignment to the buffer - Object conversion -- python lists store python objects, so the python int can just go right in there. with numpy, it needs to be converted to a C int first -- a bit if extra overhead. Though a straight assignment into a pre-allocated array i faster than a list. I think it's still an improvement for memory use. Maybe it would be worth writing in C or Cython to avoid some of this. In particular, it would be nice if you could use it in Cython, and put C types directly it... * This could be pretty useful for things like genfromtxt. What do folks think? is this useful? What would you change, etc? -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Sat Oct 3 04:06:12 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 01:06:12 -0700 Subject: [Numpy-discussion] A numpy accumulator... Message-ID: <4AC705F4.1040702@noaa.gov> OK -- this one I'm intending to send! 
Hi all, This idea was inspired by a discussion at the SciPy conference, in which we spent a LOT of time during the numpy tutorial talking about how to accumulate values in an array when you don't know how big the array needs to be when you start. The "standard practice" is to accumulate in a python list, then convert the final result into an array. This is a good idea because Python lists are standard, well tested, efficient, etc. However, as was pointed out in that lengthy discussion, if what you are doing is accumulating is a whole bunch of numbers (ints, floats, whatever), or particularly if you need to accumulate a data type that plain python doesn't support, there is a lot of overhead involved: a python float type is pretty heavyweight. If performance or memory use is important, it might create issues. You can use and array.array, but it doesn't support all numpy types, particularly custom dtypes. I talked about this on the cython list (as someone asked how to do accumulate in cython), and a few folks thought it would be useful, so I put together a prototype. What I have in mind is very simple. It would be: - Only 1-d - Support append() and extend() methods - support indexing and slicing - Support any valid numpy dtype - which could even get you pseudo n-d arrays... - maybe it would act like an array in other ways, I'm not so sure. - ufuncs, etc. It could take the place of using python lists/arrays when you really want a numpy array, but don't know how big it will be until you've filled it. The implementation I have now uses a regular numpy array as the "buffer". The buffer is re-sized as needed with ndarray.resize(). I've enclosed the class, a bunch of tests (This is the first time I've ever really done test-driven development, though I wouldn't say that this is a complete test suite). A few notes about this implementation: * the name of the class could be better, and so could some of the method names. * on further thought, I think it could handle n-d arrays, as long as you only accumulated along the first index. * It could use a bunch more methods - deleting part of the array - math - probably anything supported by array.array would be good. * Robert pointed me to the array.array implementation to see how it expands the buffer as you append. It did tricks to get it to grow fast when the array is very small, then eventually to add about 1/16 of the used array size to the buffer. I imagine that this would gets used because you were likely to have a big array, so I didn't bother and start with a buffer at 128 elements, then add 1/4 each time you need to expand -- these are both tweakable attributes. * I'm keeping the buffer a hidden variable, and slicing and __array__ return copies - this is so that it won't get multiple references, and then not be expandable. * I did a little simple profiling, and discovered that it's slower than a python list by a factor of more than 2 (for accumulating python ints, anyway). With a bit of experimentation, I think that's because of a couple factors: - an extra function call -- the append() method needs to then do an assignment to the buffer - Object conversion -- python lists store python objects, so the python int can just go right in there. with numpy, it needs to be converted to a C int first -- a bit if extra overhead. Though a straight assignment into a pre-allocated array i faster than a list. I think it's still an improvement for memory use. Maybe it would be worth writing in C or Cython to avoid some of this. 
In particular, it would be nice if you could use it in Cython, and put C types directly it... * This could be pretty useful for things like genfromtxt. What do folks think? is this useful? What would you change, etc? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: accumulator.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_accumulator.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: profile.py URL: From dagss at student.matnat.uio.no Sat Oct 3 04:24:44 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 03 Oct 2009 10:24:44 +0200 Subject: [Numpy-discussion] ufunc and errors In-Reply-To: <3d375d730909300833o7b2db961m45d9ad26575955cf@mail.gmail.com> References: <4AC36C79.1030202@student.matnat.uio.no> <3d375d730909300833o7b2db961m45d9ad26575955cf@mail.gmail.com> Message-ID: <4AC70A4C.9080405@student.matnat.uio.no> Robert Kern wrote: > On Wed, Sep 30, 2009 at 09:34, Dag Sverre Seljebotn > wrote: >> I looked and looked in the docs, but couldn't find an answer to this: >> When writing a ufunc, is it possible somehow to raise a Python exception >> (by acquiring the GIL first to raise it, set a flag and a callback which >> will be called with the GIL, or otherwise?). > > You cannot acquire the GIL inside the loop. In order to do so, you > would have to have access to the saved PyGILState_STATE which you > don't. I thought I could use PyGILState_Ensure (via Cython's "with gil" primitive): http://docs.python.org/c-api/init.html?PyGILState_Ensure (I've taken the rest of your email to heart, thanks.) > >> Or should one always use >> NaN even if the input does not make any sense (like, herhm, passing >> anything but integers or half-integers to a Wigner 3j symbol). > > You should use a NaN and ideally set the fpstatus to INVALID (creating > the NaN may or may not do this; you will have to experiment). This > will allow people to handle the issue as they wish using > numpy.seterr(). An exception for just one value out of thousands is > often undesirable. > >> I know how I'd to it manually in a wrapper w/ passed in context if not, >> but wanted to see. >> >> Also, will the arguments always be named x1, x2, x3, ..., or can I >> somehow give them custom names? > > The only place where names appear is in the docstring. Write whatever > text you like. > -- Dag Sverre From dagss at student.matnat.uio.no Sat Oct 3 12:06:32 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 03 Oct 2009 18:06:32 +0200 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC705F4.1040702@noaa.gov> References: <4AC705F4.1040702@noaa.gov> Message-ID: <4AC77688.2040601@student.matnat.uio.no> Christopher Barker wrote: > OK -- this one I'm intending to send! > > Hi all, > > This idea was inspired by a discussion at the SciPy conference, in which > we spent a LOT of time during the numpy tutorial talking about how to > accumulate values in an array when you don't know how big the array > needs to be when you start. > > The "standard practice" is to accumulate in a python list, then convert > the final result into an array. 
This is a good idea because Python lists > are standard, well tested, efficient, etc. > > However, as was pointed out in that lengthy discussion, if what you are > doing is accumulating is a whole bunch of numbers (ints, floats, > whatever), or particularly if you need to accumulate a data type that > plain python doesn't support, there is a lot of overhead involved: a > python float type is pretty heavyweight. If performance or memory use is > important, it might create issues. You can use and array.array, but it > doesn't support all numpy types, particularly custom dtypes. > > I talked about this on the cython list (as someone asked how to do > accumulate in cython), and a few folks thought it would be useful, so I > put together a prototype. > > What I have in mind is very simple. It would be: > - Only 1-d > - Support append() and extend() methods > - support indexing and slicing > - Support any valid numpy dtype > - which could even get you pseudo n-d arrays... > - maybe it would act like an array in other ways, I'm not so sure. > - ufuncs, etc. > > It could take the place of using python lists/arrays when you really > want a numpy array, but don't know how big it will be until you've > filled it. > > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). > > A few notes about this implementation: > > * the name of the class could be better, and so could some of the > method names. > > * on further thought, I think it could handle n-d arrays, as long as > you only accumulated along the first index. > > * It could use a bunch more methods > - deleting part of the array > - math > - probably anything supported by array.array would be good. > > * Robert pointed me to the array.array implementation to see how it > expands the buffer as you append. It did tricks to get it to grow fast > when the array is very small, then eventually to add about 1/16 of the > used array size to the buffer. I imagine that this would gets used > because you were likely to have a big array, so I didn't bother and > start with a buffer at 128 elements, then add 1/4 each time you need to > expand -- these are both tweakable attributes. > > * I'm keeping the buffer a hidden variable, and slicing and __array__ > return copies - this is so that it won't get multiple references, and > then not be expandable. > > * I did a little simple profiling, and discovered that it's slower > than a python list by a factor of more than 2 (for accumulating python > ints, anyway). With a bit of experimentation, I think that's because of > a couple factors: > - an extra function call -- the append() method needs to then do an > assignment to the buffer > - Object conversion -- python lists store python objects, so the > python int can just go right in there. with numpy, it needs to be > converted to a C int first -- a bit if extra overhead. Though a straight > assignment into a pre-allocated array i faster than a list. > > I think it's still an improvement for memory use. > > Maybe it would be worth writing in C or Cython to avoid some of this. In > particular, it would be nice if you could use it in Cython, and put C > types directly it... > > * This could be pretty useful for things like genfromtxt. > > What do folks think? is this useful? What would you change, etc? 
I'd drop the __getslice__ as it is deprecated (in Python 3 it is removed). Slices will be passed as "slice" objects to __getitem__ if you don't provide __getslice__. One could support myaccumulator[[1,2,3]] as well in __getitem__, although I guess it gets a little hairy as you must seek through the array-like object passed and see to it that no values are too large. -- Dag Sverre From gokhansever at gmail.com Sat Oct 3 12:04:36 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sat, 3 Oct 2009 11:04:36 -0500 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC6FCB9.5040107@noaa.gov> References: <4AC6FCB9.5040107@noaa.gov> Message-ID: <49d6b3500910030904v208b96e2w3b6b61fbc42718ab@mail.gmail.com> On Sat, Oct 3, 2009 at 2:26 AM, Christopher Barker wrote: > Hasi all, > > This idea was inspired by a discussion at SciPY, in which we spent a > LOT of time during the numpy tutorial talking about how to accumulate > values in an array when you don't know how big the array needs to be > when you start. > > The "standard practice" is to accumulate in a python list, then convert > the final result into an array. This is a good idea because Python lists > are standard, well tested, efficient, etc. > > However, as was pointed out in that lengthy discussion, if what you are > doing is accumulating is a whole bunch of numbers (ints, floats, > whatever), or particularly if you need to accumulate a data type that > plain python doesn't support, there is a lot of overhead involved: a > python float type is pretty heavyweight. If performance or memory use is > important, it might create issues. You can use and array.array, but it > doesn't support all numpy types, particularly custom dtypes. > > I talked about this on the cython list (as someone asked how to do > accumulate in cython), and a few folks thought it would be useful, so I > put together a prototype. > > What I have in mind is very simple. It would be: > - Only 1-d > - Support append() and extend() methods > Thanks for working on this. This append() method is a very handy for me, when working with lists. It is exiting to hear that it will be ported to ndarrays as well. Any plans for insert() ? > - support indexing and slicing > - Support any valid numpy dtype > - which could even get you pseudo n-d arrays... > - maybe it would act like an array in other ways, I'm not so sure. > - ufuncs, etc. > > It could take the place of using python lists/arrays when you really > want a numpy array, but don't know how big it will be until you've > filled it. > > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). > > A few notes about this implementation: > > * the name of the class could be better, and so could some of the > method names. > > * on further thought, I think it could handle n-d arrays, as long as > you only accumulated along the first index. > > * It could use a bunch more methods > - deleting part of eh array > - math > - probably anything supported by array.array would be good. > > * Robert pointed me to the array.array implimentation to see how it > expands the buffer as you append. It did tricks to get it to grow fast > when the array is very small, then eventually to add about 1/16 of the > used array size to the buffer. 
I imagine that this would gets used > because you were likely to have a big array, so I didn't bother and > start with a buffer at 128 elements, then add 1/4 each time you need to > expand -- these are both tweakable attributes. > > * I did a little simple profiling, and discovered that it's slower > than a python list by a factor of more than 2 (for accumulating python > ints, anyway). With a bit of experimentation, I think that's because of > a couple factors: > - an extra function call -- the append() method needs to then do an > assignemt to the buffer > - Object conversion -- python lists store python objects, so the > python int can jsut go right in there. with numpy, it needs to be > converted to a C int first -- a bit if extra overhead. > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Sat Oct 3 18:08:38 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 15:08:38 -0700 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <49d6b3500910030904v208b96e2w3b6b61fbc42718ab@mail.gmail.com> References: <4AC6FCB9.5040107@noaa.gov> <49d6b3500910030904v208b96e2w3b6b61fbc42718ab@mail.gmail.com> Message-ID: <4AC7CB66.3020705@noaa.gov> G?khan Sever wrote: > Thanks for working on this. This append() method is a very handy for me, > when working with lists. It is exiting to hear that it will be ported to > ndarrays as well. not exactly ported -- this will be a special, limited-use class. > Any plans for insert() ? I wouldn't say I have any plans at all -- but yes, insert() would be good. Dag Sverre Seljebotn wrote: > I'd drop the __getslice__ as it is deprecated (in Python 3 it is > removed). Slices will be passed as "slice" objects to __getitem__ if you > don't provide __getslice__. I noticed that, but didn't know about the deprecation -- I'll refactor that. > One could support myaccumulator[[1,2,3]] as well in __getitem__, good idea. > although I guess it gets a little hairy as you must seek through the > array-like object passed and see to it that no values are too large. well, it wouldn't hard, though it might be slow...I'll give it a try and see how it works out. thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Sat Oct 3 21:12:29 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Oct 2009 19:12:29 -0600 Subject: [Numpy-discussion] rc1 of the chebyshev module. Message-ID: Attached is the first rc of the chebyshev module. The module documentation is not yet complete and no doubt the rest of the documentation needs to be reviewed. The tests cover basic functionality at this point but need to be extended to cover the Chebyshev object. Nevertheless, the module should be usable. 
Note that the most convenient way to do the least squared fits is with the static method Chebyshev.fit, which will return a Chebyshev object that contains both the resulting Chebyshev series and its domain. Some naming questions remain. ISTM that "lstsq" or "leastsq" might be a better name than fit. Likewise, I have kept the poly1d names "deriv" and "integ", but "der" and "int" might be more appropriate. Operators behave as expected for +, -, and * but there is no truedivision unless both operands can be interpreted as scalars. When division hasn't been imported from __future__, the / and // operators are both floordivision and % returns the remainder. Divmod behaves as expected. Any feedback is welcome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: chebyshev.zip Type: application/zip Size: 10861 bytes Desc: not available URL: From charlesr.harris at gmail.com Sat Oct 3 22:18:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Oct 2009 20:18:53 -0600 Subject: [Numpy-discussion] rc1 of the chebyshev module. In-Reply-To: References: Message-ID: On Sat, Oct 3, 2009 at 7:12 PM, Charles R Harris wrote: > Attached is the first rc of the chebyshev module. The module documentation > is not yet complete and no doubt the rest of the documentation needs to be > reviewed. The tests cover basic functionality at this point but need to be > extended to cover the Chebyshev object. Nevertheless, the module should be > usable. > > Note that the most convenient way to do the least squared fits is with the > static method Chebyshev.fit, which will return a Chebyshev object that > contains both the resulting Chebyshev series and its domain. > > Some naming questions remain. ISTM that "lstsq" or "leastsq" might be a > better name than fit. Likewise, I have kept the poly1d names "deriv" and > "integ", but "der" and "int" might be more appropriate. > > Operators behave as expected for +, -, and * but there is no truedivision > unless both operands can be interpreted as scalars. When division hasn't > been imported from __future__, the / and // operators are both floordivision > and % returns the remainder. Divmod behaves as expected. > > Any feedback is welcome. > > And an updated test to reflect changes in treatment of leading zeros. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_chebyshev.py Type: text/x-python Size: 6571 bytes Desc: not available URL: From faltet at pytables.org Mon Oct 5 04:53:02 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 5 Oct 2009 10:53:02 +0200 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC705F4.1040702@noaa.gov> References: <4AC705F4.1040702@noaa.gov> Message-ID: <200910051053.05259.faltet@pytables.org> A Saturday 03 October 2009 10:06:12 Christopher Barker escrigu?: > OK -- this one I'm intending to send! > > Hi all, > > This idea was inspired by a discussion at the SciPy conference, in which > we spent a LOT of time during the numpy tutorial talking about how to > accumulate values in an array when you don't know how big the array > needs to be when you start. > > The "standard practice" is to accumulate in a python list, then convert > the final result into an array. This is a good idea because Python lists > are standard, well tested, efficient, etc. 
> > However, as was pointed out in that lengthy discussion, if what you are > doing is accumulating is a whole bunch of numbers (ints, floats, > whatever), or particularly if you need to accumulate a data type that > plain python doesn't support, there is a lot of overhead involved: a > python float type is pretty heavyweight. If performance or memory use is > important, it might create issues. You can use and array.array, but it > doesn't support all numpy types, particularly custom dtypes. > > I talked about this on the cython list (as someone asked how to do > accumulate in cython), and a few folks thought it would be useful, so I > put together a prototype. > > What I have in mind is very simple. It would be: > - Only 1-d > - Support append() and extend() methods > - support indexing and slicing > - Support any valid numpy dtype > - which could even get you pseudo n-d arrays... > - maybe it would act like an array in other ways, I'm not so sure. > - ufuncs, etc. > > It could take the place of using python lists/arrays when you really > want a numpy array, but don't know how big it will be until you've > filled it. > > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). > > A few notes about this implementation: > > * the name of the class could be better, and so could some of the > method names. > > * on further thought, I think it could handle n-d arrays, as long as > you only accumulated along the first index. > > * It could use a bunch more methods > - deleting part of the array > - math > - probably anything supported by array.array would be good. > > * Robert pointed me to the array.array implementation to see how it > expands the buffer as you append. It did tricks to get it to grow fast > when the array is very small, then eventually to add about 1/16 of the > used array size to the buffer. I imagine that this would gets used > because you were likely to have a big array, so I didn't bother and > start with a buffer at 128 elements, then add 1/4 each time you need to > expand -- these are both tweakable attributes. > > * I'm keeping the buffer a hidden variable, and slicing and __array__ > return copies - this is so that it won't get multiple references, and > then not be expandable. > > * I did a little simple profiling, and discovered that it's slower > than a python list by a factor of more than 2 (for accumulating python > ints, anyway). With a bit of experimentation, I think that's because of > a couple factors: > - an extra function call -- the append() method needs to then do an > assignment to the buffer > - Object conversion -- python lists store python objects, so the > python int can just go right in there. with numpy, it needs to be > converted to a C int first -- a bit if extra overhead. Though a straight > assignment into a pre-allocated array i faster than a list. > > I think it's still an improvement for memory use. > > Maybe it would be worth writing in C or Cython to avoid some of this. In > particular, it would be nice if you could use it in Cython, and put C > types directly it... > > * This could be pretty useful for things like genfromtxt. > > What do folks think? is this useful? What would you change, etc? That's interesting. 
I'd normally use the `resize()` method for what you want, but indeed your approach is way more easy-to-use. If you are looking for performance improvements, I'd have a look at the `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk). It seems to me that the zero-initialization of added memory can be skipped, allowing for more performance for the `resize()` method (most specially for large size increments). A new parameter (say, ``zero_init=True``) could be added to `resize()` to specify that you don't want the memory initialized. -- Francesc Alted From sebastian.walter at gmail.com Mon Oct 5 05:37:39 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 5 Oct 2009 11:37:39 +0200 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 10:40 PM, wrote: > On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris > wrote: >> >> >> On Fri, Oct 2, 2009 at 12:30 PM, wrote: >>> >>> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >>> > wrote: >>> >> >>> >> >>> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >>> >> wrote: >>> >>> >>> >>> >>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>> >>> wrote: >>> >>>> >>> >>>> >>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>> >>>>> >>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>> >>>>> the inverse of a polynomial (for a finite number of terms) >>> >>>>> >>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>> >>>>> >>> >>>>> application for example the MA representation of an AR(1) >>> >>>>> >>> >>>> >>> >>>> Hmm, I've been working on a chebyshev class and division of a scalar >>> >>>> by >>> >>>> a chebyshev series is >>> >>>> expressly forbidden, but it could be included if a good interface is >>> >>>> proposed. Same would go for polynomials. >>> >>> >>> >>> In fact is isn't hard to get, for poly1d you should be able to >>> >>> multiply >>> >>> the series by a power of x to shift it left, then divide. >>> >>> >>> >> >>> >> That is, divide a power of x by the polynomial. >>> >> >>> > >>> > You will also need to reverse the denominator coefficients...Chuck >>> >>> That's the hint I needed. However the polynomial coefficients are then >>> reversed and not consistent with other polynomial operations, aren't >>> they? >>> >>> >>> from scipy.signal import lfilter >>> >>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >>> (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, >>> 0.13421773]), poly1d([ 0.10737418])) >>> >>> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >>> array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >>> >>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >>> (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), >>> poly1d([-0.00159539, 0.00061389])) >>> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >>> array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) >>> >>> >>> What I meant initally doesn't necessarily mean division of a scalar. 
>>> >>> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >>> (poly1d([ 0.]), poly1d([ 1.])) >>> >>> I didn't find any polynomial division that does the expansion of the >>> remainder. The same problem, I think is inherited, by the >>> scipy.signal.lti, and it took me a while to find the usefulness of >>> lfilter in this case. >>> >>> If it were possible to extend the methods for the polynomial class to >>> do a longer expansions, it would make them more useful for arma and >>> lti. >>> >>> (in some areas, I'm still trying to figure out whether some >>> functionality is just hidden to me, or actually a limitation of the >>> implementation or a missing feature.) >>> >> >> Could you describe the sort of problems you want to solve? There are lots of >> curious things out there we could maybe work with. Covariances, for >> instance, are closely related to Chebyshev series. > > I am working on a discrete time arma process of the form > > a(L) x_t = b(L) u_t, where L is the lag operator L^k x_t = x_(t-k) > > what I just programmed using lfilter is > x_t = b(L)/a(L) u_t where b(L)/a(L) is the impulse response function > or moving average representation > > a(L)/b(L) is the autoregressive representation > > the extension > a(L)(1-L)^d x_t = b(L) u_t, where d = 0,1,2,... (standard) or also > continuous d <1 (fractional integration) > > a(L)/b(L), b(L)/a(L) (1-L)^(-d) or (1-L)^d (0 dimensional lag polynomials in the general case. > Initially I was looking for an easy way to do these calculation as polynomials. > (The fractional case (1-L)^d (0 and I just looked it up today, but is a popular model class in > econometrics, fractionally integrated arma processes) > > multiplication works well np.poly1d([-1, 1])*np.poly1d([-0.8, 1]) > (with reversed poly coefficient scipy.signal I think) > > the functions in scipy.signal for lti are only for continuous time > processes and use poly1d under the hood, which means for example > >>>> from scipy import signal >>>> signal.impulse(([1, -0.8],[1]), N=10) > raise ValueError, "Improper transfer function." > ValueError: Improper transfer function. > > while this works >>>> signal.impulse(([1],[1, -0.8]), N=10) > > (It's been a while since I looked inside scipy.signal.lti) > > A separate issue would be the multivariate version VARMA, or MIMO in > system modeling. a(L), b(L) are matrix polynomials and x_t, u_t are > 1d arrays evolving in time. > But that is a different discussion. I'm working on something that requires truncated univariate/multivariate operations on scalar, vector and matrix polynomials. I'm pretty sure I'm doing something different than what is used in MIMO. Still, do you have a good reference for implementation/complexity/stability of operations on multivariate (matrix) polynomials? Want to make sure I'm not reinventing the wheel ;) > > I'm not very familiar with Chebychev polynomials, the last time I > wanted to use them I didn't see anything about their use as a base for > functions in several variables and gave up. I've seen papers that use > them as base for functions in one variable, but I'm not doing anything > like this right now. 
> > Thanks, > > Josef > >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From denis-bz-py at t-online.de Mon Oct 5 05:55:26 2009 From: denis-bz-py at t-online.de (denis bzowy) Date: Mon, 5 Oct 2009 09:55:26 +0000 (UTC) Subject: [Numpy-discussion] Convert data into rectangular grid References: Message-ID: jah gmail.com> writes: > Thanks all.? Robert, griddata is exactly what I was looking for.? David, I think that should work too.? And Denis, griddata is sufficiently fast that I am not complaining---contouring about 1e6 or 1e7 points typically. > Fyinfo, take a look at http://yt.enzotools.org "YT is an analysis and visualization system written in Python, designed for use with Adaptive Mesh Refinement codes ..." I haven't used it, but the doc and pictures are terrific, top 2 % or better From pearu.peterson at gmail.com Mon Oct 5 06:47:33 2009 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Mon, 05 Oct 2009 13:47:33 +0300 Subject: [Numpy-discussion] ANN: a journal paper about F2PY has been published Message-ID: <4AC9CEC5.6050705@cens.ioc.ee> -------- Original Message -------- Subject: [f2py] ANN: a journal paper about F2PY has been published Date: Mon, 05 Oct 2009 11:52:20 +0300 From: Pearu Peterson Reply-To: For users of the f2py program To: For users of the f2py program Hi, A journal paper about F2PY has been published in International Journal of Computational Science and Engineering: Peterson, P. (2009) 'F2PY: a tool for connecting Fortran and Python programs', Int. J. Computational Science and Engineering. Vol.4, No. 4, pp.296-305. So, if you would like to cite F2PY in a paper or presentation, using this reference is recommended. Interscience Publishers will update their web pages with the new journal number within few weeks. A softcopy of the article available in my homepage: http://cens.ioc.ee/~pearu/papers/IJCSE4.4_Paper_8.pdf Best regards, Pearu _______________________________________________ f2py-users mailing list f2py-users at cens.ioc.ee http://cens.ioc.ee/mailman/listinfo/f2py-users From denis-bz-py at t-online.de Mon Oct 5 08:53:03 2009 From: denis-bz-py at t-online.de (denis bzowy) Date: Mon, 5 Oct 2009 12:53:03 +0000 (UTC) Subject: [Numpy-discussion] numpy-discussion in google groups ? Message-ID: Folks, http://groups.google.com/group/numpy-discussion -> The group named numpy-discussion has been removed because it violated Google's Terms Of Service however scipy-user is there; how come ? I like google groups for its viewer, otherwise don't care much. What mail / group viewer do experts use ? cheers -- denis From paul at rudin.co.uk Mon Oct 5 09:00:39 2009 From: paul at rudin.co.uk (Paul Rudin) Date: Mon, 05 Oct 2009 14:00:39 +0100 Subject: [Numpy-discussion] numpy-discussion in google groups ? References: Message-ID: <87ljjq572g.fsf@rudin.co.uk> denis bzowy writes: > Folks, > http://groups.google.com/group/numpy-discussion > -> > The group named numpy-discussion has been removed because it violated Google's > Terms Of Service > > however scipy-user is there; how come ? > > I like google groups for its viewer, otherwise don't care much. > What mail / group viewer do experts use ? I don't think the "expert" bit applies, but I read via NNTP from gmane. 
From josef.pktd at gmail.com Mon Oct 5 09:44:42 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Oct 2009 09:44:42 -0400 Subject: [Numpy-discussion] numpy-discussion in google groups ? In-Reply-To: <87ljjq572g.fsf@rudin.co.uk> References: <87ljjq572g.fsf@rudin.co.uk> Message-ID: <1cd32cbb0910050644o26ab84d9w5f5d89c62b5f79f4@mail.gmail.com> On Mon, Oct 5, 2009 at 9:00 AM, Paul Rudin wrote: > > denis bzowy writes: > >> Folks, >> ? http://groups.google.com/group/numpy-discussion >> -> >> The group named numpy-discussion has been removed because it violated Google's >> Terms Of Service >> >> however scipy-user is there; how come ? I also used the google groups for numpy-discussion for a long time. It had disappeared for a while. And after it reappeared, it got hit by some ("adult-material") spam and was removed by Google. I don't know who the administrator for the google groups mirroring is. Now, I'm subscribed to the list and just read it in gmail. Josef >> >> I like google groups for its viewer, otherwise don't care much. >> What mail / group viewer do experts use ? > > I don't think the "expert" bit applies, but I read via NNTP from gmane. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Oct 5 10:52:01 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Oct 2009 10:52:01 -0400 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> Message-ID: <1cd32cbb0910050752m5f8e6bddia1d3c3d71d156cf0@mail.gmail.com> On Mon, Oct 5, 2009 at 5:37 AM, Sebastian Walter wrote: > On Fri, Oct 2, 2009 at 10:40 PM, ? wrote: >> On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris >> wrote: >>> >>> >>> On Fri, Oct 2, 2009 at 12:30 PM, wrote: >>>> >>>> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >>>> wrote: >>>> > >>>> > >>>> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >>>> > wrote: >>>> >> >>>> >> >>>> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >>>> >> wrote: >>>> >>> >>>> >>> >>>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>>> >>> wrote: >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>>> >>>>> >>>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>> >>>>> the inverse of a polynomial (for a finite number of terms) >>>> >>>>> >>>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>>> >>>>> >>>> >>>>> application for example the MA representation of an AR(1) >>>> >>>>> >>>> >>>> >>>> >>>> Hmm, I've been working on a chebyshev class and division of a scalar >>>> >>>> by >>>> >>>> a chebyshev series is >>>> >>>> expressly forbidden, but it could be included if a good interface is >>>> >>>> proposed. Same would go for polynomials. >>>> >>> >>>> >>> In fact is isn't hard to get, for poly1d you should be able to >>>> >>> multiply >>>> >>> the series by a power of x to shift it left, then divide. >>>> >>> >>>> >> >>>> >> That is, divide a power of x by the polynomial. >>>> >> >>>> > >>>> > You will also need to reverse the denominator coefficients...Chuck >>>> >>>> That's the hint I needed. However the polynomial coefficients are then >>>> reversed and not consistent with other polynomial operations, aren't >>>> they? 
>>>> >>>> >>> from scipy.signal import lfilter >>>> >>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >>>> (poly1d([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.64 ? ? ?, ?0.512 ? ? , ?0.4096 ? ?, >>>> ? ? ? ?0.32768 ? , ?0.262144 ?, ?0.2097152 , ?0.16777216, >>>> 0.13421773]), poly1d([ 0.10737418])) >>>> >>>> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >>>> array([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.64 ? ? ?, ?0.512 ? ? , ?0.4096 ? ?, >>>> ? ? ? ?0.32768 ? , ?0.262144 ?, ?0.2097152 , ?0.16777216, ?0.13421773]) >>>> >>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >>>> (poly1d([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.44 ? ? ?, ?0.192 ? ? , ?0.0656 ? ?, >>>> ? ? ? ?0.01408 ? , -0.001856 ?, -0.0043008 , -0.00306944]), >>>> poly1d([-0.00159539, ?0.00061389])) >>>> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >>>> array([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.44 ? ? ?, ?0.192 ? ? , ?0.0656 ? ?, >>>> ? ? ? ?0.01408 ? , -0.001856 ?, -0.0043008 , -0.00306944, -0.00159539]) >>>> >>>> >>>> What I meant initally doesn't necessarily mean division of a scalar. >>>> >>>> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >>>> (poly1d([ 0.]), poly1d([ 1.])) >>>> >>>> I didn't find any polynomial division that does the expansion of the >>>> remainder. The same problem, I think is inherited, by the >>>> scipy.signal.lti, and it took me a while to find the usefulness of >>>> lfilter in this case. >>>> >>>> If it were possible to extend the methods for the polynomial class to >>>> do a longer expansions, it would make them more useful for arma and >>>> lti. >>>> >>>> (in some areas, I'm still trying to figure out whether some >>>> functionality is just hidden to me, or actually a limitation of the >>>> implementation or a missing feature.) >>>> >>> >>> Could you describe the sort of problems you want to solve? There are lots of >>> curious things out there we could maybe work with. Covariances, for >>> instance, are closely related to Chebyshev series. >> >> I am working on a discrete time arma process of the form >> >> a(L) x_t = b(L) u_t, ?where L is the lag operator L^k x_t = x_(t-k) >> >> what I just programmed using lfilter is >> x_t = b(L)/a(L) u_t ?where ?b(L)/a(L) is the impulse response function >> or moving average representation >> >> ?a(L)/b(L) ?is the autoregressive representation >> >> the extension >> a(L)(1-L)^d x_t = b(L) u_t, ?where d = 0,1,2,... ?(standard) or ?also >> continuous d <1 (fractional integration) >> >> ?a(L)/b(L), ?b(L)/a(L) ?(1-L)^(-d) ?or (1-L)^d (0> dimensional lag polynomials in the general case. >> Initially I was looking for an easy way to do these calculation as polynomials. >> (The fractional case (1-L)^d (0> and I just looked it up today, but is a popular model class in >> econometrics, fractionally integrated arma processes) >> >> multiplication works well np.poly1d([-1, 1])*np.poly1d([-0.8, 1]) >> (with reversed poly coefficient scipy.signal I think) >> >> the functions in scipy.signal for lti are only for continuous time >> processes and use poly1d under the hood, which means for example >> >>>>> from scipy import signal >>>>> signal.impulse(([1, -0.8],[1]), N=10) >> ? ?raise ValueError, "Improper transfer function." >> ValueError: Improper transfer function. >> >> while this works >>>>> signal.impulse(([1],[1, -0.8]), N=10) >> >> (It's been a while since I looked inside scipy.signal.lti) >> >> A separate issue would be the multivariate version VARMA, or MIMO in >> system modeling. ?a(L), b(L) are matrix polynomials and x_t, u_t are >> 1d arrays evolving in time. >> But that is a different discussion. 
> > I'm working on something that requires truncated > univariate/multivariate operations > on scalar, vector and matrix polynomials. I'm pretty sure I'm doing > something different > than what is used in MIMO. ?Still, do you have a good reference for > implementation/complexity/stability of operations on multivariate > (matrix) polynomials? > Want to make sure I'm not reinventing the wheel ;) Sorry, but I'm no help here. I was hoping to benefit from the knowledge of others. I have used univariate and multivariate polynomials for special cases for function approximation and time series analysis, but I know little about the general numerical issues in this. For MIMO, I only looked at the matlab systems toolbox, which would be an extension of scipy.signal.lti. Since these operations are in my case usually inside an optimization loop in a (supposedly) well behaved problem, I usually cared more about speed and approximation errors in low order polynomials than numerical stability. If you invent the wheel, I would be very glad to use it. Josef > > > >> >> I'm not very familiar with Chebychev polynomials, the last time I >> wanted to use them I didn't see anything about their use as a base for >> functions in several variables and gave up. I've seen papers that use >> them as base for functions in one variable, but I'm not doing anything >> like this right now. >> >> Thanks, >> >> Josef >> >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian.walter at gmail.com Mon Oct 5 11:39:56 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 5 Oct 2009 17:39:56 +0200 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910050752m5f8e6bddia1d3c3d71d156cf0@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> <1cd32cbb0910050752m5f8e6bddia1d3c3d71d156cf0@mail.gmail.com> Message-ID: On Mon, Oct 5, 2009 at 4:52 PM, wrote: > On Mon, Oct 5, 2009 at 5:37 AM, Sebastian Walter > wrote: >> On Fri, Oct 2, 2009 at 10:40 PM, wrote: >>> On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Fri, Oct 2, 2009 at 12:30 PM, wrote: >>>>> >>>>> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >>>>> wrote: >>>>> > >>>>> > >>>>> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >>>>> > wrote: >>>>> >> >>>>> >> >>>>> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >>>>> >> wrote: >>>>> >>> >>>>> >>> >>>>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>>>> >>> wrote: >>>>> >>>> >>>>> >>>> >>>>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>>>> >>>>> >>>>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>>> >>>>> the inverse of a polynomial (for a finite number of terms) >>>>> >>>>> >>>>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>>>> >>>>> >>>>> >>>>> application for example the MA representation of an AR(1) >>>>> >>>>> >>>>> >>>> >>>>> >>>> Hmm, I've been 
working on a chebyshev class and division of a scalar >>>>> >>>> by >>>>> >>>> a chebyshev series is >>>>> >>>> expressly forbidden, but it could be included if a good interface is >>>>> >>>> proposed. Same would go for polynomials. >>>>> >>> >>>>> >>> In fact is isn't hard to get, for poly1d you should be able to >>>>> >>> multiply >>>>> >>> the series by a power of x to shift it left, then divide. >>>>> >>> >>>>> >> >>>>> >> That is, divide a power of x by the polynomial. >>>>> >> >>>>> > >>>>> > You will also need to reverse the denominator coefficients...Chuck >>>>> >>>>> That's the hint I needed. However the polynomial coefficients are then >>>>> reversed and not consistent with other polynomial operations, aren't >>>>> they? >>>>> >>>>> >>> from scipy.signal import lfilter >>>>> >>>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >>>>> (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>>>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, >>>>> 0.13421773]), poly1d([ 0.10737418])) >>>>> >>>>> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >>>>> array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>>>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >>>>> >>>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >>>>> (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>>>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), >>>>> poly1d([-0.00159539, 0.00061389])) >>>>> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >>>>> array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>>>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) >>>>> >>>>> >>>>> What I meant initally doesn't necessarily mean division of a scalar. >>>>> >>>>> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >>>>> (poly1d([ 0.]), poly1d([ 1.])) >>>>> >>>>> I didn't find any polynomial division that does the expansion of the >>>>> remainder. The same problem, I think is inherited, by the >>>>> scipy.signal.lti, and it took me a while to find the usefulness of >>>>> lfilter in this case. >>>>> >>>>> If it were possible to extend the methods for the polynomial class to >>>>> do a longer expansions, it would make them more useful for arma and >>>>> lti. >>>>> >>>>> (in some areas, I'm still trying to figure out whether some >>>>> functionality is just hidden to me, or actually a limitation of the >>>>> implementation or a missing feature.) >>>>> >>>> >>>> Could you describe the sort of problems you want to solve? There are lots of >>>> curious things out there we could maybe work with. Covariances, for >>>> instance, are closely related to Chebyshev series. >>> >>> I am working on a discrete time arma process of the form >>> >>> a(L) x_t = b(L) u_t, where L is the lag operator L^k x_t = x_(t-k) >>> >>> what I just programmed using lfilter is >>> x_t = b(L)/a(L) u_t where b(L)/a(L) is the impulse response function >>> or moving average representation >>> >>> a(L)/b(L) is the autoregressive representation >>> >>> the extension >>> a(L)(1-L)^d x_t = b(L) u_t, where d = 0,1,2,... (standard) or also >>> continuous d <1 (fractional integration) >>> >>> a(L)/b(L), b(L)/a(L) (1-L)^(-d) or (1-L)^d (0>> dimensional lag polynomials in the general case. >>> Initially I was looking for an easy way to do these calculation as polynomials. 
>>> (The fractional case (1-L)^d (0>> and I just looked it up today, but is a popular model class in >>> econometrics, fractionally integrated arma processes) >>> >>> multiplication works well np.poly1d([-1, 1])*np.poly1d([-0.8, 1]) >>> (with reversed poly coefficient scipy.signal I think) >>> >>> the functions in scipy.signal for lti are only for continuous time >>> processes and use poly1d under the hood, which means for example >>> >>>>>> from scipy import signal >>>>>> signal.impulse(([1, -0.8],[1]), N=10) >>> raise ValueError, "Improper transfer function." >>> ValueError: Improper transfer function. >>> >>> while this works >>>>>> signal.impulse(([1],[1, -0.8]), N=10) >>> >>> (It's been a while since I looked inside scipy.signal.lti) >>> >>> A separate issue would be the multivariate version VARMA, or MIMO in >>> system modeling. a(L), b(L) are matrix polynomials and x_t, u_t are >>> 1d arrays evolving in time. >>> But that is a different discussion. >> > > >> I'm working on something that requires truncated >> univariate/multivariate operations >> on scalar, vector and matrix polynomials. I'm pretty sure I'm doing >> something different >> than what is used in MIMO. Still, do you have a good reference for >> implementation/complexity/stability of operations on multivariate >> (matrix) polynomials? >> Want to make sure I'm not reinventing the wheel ;) > > Sorry, but I'm no help here. I was hoping to benefit from the knowledge > of others. I have used univariate and multivariate polynomials for special > cases for function approximation and time series analysis, but I know > little about the general numerical issues in this. For MIMO, I only looked > at the matlab systems toolbox, which would be an extension of scipy.signal.lti. > > Since these operations are in my case usually inside an optimization loop > in a (supposedly) well behaved problem, I usually cared more about > speed and approximation errors in low order polynomials than numerical > stability. > > If you invent the wheel, I would be very glad to use it. Well, I'd be happy to have a user of my code :). However, I'm not sure what I'm doing really helps you: the package ALGOPY I'm working on computes on truncated Taylor polynomials x(t) = x_0 + x_1 t + x_2 t^2 + ... + x_{D-1}t^D I have implemented most common functions like z(t) = x(t)/y(t) z(t) = x(t) * y(t) z(t) = exp(x(t)) z(t) = sin(x(t)) z(t) has always the same degree as x(t) and y(t) (assumed to have the same degree unless it is constant, i.e. z(t) = 1./x(t) works). All this is implemented in a single class called UTPS in http://github.com/b45ch1/algopy/blob/master/algopy/utp/utps.py The actual reason for ALGOPY is the matrix polynomial class UTPM in http://github.com/b45ch1/algopy/blob/master/algopy/utp/utpm.py The rationale is that to compute derivatives of matrix valued functions one should compute on truncated matrix polynomials. One good example is to compute the Jacobian J = [dy/dA ,dy/dx] of y = solve(A,x) where y an (N,) array A (N,N) array x (N,) array The underlying method is to generalize the solve-function to work on polynomials, i.e. y(t) = solve( A(t), x(t)) where y(t) = y_0 + y_1 t + y_2 t^2 + ... A(t) = A_0 + A_1 t + A_2 t^2 + ... x(t) = x_0 + x_1 t + ... 
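To make the prose above a bit more concrete, here is a very small sketch of what arithmetic on truncated Taylor polynomials looks like. This is not the ALGOPY UTPS/UTPM API -- the function names below are made up -- it only shows the convolution-style recurrences that the operations listed above boil down to.

import numpy as np

def utp_mul(x, y):
    """Product of two truncated series, keeping len(x) coefficients."""
    D = len(x)
    z = np.zeros(D)
    for d in range(D):
        z[d] = np.dot(x[:d + 1], y[d::-1])
    return z

def utp_div(x, y):
    """Quotient z with z*y == x up to the truncation order (needs y[0] != 0)."""
    D = len(x)
    z = np.zeros(D)
    for d in range(D):
        z[d] = (x[d] - np.dot(z[:d], y[d:0:-1])) / y[0]
    return z

x = np.array([1.0, 2.0, 3.0])      # 1 + 2t + 3t^2 + O(t^3)
y = np.array([2.0, 1.0, 0.0])      # 2 + t        + O(t^3)
print(utp_mul(x, y))               # [ 2.  5.  8.]
print(utp_mul(utp_div(x, y), y))   # [ 1.  2.  3.], x recovered to that order

Division is just the multiplication recurrence solved for the unknown coefficients, which is why the constant term of the divisor has to be nonzero.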
You can have a look at http://github.com/b45ch1/algopy/blob/master/algopy/utp/utpm.py#L257 I'm afraid the best tutorial I can give is a talk that I've given on a small workshop: http://github.com/b45ch1/algopy/raw/master/documentation/Seventh_EuroAd_Workshop-Sebastian_Walter-Higher_Order_Forward_and_Reverse_Mode_on_Matrices_with_Application_to_Optimal_Experimental_Design.pdf Sebastian > > Josef > >> >> >> >>> >>> I'm not very familiar with Chebychev polynomials, the last time I >>> wanted to use them I didn't see anything about their use as a base for >>> functions in several variables and gave up. I've seen papers that use >>> them as base for functions in one variable, but I'm not doing anything >>> like this right now. >>> >>> Thanks, >>> >>> Josef >>> >>>> >>>> Chuck >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From seb.haase at gmail.com Mon Oct 5 11:56:27 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 5 Oct 2009 17:56:27 +0200 Subject: [Numpy-discussion] numpy.asum ? Message-ID: Hi, Is this a dumb question ? Why is there no np.asum() equivalent to np.sum() - like amax() to max() ? Another question: what does it mean that amax() (and max()) is a "function" while maximum() is a ufunc !? >>> N.max >>> N.maximum >>> N.amax Is there a performance difference connected to this ? Cheers, Sebastian Haase From robert.kern at gmail.com Mon Oct 5 12:04:21 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 11:04:21 -0500 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: References: Message-ID: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> On Mon, Oct 5, 2009 at 10:56, Sebastian Haase wrote: > Hi, > > Is this a dumb question ? > Why is there no np.asum() equivalent to np.sum() ?- like amax() to max() ? Back when Numeric was being written, max() and min() existed as builtins, but sum() did not. In order to support "from Numeric import *", the amax() aliases were added. sum() was added to the builtins later, but no one went back to add an asum() alias. > Another question: what does it mean that amax() (and max()) is a > "function" while maximum() is a ufunc !? > >>>> N.max > >>>> N.maximum > >>>> N.amax > > > Is there a performance difference connected to this ? No. maximum(x,y) is a binary ufunc that takes two arrays and returns an array with the element-wise maximum from between the two inputs. amax(x) is an unary function that returns the maximum value in the array. amax(x) is a convenience for maximum.reduce(x.flat). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Chris.Barker at noaa.gov Mon Oct 5 14:06:27 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 05 Oct 2009 11:06:27 -0700 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <200910051053.05259.faltet@pytables.org> References: <4AC705F4.1040702@noaa.gov> <200910051053.05259.faltet@pytables.org> Message-ID: <4ACA35A3.3030707@noaa.gov> Francesc Alted wrote: > A Saturday 03 October 2009 10:06:12 Christopher Barker escrigu?: >> This idea was inspired by a discussion at the SciPy conference, in which >> we spent a LOT of time during the numpy tutorial talking about how to >> accumulate values in an array when you don't know how big the array >> needs to be when you start. >> What I have in mind is very simple. It would be: >> - Only 1-d >> - Support append() and extend() methods >> - support indexing and slicing >> - Support any valid numpy dtype >> - which could even get you pseudo n-d arrays... >> - maybe it would act like an array in other ways, I'm not so sure. >> - ufuncs, etc. > That's interesting. I'd normally use the `resize()` method for what you want, > but indeed your approach is way more easy-to-use. Of course, this is using resize() under the hood, but giving it an easier interface, but more importantly, it's adding the pre-allocation for you, and the code to deal with that. I suppose I should benchmark it, but I think calling resize(0 with every append would be a lot slower (though maybe not -- might the compiler/os be pre-allocating some extra memory anyway?) I should profile this -- if you can call resize() with every new item, and it's not too slow, then it may not be worth writing this class at all (or I could make it simpler, maybe even an nd-array subclass instead. > If you are looking for performance improvements, I'd have a look at the > `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk). It > seems to me that the zero-initialization of added memory can be skipped, > allowing for more performance for the `resize()` method (most specially for > large size increments). I suppose so, but I doubt that's causing any of my performance issues. Another thing to profile. > A new parameter (say, ``zero_init=True``) could be > added to `resize()` to specify that you don't want the memory initialized. That does seem like a good idea, but maybe over my head to implement. Now I need some time to work on this some more... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From seb.haase at gmail.com Mon Oct 5 14:37:16 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 5 Oct 2009 20:37:16 +0200 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> Message-ID: Thanks for the reply. I thought one reason for amax was that from numpy import * would not not import max but only amax. How about sum ? Does "from numpy import *" overwrite the builtin sum ? not to mention the "symmetry" / consistency argument for having "asum" ? More comments ?? --Sebastian Haase On Mon, Oct 5, 2009 at 6:04 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 10:56, Sebastian Haase wrote: >> Hi, >> >> Is this a dumb question ? >> Why is there no np.asum() equivalent to np.sum() ?- like amax() to max() ? 
> > Back when Numeric was being written, max() and min() existed as > builtins, but sum() did not. In order to support "from Numeric import > *", the amax() aliases were added. sum() was added to the builtins > later, but no one went back to add an asum() alias. > >> Another question: what does it mean that amax() (and max()) is a >> "function" while maximum() is a ufunc !? >> >>>>> N.max >> >>>>> N.maximum >> >>>>> N.amax >> >> >> Is there a performance difference connected to this ? > > No. maximum(x,y) is a binary ufunc that takes two arrays and returns > an array with the element-wise maximum from between the two inputs. > amax(x) is an unary function that returns the maximum value in the > array. amax(x) is a convenience for maximum.reduce(x.flat). > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Mon Oct 5 14:43:30 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 13:43:30 -0500 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> Message-ID: <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> On Mon, Oct 5, 2009 at 13:37, Sebastian Haase wrote: > Thanks for the reply. > I thought one reason for amax was that > from numpy import * > would not not import max but only amax. I have my timelines confused. Numeric has neither amax() nor max(). I don't actually recall the sequence of events, then. > How about sum ? > Does "from numpy import *" > overwrite the builtin sum ? Try it. > not to mention the "symmetry" / consistency argument for having "asum" ? At this point, I don't care to cater to "from numpy import *" use case. Too much code uses numpy.sum() remove it, or even deprecate it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at gmail.com Mon Oct 5 15:13:40 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 5 Oct 2009 15:13:40 -0400 Subject: [Numpy-discussion] genfromtxt - the return Message-ID: All, Could you try r7449 ? I introduced some mechanisms to keep track of invalid lines (where the number of columns don't match what's expected). By default, a warning is emitted and these lines are skipped, but an optional argument gives the possibility to raise an exception instead. Now, I need more tests about wrong converters. I'm trying to optimize the upgrade mechanism (there are too many intertwined loops for my taste now), I'll keep you posted. Meanwhile, if you could come with more cases of failure, please send them my way. Cheers P. From charlesr.harris at gmail.com Mon Oct 5 15:54:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 13:54:53 -0600 Subject: [Numpy-discussion] Easy way to test documentation? Message-ID: Hi All, Is there an easy way to test build documentation for a module that is not yet part of numpy? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From seb.haase at gmail.com Mon Oct 5 15:55:01 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 5 Oct 2009 21:55:01 +0200 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> Message-ID: On Mon, Oct 5, 2009 at 8:43 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 13:37, Sebastian Haase wrote: >> Thanks for the reply. >> I thought one reason for amax was that >> from numpy import * >> would not not import max but only amax. > > I have my timelines confused. Numeric has neither amax() nor max(). I > don't actually recall the sequence of events, then. > >> How about sum ? >> Does "from numpy import *" >> overwrite the builtin sum ? > > Try it. > >>> sum >>> from numpy import * >>> sum >>> asum Traceback (most recent call last): File "", line 1, in NameError: name 'asum' is not defined >>> N.__version__ '1.3.0' >>> >> not to mention the "symmetry" / consistency argument for having "asum" ? > > At this point, I don't care to cater to "from numpy import *" use > case. Too much code uses numpy.sum() remove it, or even deprecate it. > I did not mean to suggest to remove or deprecate it. I only remember that there was a discussion - long time ago - that "from numpy import *" (still common in many places, like interactive sessions) - should not overwrite builtins .... Personally, I would prefer to write np.amax and np.asum ... do you see my argument for consistency here ? - Sebastian From robert.kern at gmail.com Mon Oct 5 15:58:47 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 14:58:47 -0500 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> Message-ID: <3d375d730910051258w4ba92dtb9740e3753caaec3@mail.gmail.com> On Mon, Oct 5, 2009 at 14:55, Sebastian Haase wrote: > On Mon, Oct 5, 2009 at 8:43 PM, Robert Kern wrote: >> On Mon, Oct 5, 2009 at 13:37, Sebastian Haase wrote: >>> Thanks for the reply. >>> I thought one reason for amax was that >>> from numpy import * >>> would not not import max but only amax. >> >> I have my timelines confused. Numeric has neither amax() nor max(). I >> don't actually recall the sequence of events, then. >> >>> How about sum ? >>> Does "from numpy import *" >>> overwrite the builtin sum ? >> >> Try it. >> >>>> sum > >>>> from numpy import * >>>> sum > >>>> asum > Traceback (most recent call last): > ?File "", line 1, in > NameError: name 'asum' is not defined >>>> N.__version__ > '1.3.0' >>>> > >>> not to mention the "symmetry" / consistency argument for having "asum" ? >> >> At this point, I don't care to cater to "from numpy import *" use >> case. Too much code uses numpy.sum() remove it, or even deprecate it. >> > > I did not mean to suggest to remove or deprecate it. I only remember > that there was a discussion - long time ago - that "from numpy import > *" (still common in many places, like interactive sessions) - should > not overwrite builtins .... We are not removing sum from numpy.__all__ at this point in time. It's too late. > Personally, I would prefer to write np.amax and np.asum ... do you see > my argument for consistency here ? Yes, but it's not important enough to me to want to introduce more aliases. 
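For anyone skimming this thread for the practical upshot: the ufunc reductions mentioned earlier already give an "import *"-safe spelling for both cases. A small illustration (nothing new is being proposed here; both lines only restate what was said above):

import numpy as np

x = np.array([[1.5, -2.0], [7.25, 3.0]])

# amax() is a convenience wrapper around the maximum ufunc's reduction,
# and add.reduce plays the same role a hypothetical asum() would.
print(np.amax(x), np.maximum.reduce(x.ravel()))   # 7.25 7.25
print(np.sum(x), np.add.reduce(x.ravel()))        # 9.75 9.75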
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pav at iki.fi Mon Oct 5 16:10:07 2009 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 05 Oct 2009 23:10:07 +0300 Subject: [Numpy-discussion] Easy way to test documentation? In-Reply-To: References: Message-ID: <1254773407.6463.6.camel@idol> ma, 2009-10-05 kello 13:54 -0600, Charles R Harris kirjoitti: > Is there an easy way to test build documentation for a module that is > not yet part of numpy? Make a small Sphinx project for that: $ easy_install numpydoc $ mkdir foo $ cd foo $ sphinx-quickstart ... $ vi conf.py ... add 'sphinx.ext.autodoc', 'numpydoc' to extensions ... $ cp /some/path/modulename.py modulename.py $ vi index.rst ... add .. automodule:: modulename :members: ... $ make PYTHONPATH=$PWD html Could be automated. -- Pauli Virtanen From elaine.angelino at gmail.com Mon Oct 5 17:22:54 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 17:22:54 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> Message-ID: <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> Hi there, We are writing to announce the release of "Tabular", a package of Python modules for working with tabular data. Tabular is a package of Python modules for working with tabular data. Its main object is the tabarray class, a data structure for holding and manipulating tabular data. By putting data into a tabarray object, you?ll get a representation of the data that is more flexible and powerful than a native Python representation. More specifically, tabarray provides: -- ultra-fast filtering, selection, and numerical analysis methods, using convenient Matlab-style matrix operation syntax -- spreadsheet-style operations, including row & column operations, 'sort', 'replace', 'aggregate', 'pivot', and 'join' -- flexible load and save methods for a variety of file formats, including delimited text (CSV), binary, and HTML -- helpful inference algorithms for determining formatting parameters and data types of input files -- support for hierarchical groupings of columns, both as data structures and file formats You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or alternatively clone our hg repository from bitbucket ( http://bitbucket.org/elaine/tabular/ ). We also have posted tutorial-style Sphinx documentation ( http://www.parsemydata.com/tabular/). The tabarray object is based on the record arrayobject from the Numerical Python package ( NumPy ), and Tabular is built to interface well with NumPy in general. Our intended audience is two-fold: (1) Python users who, though they may not be familiar with NumPy, are in need of a way to work with tabular data, and (2) NumPy users who would like to do spreadsheet-style operations on top of their more "numerical" work. We hope that some of you find Tabular useful! Best, Elaine and Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Mon Oct 5 17:34:40 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 5 Oct 2009 17:34:40 -0400 Subject: [Numpy-discussion] A numpy accumulator... 
In-Reply-To: <4ACA35A3.3030707@noaa.gov> References: <4AC705F4.1040702@noaa.gov> <200910051053.05259.faltet@pytables.org> <4ACA35A3.3030707@noaa.gov> Message-ID: 2009/10/5 Christopher Barker : > Francesc Alted wrote: >> A Saturday 03 October 2009 10:06:12 Christopher Barker escrigu?: >>> This idea was inspired by a discussion at the SciPy conference, in which >>> we spent a LOT of time during the numpy tutorial talking about how to >>> accumulate values in an array when you don't know how big the array >>> needs to be when you start. > >>> What I have in mind is very simple. It would be: >>> ? ?- Only 1-d >>> ? ?- Support append() and extend() methods >>> ? ?- support indexing and slicing >>> ? ?- Support any valid numpy dtype >>> ? ? ?- which could even get you pseudo n-d arrays... >>> ? ?- maybe it would act like an array in other ways, I'm not so sure. >>> ? ? ?- ufuncs, etc. > >> That's interesting. ?I'd normally use the `resize()` method for what you want, >> but indeed your approach is way more easy-to-use. > > Of course, this is using resize() under the hood, but giving it an > easier interface, but more importantly, it's adding the pre-allocation > for you, and the code to deal with that. I suppose I should benchmark > it, but I think calling resize(0 with every append would be a lot slower > (though maybe not -- might the compiler/os be pre-allocating some extra > memory anyway?) I looked into this at some point, and under Linux, the malloc doesn't allocate substantial extra memory until you get big enough that it's allocating complete memory pages, at which point you get until the end of the page. At this point it's possible that adding more memory onto the end of the malloced region (and maybe even moving the array around in memory) can become really cheap, since it's just requesting more memory from the OS. Also, a friend who's a bare-metal programming wizard pointed out to me that modern malloc implementations really hate realloc, since it tends to put memory blocks in arenas intended for different sizes. I think that's only really an issue for shrinking blocks, since they probably just always allocate a new block when growing (unless they're in the pages-at-a-time regime). In short, I think it's better to have a python-list-like growing scheme. In fact it's maybe more important for arrays than python lists, since in a python list all that needs to be moved are pointers to the actual python objects, only ever a small fraction of the data volume. > I should profile this -- if you can call resize() with every new item, > and it's not too slow, then it may not be worth writing this class at > all (or I could make it simpler, maybe even an nd-array subclass instead. Keep in mind the need for sensible handling of slices, since the underlying array will probably move on every resize. I think there's a need for this code. >> If you are looking for performance improvements, I'd have a look at the >> `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk). ?It >> seems to me that the zero-initialization of added memory can be skipped, >> allowing for more performance for the `resize()` method (most specially for >> large size increments). > > I suppose so, but I doubt that's causing any of my performance issues. > Another thing to profile. Probably worth profiling, yes - I wouldn't worry about the time taken writing zeros, but that does mean you have to touch all the allocated memory, which can't be too great for the cache. 
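Since the same design keeps coming up in this thread, a bare-bones sketch of the over-allocate-and-grow idea may be useful. This is not Chris's actual class and the names below are made up; it only shows where the amortized-cost argument and the stale-slice concern come from.

import numpy as np

class Growable(object):
    """Minimal sketch of the 1-d "accumulator" idea discussed here."""

    def __init__(self, dtype=float, capacity=128, growth=2.0):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0
        self._growth = growth

    def append(self, value):
        if self._size == len(self._data):
            # grow by a fixed factor so appends stay cheap on average
            new_cap = int(len(self._data) * self._growth) + 1
            new_data = np.empty(new_cap, dtype=self._data.dtype)
            new_data[:self._size] = self._data[:self._size]
            # rebinding the buffer means slices handed out earlier still
            # point at the old memory -- the slicing concern raised above
            self._data = new_data
        self._data[self._size] = value
        self._size += 1

    def extend(self, values):
        for v in values:
            self.append(v)

    @property
    def array(self):
        return self._data[:self._size]

acc = Growable(dtype=float, capacity=4)
acc.extend(0.5 * i for i in range(10))
print(acc.array)   # [ 0.   0.5  1.  ...  4.5]

With a fixed growth factor the copying cost per append stays constant on average, which is the python-list-like behaviour described above.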
Anne From pgmdevlist at gmail.com Mon Oct 5 17:47:12 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 5 Oct 2009 17:47:12 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> Message-ID: <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> Ciao Elaine, I just quickly browsed through your code. Say, what's the reason behind using np.recarrays instead of just standard ndarrays (with flexible dtype). Do you really need the overhead of accessing fields as attributes ? It looks like you're always accessing fields as items... Cheers P. On Oct 5, 2009, at 5:22 PM, Elaine Angelino wrote: > Hi there, > > We are writing to announce the release of "Tabular", a package of > Python modules for working with tabular data. > > Tabular is a package of Python modules for working with tabular > data. Its main object is the tabarray class, a data structure for > holding and manipulating tabular data. By putting data into a > tabarray object, you?ll get a representation of the data that is > more flexible and powerful than a native Python representation. More > specifically, tabarray provides: > > -- ultra-fast filtering, selection, and numerical analysis methods, > using convenient Matlab-style matrix operation syntax > -- spreadsheet-style operations, including row & column operations, > 'sort', 'replace', 'aggregate', 'pivot', and 'join' > -- flexible load and save methods for a variety of file formats, > including delimited text (CSV), binary, and HTML > -- helpful inference algorithms for determining formatting > parameters and data types of input files > -- support for hierarchical groupings of columns, both as data > structures and file formats > > You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/ > ) or alternatively clone our hg repository from bitbucket (http://bitbucket.org/elaine/tabular/ > ). We also have posted tutorial-style Sphinx documentation (http://www.parsemydata.com/tabular/ > ). > > The tabarray object is based on the record array object from the > Numerical Python package (NumPy), and Tabular is built to interface > well with NumPy in general. Our intended audience is two-fold: (1) > Python users who, though they may not be familiar with NumPy, are in > need of a way to work with tabular data, and (2) NumPy users who > would like to do spreadsheet-style operations on top of their more > "numerical" work. > > We hope that some of you find Tabular useful! > > Best, > > Elaine and Dan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Mon Oct 5 18:03:35 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 5 Oct 2009 18:03:35 -0400 Subject: [Numpy-discussion] What Python version are we supporting ? Message-ID: <1CDC6C62-CCE9-4F9F-84FB-D29C311ECFC7@gmail.com> All, What Python version are we supporting in 1.4.0dev ? 2.4 still ? For which version of numpy will we be moving to a more recent one ? Thx in advance P. From robert.kern at gmail.com Mon Oct 5 18:13:12 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 17:13:12 -0500 Subject: [Numpy-discussion] What Python version are we supporting ? 
In-Reply-To: <1CDC6C62-CCE9-4F9F-84FB-D29C311ECFC7@gmail.com> References: <1CDC6C62-CCE9-4F9F-84FB-D29C311ECFC7@gmail.com> Message-ID: <3d375d730910051513i57d74cbcn3ea7769fc22d02ec@mail.gmail.com> On Mon, Oct 5, 2009 at 17:03, Pierre GM wrote: > All, > What Python version are we supporting in 1.4.0dev ? 2.4 still ? Yes. > For > which version of numpy will we be moving to a more recent one ? There is no plan in place to change this requirement. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From elaine.angelino at gmail.com Mon Oct 5 18:16:42 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 18:16:42 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> Message-ID: <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> hey pierre -- good question. this is something we debated a while ago (we actually sent a couple of emails over the numpy list about this very topic) when coming up with our design. at the time, there did not seem to be strong opinions either way about using ndarray vs. recarray the main reason we went with the recarray over the ndarray is because the recarray has a couple of useful construction functions (e.g. np.rec.fromrecords and np.rec.fromarrays). not only are these functions convenient to use, they have nice data type inference properties which we'd have to rebuild ourselves if we wanted to avoid recarrays entirely. It would be fairly straightforward to switch from recarray to ndarray if this were really an important thing to do (e.g. if recarray were being deprecated or if most NumPy people have strong feelings about this), and doing so wouldn't modify anything about the tabarray API. elaine On Mon, Oct 5, 2009 at 5:47 PM, Pierre GM wrote: > Ciao Elaine, > I just quickly browsed through your code. Say, what's the reason > behind using np.recarrays instead of just standard ndarrays (with > flexible dtype). Do you really need the overhead of accessing fields > as attributes ? It looks like you're always accessing fields as items... > Cheers > P. > > > > On Oct 5, 2009, at 5:22 PM, Elaine Angelino wrote: > > > Hi there, > > > > We are writing to announce the release of "Tabular", a package of > > Python modules for working with tabular data. > > > > Tabular is a package of Python modules for working with tabular > > data. Its main object is the tabarray class, a data structure for > > holding and manipulating tabular data. By putting data into a > > tabarray object, you?ll get a representation of the data that is > > more flexible and powerful than a native Python representation. 
More > > specifically, tabarray provides: > > > > -- ultra-fast filtering, selection, and numerical analysis methods, > > using convenient Matlab-style matrix operation syntax > > -- spreadsheet-style operations, including row & column operations, > > 'sort', 'replace', 'aggregate', 'pivot', and 'join' > > -- flexible load and save methods for a variety of file formats, > > including delimited text (CSV), binary, and HTML > > -- helpful inference algorithms for determining formatting > > parameters and data types of input files > > -- support for hierarchical groupings of columns, both as data > > structures and file formats > > > > You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/ > > ) or alternatively clone our hg repository from bitbucket ( > http://bitbucket.org/elaine/tabular/ > > ). We also have posted tutorial-style Sphinx documentation ( > http://www.parsemydata.com/tabular/ > > ). > > > > The tabarray object is based on the record array object from the > > Numerical Python package (NumPy), and Tabular is built to interface > > well with NumPy in general. Our intended audience is two-fold: (1) > > Python users who, though they may not be familiar with NumPy, are in > > need of a way to work with tabular data, and (2) NumPy users who > > would like to do spreadsheet-style operations on top of their more > > "numerical" work. > > > > We hope that some of you find Tabular useful! > > > > Best, > > > > Elaine and Dan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Oct 5 18:36:11 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 17:36:11 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> Message-ID: <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> On Mon, Oct 5, 2009 at 17:16, Elaine Angelino wrote: > hey pierre -- good question. this is something we debated a while ago (we > actually sent a couple of emails over the numpy list about this very topic) > when coming up with our design.? at the time, there did not seem to be > strong opinions either way about using ndarray vs. recarray > > the main reason we went with the recarray over the ndarray is because the > recarray has a couple of useful construction functions (e.g. > np.rec.fromrecords and np.rec.fromarrays).? not only are these functions > convenient to use, they have nice data type inference properties which we'd > have to rebuild ourselves if we wanted to avoid recarrays entirely. Try np.rec.fromrecords(...).view(np.ndarray). Most likely, we should have versions of those functions that return plain ndarrays. They are quite useful. Perhaps def fromarrays(..., type=None): ... 
if type is not None: _array = _array.view(type) return _array -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From elaine.angelino at gmail.com Mon Oct 5 18:52:47 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 18:52:47 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> Message-ID: <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> On Mon, Oct 5, 2009 at 6:36 PM, Robert Kern wrote: > > > > the main reason we went with the recarray over the ndarray is because the > > recarray has a couple of useful construction functions (e.g. > > np.rec.fromrecords and np.rec.fromarrays). not only are these functions > > convenient to use, they have nice data type inference properties which > we'd > > have to rebuild ourselves if we wanted to avoid recarrays entirely. > > Try np.rec.fromrecords(...).view(np.ndarray). > > Hi Robert, thanks your email. We definitely understand this use of .view(). However, our question is, should we have implemented tabular this way, e.g. in the tabarray constructor, first make a recarray and then view it as an ndarray? (and then of course view it as a tabarray). This would have the effect of eliminating the extra recarray functionality, and some if its overhead as well. Is this the desirable design, or should we stick with recarrays? (Also, is first casting to recarrays and then viewing as ndarrays more expensive than if we went through ndarray directly?) > Most likely, we should have versions of those functions that return > plain ndarrays. They are quite useful. > > Perhaps > > def fromarrays(..., type=None): > ... > if type is not None: > _array = _array.view(type) > return _array > > Yes, we definitely agree with you that there should be plain ndarray versions of the fromarrays and fromrecords constructors. The only reason we didn't include a function like your "fromarrays" function in tabular is that we thought it might be a bit hackish for our package, and seemed like something to be addressed by numpy directly, perhaps at a later time. This was especially given that it didn't seem like people hated recarrays especially. In the event that people really think we should switch "tabular" from using ndarrays to recarrays, we would definitely support a discussion of adding these kinds of constructors directly to ndarrays. Thanks Elaine -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Mon Oct 5 18:58:34 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 17:58:34 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> Message-ID: <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> On Mon, Oct 5, 2009 at 17:52, Elaine Angelino wrote: > On Mon, Oct 5, 2009 at 6:36 PM, Robert Kern wrote: > >> > the main reason we went with the recarray over the ndarray is because >> > the >> > recarray has a couple of useful construction functions (e.g. >> > np.rec.fromrecords and np.rec.fromarrays).? not only are these functions >> > convenient to use, they have nice data type inference properties which >> > we'd >> > have to rebuild ourselves if we wanted to avoid recarrays entirely. >> >> Try np.rec.fromrecords(...).view(np.ndarray). >> > > Hi Robert, thanks your email.? We definitely understand this use of > .view().? However,? our question is,? should we have implemented tabular > this way, e.g. in the tabarray constructor, first make a recarray and then > view it as an ndarray?? (and then of course view it as a tabarray). Do the minimum number of .view()s that you can get away with. > This > would have the effect of eliminating the extra recarray functionality, and > some if its overhead as well. Is this the desirable design, or should we > stick with recarrays? Well, what other recarray functionality are you using? I addressed the from*() functions because you said it was the main reason. What are your other reasons? > (Also, is first casting to recarrays and then viewing as ndarrays more > expensive than if we went through ndarray directly?) The overhead should be miniscule. No data is converted. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From elaine.angelino at gmail.com Mon Oct 5 19:15:44 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 19:15:44 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> Message-ID: <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> Do the minimum number of .view()s that you can get away with. > > I guess our bottom line is that we're still not 100% clear as to the recommendation of the NumPy community regarding whether we should use recarray or ndarray. It seems like recarray has some advantages (e.g. the nice inference functions/constructors, and the fact that some people like the ability to fields as attributes) as well as some disadvantages (e.g. 
the overhead). it definitely wouldn't be much difficulty to convert tabular to using ndarrays, but is it very desirable? Of course if we were to do this, having recarray-style constructors for ndarrays directly in Numpy would be seem to be a "cleaner" way to do things than either writing our own ndarray versions or casting from recarray to ndarray, but we're happy to do either if changing tabular to ndarray is really desirable. > > > Well, what other recarray functionality are you using? None, in our code. We also thought that since at least some people like using the attribute reference property, perhaps users of tabarrays might too (though we don't personally in our own work) Recarrays still seemed to be being supported by NumPy, so it seemed to make sense to use them. but the only functional thing in our code are those constructors. > > > (Also, is first casting to recarrays and then viewing as ndarrays more > > expensive than if we went through ndarray directly?) > > But if NumPy decided to include ndarray versions of the from*() constructors in the distribution, would this be achieved by first using the recarray constructor and then viewing as ndarray? Or would something more "direct" be done? thanks, e -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Oct 5 19:20:35 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 18:20:35 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> Message-ID: <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> On Mon, Oct 5, 2009 at 18:15, Elaine Angelino wrote: >> Well, what other recarray functionality are you using? > > None, in our code.?? We also thought that since at least some people like > using the attribute reference property, perhaps users of tabarrays might too > (though we don't personally in our own work) ? Recarrays still seemed to be > being supported by NumPy, so it seemed to make sense to use them.?? but the > only functional thing in our code are those constructors. Then I would suggest making tabarrays subclass from ndarray. If you like, provide a tabrecarray that subclasses from both recarray and tabarray so that people who like attribute access can .view() to their heart's content. >> > (Also, is first casting to recarrays and then viewing as ndarrays more >> > expensive than if we went through ndarray directly?) >> > > But if NumPy decided to include ndarray versions of the from*() constructors > in the distribution, would this be achieved by first using the recarray > constructor and then viewing as ndarray?? Or would something more "direct" > be done? We would fix the functions to not do any unnecessary .view()s. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From charlesr.harris at gmail.com Mon Oct 5 20:57:27 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 18:57:27 -0600 Subject: [Numpy-discussion] Easy way to test documentation? In-Reply-To: <1254773407.6463.6.camel@idol> References: <1254773407.6463.6.camel@idol> Message-ID: On Mon, Oct 5, 2009 at 2:10 PM, Pauli Virtanen wrote: > ma, 2009-10-05 kello 13:54 -0600, Charles R Harris kirjoitti: > > Is there an easy way to test build documentation for a module that is > > not yet part of numpy? > > Make a small Sphinx project for that: > > $ easy_install numpydoc > $ mkdir foo > $ cd foo > $ sphinx-quickstart > What to choose for math rendering? Defaults for everything else? > ... > $ vi conf.py > ... add 'sphinx.ext.autodoc', 'numpydoc' to extensions ... > $ cp /some/path/modulename.py modulename.py > $ vi index.rst > index.py, right? > ... > add > .. automodule:: modulename > :members: > ... > $ make PYTHONPATH=$PWD html > > Bombs when it hits the first Parameters section: "Unexpected section title." Could be automated. > > That would be nice. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Oct 5 21:27:40 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 19:27:40 -0600 Subject: [Numpy-discussion] Easy way to test documentation? In-Reply-To: References: <1254773407.6463.6.camel@idol> Message-ID: On Mon, Oct 5, 2009 at 6:57 PM, Charles R Harris wrote: > > > On Mon, Oct 5, 2009 at 2:10 PM, Pauli Virtanen wrote: > >> ma, 2009-10-05 kello 13:54 -0600, Charles R Harris kirjoitti: >> > Is there an easy way to test build documentation for a module that is >> > not yet part of numpy? >> >> Make a small Sphinx project for that: >> >> $ easy_install numpydoc >> $ mkdir foo >> $ cd foo >> $ sphinx-quickstart >> > > What to choose for math rendering? Defaults for everything else? > > >> ... >> $ vi conf.py >> ... add 'sphinx.ext.autodoc', 'numpydoc' to extensions ... >> $ cp /some/path/modulename.py modulename.py >> $ vi index.rst >> > > index.py, right? > > OK, had to choose file type (txt/rst) > ... >> add >> > append > .. automodule:: modulename >> :members: >> ... >> $ make PYTHONPATH=$PWD html >> >> > Seems to work. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej at certik.cz Mon Oct 5 21:40:46 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 18:40:46 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults Message-ID: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> Hi, I am getting a segfault in PyArray_SimpleNewFromData in Cython. I am trying to debug it for the last 4 hours, but still absolutely no clue, so I am posting it here, maybe someone knows where the problem is: cdef ndarray array_double_c2numpy(double *A, int len): from numpy import empty print "got len:", len cdef npy_intp dims[10] cdef double X[500] print "1" dims[0] = 3 print "2x" print dims[0], len print X[0], X[1], X[2] cdef npy_intp size cdef ndarray newarr cdef double *arrsource size = 10 arrsource = malloc(sizeof(double) * size) print "still alive" newarr = PyArray_SimpleNewFromData(1, &size, 12, arrsource) print "I am already dead. 
:(" print "3" return empty([len]) Essential is just the line: newarr = PyArray_SimpleNewFromData(1, &size, 12, arrsource) Then I removed all numpy from my computer, downloaded the latest git repository from: http://projects.scipy.org/git/numpy.git applied the following patch: diff --git a/numpy/core/src/multiarray/ctors.c b/numpy/core/src/multiarray/ctors index 3fdded0..777563c 100644 --- a/numpy/core/src/multiarray/ctors.c +++ b/numpy/core/src/multiarray/ctors.c @@ -1318,6 +1318,7 @@ PyArray_NewFromDescr(PyTypeObject *subtype, PyArray_Descr intp *dims, intp *strides, void *data, int flags, PyObject *obj) { + printf("entering PyArray_NewFromDescr\n"); PyArrayObject *self; int i; size_t sd; @@ -1553,6 +1554,7 @@ PyArray_New(PyTypeObject *subtype, int nd, intp *dims, int { PyArray_Descr *descr; PyObject *new; + printf("entering PyArray_New, still kicking\n"); descr = PyArray_DescrFromType(type_num); if (descr == NULL) { then installed with: python setup.py install --home=~/usr and run my cython program. Here is the output: $ ./schroedinger ------------------------------------------- This is Hermes1D - a free ODE solver based on the hp-FEM and Newton's method, developed by the hp-FEM group at UNR and distributed under the BSD license. For more details visit http://hpfem.org/. ------------------------------------------- Importing hermes1d entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_New, still kicking entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_New, still kicking entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr Python initialized got len: 39601 1 2x 3 39601 0.0 0.0 0.0 still alive Segmentation fault What puzzles me is that there is no debugging print statement just before the segfault. So like if the PyArray_New was not being called. But looking into numpy/core/include/numpy/ndarrayobject.h, line 1359: #define PyArray_SimpleNewFromData(nd, dims, typenum, data) \ PyArray_New(&PyArray_Type, nd, dims, typenum, NULL, \ data, 0, NPY_CARRAY, NULL) It should be called. Does it segfault in the printf() statement above? Hm. I also tried gdb, but it doesn't step into PyArray_SimpleNewFromData (in the C file), not sure why. So both print statements and gdb failed to bring me to the cause, pretty sad day for me. I am going home now and start with a fresh head, it just can't segfault like this... I guess I'll start by creating a simple cython project to reproduce it (the schroedinger code above is quite involved, it starts a python interpreter inside a C++ program, etc. etc.). 
Ondrej From charlesr.harris at gmail.com Mon Oct 5 22:34:48 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 20:34:48 -0600 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> Message-ID: On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: > Hi, > > I am getting a segfault in PyArray_SimpleNewFromData in Cython. I am > trying to debug it for the last 4 hours, but still absolutely no clue, > so I am posting it here, maybe someone knows where the problem is: > > cdef ndarray array_double_c2numpy(double *A, int len): > from numpy import empty > print "got len:", len > cdef npy_intp dims[10] > cdef double X[500] > print "1" > dims[0] = 3 > print "2x" > print dims[0], len > print X[0], X[1], X[2] > cdef npy_intp size > cdef ndarray newarr > cdef double *arrsource > > size = 10 > arrsource = malloc(sizeof(double) * size) > print "still alive" > newarr = PyArray_SimpleNewFromData(1, &size, 12, > arrsource) > print "I am already dead. :(" > print "3" > return empty([len]) > > > > Essential is just the line: > > newarr = PyArray_SimpleNewFromData(1, &size, 12, > arrsource) > > Then I removed all numpy from my computer, downloaded the latest git > repository from: > > http://projects.scipy.org/git/numpy.git > > applied the following patch: > > diff --git a/numpy/core/src/multiarray/ctors.c > b/numpy/core/src/multiarray/ctors > index 3fdded0..777563c 100644 > --- a/numpy/core/src/multiarray/ctors.c > +++ b/numpy/core/src/multiarray/ctors.c > @@ -1318,6 +1318,7 @@ PyArray_NewFromDescr(PyTypeObject *subtype, > PyArray_Descr > intp *dims, intp *strides, void *data, > int flags, PyObject *obj) > { > + printf("entering PyArray_NewFromDescr\n"); > PyArrayObject *self; > int i; > size_t sd; > @@ -1553,6 +1554,7 @@ PyArray_New(PyTypeObject *subtype, int nd, intp > *dims, int > { > PyArray_Descr *descr; > PyObject *new; > + printf("entering PyArray_New, still kicking\n"); > > descr = PyArray_DescrFromType(type_num); > if (descr == NULL) { > > > > then installed with: > > python setup.py install --home=~/usr > > and run my cython program. Here is the output: > > $ ./schroedinger > > ------------------------------------------- > This is Hermes1D - a free ODE solver > based on the hp-FEM and Newton's method, > developed by the hp-FEM group at UNR > and distributed under the BSD license. > For more details visit http://hpfem.org/. 
> ------------------------------------------- > Importing hermes1d > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_New, still kicking > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_New, still kicking > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > Python initialized > got len: 39601 > 1 > 2x > 3 39601 > 0.0 0.0 0.0 > still alive > Segmentation fault > > > > What puzzles me is that there is no debugging print statement just > before the segfault. Maybe you need to flush the buffer. That is a good thing to do when segfaults are about. > So like if the PyArray_New was not being called. > But looking into numpy/core/include/numpy/ndarrayobject.h, line 1359: > > #define PyArray_SimpleNewFromData(nd, dims, typenum, data) > \ > PyArray_New(&PyArray_Type, nd, dims, typenum, NULL, > \ > data, 0, NPY_CARRAY, NULL) > > > It should be called. Does it segfault in the printf() statement above? > Hm. I also tried gdb, but it doesn't step into > PyArray_SimpleNewFromData (in the C file), not sure why. > > So both print statements and gdb failed to bring me to the cause, > pretty sad day for me. I am going home now and start with a fresh > head, it just can't segfault like this... I guess I'll start by > creating a simple cython project to reproduce it the schroedinger > code above is quite involved, it starts a python interpreter inside a > C++ program, etc. etc.). > > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpk at kraussfamily.org Mon Oct 5 22:49:15 2009 From: tpk at kraussfamily.org (Tom K.) Date: Mon, 5 Oct 2009 19:49:15 -0700 (PDT) Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC705F4.1040702@noaa.gov> References: <4AC705F4.1040702@noaa.gov> Message-ID: <25762136.post@talk.nabble.com> Christopher Barker wrote: > > > What do folks think? is this useful? What would you change, etc? > Chris - I really like this and find it useful. I would change the name to something like "growable" or "ArrayList" - accumulator seems like an object for cumulative summation. I think the right amount to grow is 2x - this provides an amortized O(log n) append. If the array doesn't have to grow, the cost is 1 - no copies - whereas if you have to grow, the cost is n copies. Is 2x optimal? Perhaps the configurable grow ratio is a good thing, although giving a knob means people are going to set it wrong. I would also vote "+1" for an ND version of this (growing only a single dimension). 
Keeping 2x for each of n dimensions, while conceivable, would be 2**n extra memory, and hence probably too costly. Cheers, Tom K. -- View this message in context: http://www.nabble.com/A-numpy-accumulator...-tp25726568p25762136.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From ondrej at certik.cz Mon Oct 5 23:38:18 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 20:38:18 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> Message-ID: <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> On Mon, Oct 5, 2009 at 7:34 PM, Charles R Harris wrote: > > > On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: [...] >> still alive >> Segmentation fault >> >> >> >> What puzzles me is that there is no debugging print statement just >> before the segfault. > > Maybe you need to flush the buffer. That is a good thing to do when > segfaults are about. I tried to put "fflush(NULL);" after it, but it didn't help. I have created a super simple demo for anyone to play: $ git clone git://github.com/certik/segfault.git $ cd segfault/ $ vim Makefile # <-- edit the python and numpy include paths $ make $ python test.py I am still alive Segmentation fault where test.py is: $ cat test.py import _hermes1d v = _hermes1d.test() print v and _hermes1d.pyx is: $ cat _hermes1d.pyx def test(): cdef npy_intp size cdef ndarray newarr cdef double *arrsource size = 10 arrsource = malloc(sizeof(double) * size) print "I am still alive" newarr = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, arrsource) print "I am dead." return newarr So I bet there is something very stupid that I am missing. Still investigating... Ondrej From ondrej at certik.cz Tue Oct 6 00:25:48 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 21:25:48 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> Message-ID: <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> On Mon, Oct 5, 2009 at 8:38 PM, Ondrej Certik wrote: > On Mon, Oct 5, 2009 at 7:34 PM, Charles R Harris > wrote: >> >> >> On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: > [...] >>> still alive >>> Segmentation fault >>> >>> >>> >>> What puzzles me is that there is no debugging print statement just >>> before the segfault. >> >> Maybe you need to flush the buffer. That is a good thing to do when >> segfaults are about. > > I tried to put "fflush(NULL);" after it, but it didn't help. I have > created a super simple demo for anyone to play: > > > $ git clone git://github.com/certik/segfault.git > $ cd segfault/ > $ vim Makefile ? ? # <-- edit the python and numpy include paths > $ make > $ python test.py > I am still alive > Segmentation fault > > where test.py is: > > $ cat test.py > import _hermes1d > v = _hermes1d.test() > print v > > > and _hermes1d.pyx is: > > $ cat _hermes1d.pyx > def test(): > ? ?cdef npy_intp size > ? ?cdef ndarray newarr > ? ?cdef double *arrsource > > ? ?size = 10 > ? ?arrsource = malloc(sizeof(double) * size) > ? ?print "I am still alive" > ? ?newarr = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, arrsource) > ? ?print "I am dead." > > ? ?return newarr > > > So I bet there is something very stupid that I am missing. Still > investigating... 
I didn't call _import_array() ! This patch fixes it: diff --git a/_hermes1d.pxd b/_hermes1d.pxd index 9994c28..f5e8868 100644 --- a/_hermes1d.pxd +++ b/_hermes1d.pxd @@ -54,6 +54,8 @@ cdef extern from "arrayobject.h": object PyArray_SimpleNewFromData(int nd, npy_intp* dims, int typenum, void* data) + void _import_array() + cdef extern from "Python.h": ctypedef void PyObject void Py_INCREF(PyObject *x) diff --git a/_hermes1d.pyx b/_hermes1d.pyx index e542ddc..7a4beec 100644 --- a/_hermes1d.pyx +++ b/_hermes1d.pyx @@ -2,6 +2,7 @@ def test(): cdef npy_intp size cdef ndarray newarr cdef double *arrsource + _import_array() size = 10 arrsource = malloc(sizeof(double) * size) I think I learned something today the hard way. Ondrej From ondrej at certik.cz Tue Oct 6 00:34:40 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 21:34:40 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> Message-ID: <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> On Mon, Oct 5, 2009 at 9:25 PM, Ondrej Certik wrote: > On Mon, Oct 5, 2009 at 8:38 PM, Ondrej Certik wrote: >> On Mon, Oct 5, 2009 at 7:34 PM, Charles R Harris >> wrote: >>> >>> >>> On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: >> [...] >>>> still alive >>>> Segmentation fault >>>> >>>> >>>> >>>> What puzzles me is that there is no debugging print statement just >>>> before the segfault. >>> >>> Maybe you need to flush the buffer. That is a good thing to do when >>> segfaults are about. >> >> I tried to put "fflush(NULL);" after it, but it didn't help. I have >> created a super simple demo for anyone to play: >> >> >> $ git clone git://github.com/certik/segfault.git >> $ cd segfault/ >> $ vim Makefile ? ? # <-- edit the python and numpy include paths >> $ make >> $ python test.py >> I am still alive >> Segmentation fault >> >> where test.py is: >> >> $ cat test.py >> import _hermes1d >> v = _hermes1d.test() >> print v >> >> >> and _hermes1d.pyx is: >> >> $ cat _hermes1d.pyx >> def test(): >> ? ?cdef npy_intp size >> ? ?cdef ndarray newarr >> ? ?cdef double *arrsource >> >> ? ?size = 10 >> ? ?arrsource = malloc(sizeof(double) * size) >> ? ?print "I am still alive" >> ? ?newarr = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, arrsource) >> ? ?print "I am dead." >> >> ? ?return newarr >> >> >> So I bet there is something very stupid that I am missing. Still >> investigating... > > I didn't call _import_array() ?! > > This patch fixes it: > > > diff --git a/_hermes1d.pxd b/_hermes1d.pxd > index 9994c28..f5e8868 100644 > --- a/_hermes1d.pxd > +++ b/_hermes1d.pxd > @@ -54,6 +54,8 @@ cdef extern from "arrayobject.h": > ? ? object PyArray_SimpleNewFromData(int nd, npy_intp* dims, int typenum, > ? ? ? ? ? ? void* data) > > + ? ?void _import_array() > + > ?cdef extern from "Python.h": > ? ? ctypedef void PyObject > ? ? void Py_INCREF(PyObject *x) > diff --git a/_hermes1d.pyx b/_hermes1d.pyx > index e542ddc..7a4beec 100644 > --- a/_hermes1d.pyx > +++ b/_hermes1d.pyx > @@ -2,6 +2,7 @@ def test(): > ? ? cdef npy_intp size > ? ? cdef ndarray newarr > ? ? cdef double *arrsource > + ? ?_import_array() > > ? ? size = 10 > ? ? arrsource = malloc(sizeof(double) * size) > > > > > I think I learned something today the hard way. 
The only mention of the _import_array() in the documentation that I found is here: http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NO_IMPORT_ARRAY but I don't understand what it means ---- do I have to just call _import_array() and then I can use numpy CAPI, or do I also have to define those PY_ARRAY_UNIQUE_SYMBOL etc? Btw, to explain my original post for future readers --- the real problem was that PyArray_Type was NULL and thus &PyArray_Type segfaulted. That happened in the definition: #define PyArray_SimpleNewFromData(nd, dims, typenum, data) \ PyArray_New(&PyArray_Type, nd, dims, typenum, NULL, \ data, 0, NPY_CARRAY, NULL) so it is *extremely* confusing, since PyArray_SimpleNewFromData() was being called from my code, but PyArray_New never started, it segfaulted in between. I think now it is clear what is going on. Only I don't understand the intention, but I can now get my job done. Ondrej From robert.kern at gmail.com Tue Oct 6 00:42:16 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 23:42:16 -0500 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> Message-ID: <3d375d730910052142s4a66164bn534d7e710d99b80b@mail.gmail.com> On Mon, Oct 5, 2009 at 23:34, Ondrej Certik wrote: > The only mention of the _import_array() in the documentation that I > found is here: > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NO_IMPORT_ARRAY > > > but I don't understand what it means ---- do I have to just call > _import_array() and then I can use numpy CAPI, or do I also have to > define those PY_ARRAY_UNIQUE_SYMBOL etc? Not _import_array() but import_array(). http://docs.scipy.org/doc/numpy/reference/c-api.array.html#importing-the-api You don't have multiple files, so you only use import_array() and not PY_ARRAY_UNIQUE_SYMBOL or NO_IMPORT_ARRAY. I'm not really sure what is unclear about the text except that you searched for the wrong spelling and found the wrong entry. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Tue Oct 6 01:15:31 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 05 Oct 2009 22:15:31 -0700 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <25762136.post@talk.nabble.com> References: <4AC705F4.1040702@noaa.gov> <25762136.post@talk.nabble.com> Message-ID: <4ACAD273.4040000@noaa.gov> Tom K. wrote: > Chris - I really like this and find it useful. I would change the name to > something like "growable" or "ArrayList" hmm. I think I like "growable" or maybe "growarray". > I think the right amount to grow is 2x - I think that may be too much.. one if the key advantages of this over python lists is that there should be a memory use advantage -- when you are pushing memory bounds, using twice what you need is a bit much. > Perhaps the configurable grow ratio is a good > thing, although giving a knob means people are going to set it wrong. maybe, but most folk will use the default anyway. I'm certainly going to keep it configurable while under development -- the better to benchmark with. 
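To put rough numbers behind the memory-use point above -- this is only an
illustrative comparison, and the exact sizes depend on the platform and
Python build:

import sys
import numpy as np

n = 100000
py_list = [float(i) for i in range(n)]
np_buf = np.arange(n, dtype=float)

# a list stores a pointer per element plus a separate float object per
# element; the numpy buffer stores 8 packed bytes per element (plus
# whatever over-allocation slack the growth factor leaves)
list_bytes = sys.getsizeof(py_list) + n * sys.getsizeof(1.0)
array_bytes = np_buf.nbytes
print("list: ~%d bytes, array: ~%d bytes" % (list_bytes, array_bytes))

so even before any growth slack, the plain-list version costs several
times more per element.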
> I would also vote "+1" for an ND version of this (growing only a single > dimension). Yes, I think that is a good idea, and would certainly be useful for a common case -- growing a table of data, perhaps when reading a file, etc. > Keeping 2x for each of n dimensions, while conceivable, would > be 2**n extra memory, and hence probably too costly. That, and the fact that you'd have to move a bunch of memory around as it grew -- if you only grow the first dimension (for C order, anyway), you can just tack stuff on the end (which usually necessitates a copy anyway, but it still seems easier. thanks for the feedback, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ondrej at certik.cz Tue Oct 6 02:11:19 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 23:11:19 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <3d375d730910052142s4a66164bn534d7e710d99b80b@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> <3d375d730910052142s4a66164bn534d7e710d99b80b@mail.gmail.com> Message-ID: <85b5c3130910052311w2f5423a8i9376522018dce7c8@mail.gmail.com> On Mon, Oct 5, 2009 at 9:42 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 23:34, Ondrej Certik wrote: > >> The only mention of the _import_array() in the documentation that I >> found is here: >> >> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NO_IMPORT_ARRAY >> >> >> but I don't understand what it means ---- do I have to just call >> _import_array() and then I can use numpy CAPI, or do I also have to >> define those PY_ARRAY_UNIQUE_SYMBOL etc? > > Not _import_array() but import_array(). > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#importing-the-api > > You don't have multiple files, so you only use import_array() and not > PY_ARRAY_UNIQUE_SYMBOL or NO_IMPORT_ARRAY. > > I'm not really sure what is unclear about the text except that you > searched for the wrong spelling and found the wrong entry. Ah, that's the way. I was using _import_array() and that worked, so I changed that to import_array() and call it just once at the top of the .pyx file and now everything works very nice. Indeed, it is well documented in there, I didn't realize it written one paragraph above it. Thanks for help, all is fine now. Here is how to use that new code in hermes1d from C++: http://groups.google.com/group/hermes1d/msg/54f90f1aa740e93f one can now easily decide if to copy or not to copy the data when constructing the numpy arrays. Ondrej From stefan at sun.ac.za Tue Oct 6 10:20:51 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 6 Oct 2009 16:20:51 +0200 Subject: [Numpy-discussion] NumPy SVN broken Message-ID: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Hi all, The current SVN HEAD of NumPy is broken and should not be used. Extensions compiled against this version may (will) segfault. Travis, if you could have a look at the side-effects caused by r7050, that would be great. I meant to figure out what was wrong, but seeing that this is a 3000 line patch, I'm not confident I can find the problem easily. Regards St?fan P.S. 
The new functionality is great, but I don't think we're going to be able to convince David to release without documenting and testing those changes to the C API. From elaine.angelino at gmail.com Tue Oct 6 09:33:38 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Tue, 6 Oct 2009 09:33:38 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> Message-ID: <901520e20910060633u23f7d7a3xa7038f5ed4483bd3@mail.gmail.com> On Mon, Oct 5, 2009 at 7:20 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 18:15, Elaine Angelino > wrote: > > > Then I would suggest making tabarrays subclass from ndarray. > Ok, done. We did it using the from*() function design you suggested. In the future, if there are more direct from*() functions working directly on ndarrays we'd want to switch to those of course. While implementing the change, we were reminded of another difference between ndarray and recarray, namely that the constructor of ndarray doesn't accept "names" or "formats" parameters while the recarray constructor does (e.g. you have to specify `dtype` in the ndarray constructor). This feature of the recarray constructor was useful for our purposes, since one of the goals of tabular is providing 'easy' construction methods. We've retained this feature, even though we've switched to subclassing ndarray. There must be a good reason why ndarray does not accept "names" or "formats" parameters and forces the use of the more explicit and unambiguous "dtype". I guess it's "cleaner" in some sense, since the formats parameter is necessarily more limited. It does make sense to have a strongly unambiguous interface for a cornerstone method like np.ndarray.__new__. That said, I think it also makes sense to have more flexible interfaces too, even if they're sometimes more ambiguous (this is part of the purpose of tabular, see http://www.parsemydata.com/tabular/reference/organization.html#design-philosophy ). Thanks for the help, elaine -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsouthey at gmail.com Tue Oct 6 09:18:02 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 06 Oct 2009 08:18:02 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> Message-ID: <4ACB438A.5020007@gmail.com> On 10/05/2009 06:20 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 18:15, Elaine Angelino wrote: > > >>> Well, what other recarray functionality are you using? >>> >> None, in our code. We also thought that since at least some people like >> using the attribute reference property, perhaps users of tabarrays might too >> (though we don't personally in our own work) Recarrays still seemed to be >> being supported by NumPy, so it seemed to make sense to use them. but the >> only functional thing in our code are those constructors. >> > Then I would suggest making tabarrays subclass from ndarray. If you > like, provide a tabrecarray that subclasses from both recarray and > tabarray so that people who like attribute access can .view() to their > heart's content. > > >>>> (Also, is first casting to recarrays and then viewing as ndarrays more >>>> expensive than if we went through ndarray directly?) >>>> >>> >> But if NumPy decided to include ndarray versions of the from*() constructors >> in the distribution, would this be achieved by first using the recarray >> constructor and then viewing as ndarray? Or would something more "direct" >> be done? >> > We would fix the functions to not do any unnecessary .view()s. > > Hi Elaine, I do want to look more at what you have done as some of the features are very interesting. This discussion raises the question of what do you find missing in numpy that you have included in tabular package? In particular is there a particular set of functions that you think could be added to numpy or even create a 'better' recarray class? There are real advantages of having at least core components in numpy. Bruce From charlesr.harris at gmail.com Tue Oct 6 12:28:54 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 10:28:54 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: 2009/10/6 St?fan van der Walt > Hi all, > > The current SVN HEAD of NumPy is broken and should not be used. > Extensions compiled against this version may (will) segfault. > > Travis, if you could have a look at the side-effects caused by r7050, > that would be great. I meant to figure out what was wrong, but seeing > that this is a 3000 line patch, I'm not confident I can find the > problem easily. > > Regards > St?fan > > P.S. The new functionality is great, but I don't think we're going to > be able to convince David to release without documenting and testing > those changes to the C API. 
> ___ Seeing as the next release process is probably going to start next month and we want things to settle out, it might be advisable delay any intrusive patches to the release after and subject them to review and discussion first. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 6 12:31:52 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 12:31:52 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> Message-ID: <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> On Mon, Oct 5, 2009 at 5:22 PM, Elaine Angelino wrote: > Hi there, > > We are writing to announce the release of "Tabular", a package of Python > modules for working with tabular data. > > Tabular is a package of Python modules for working with tabular data. Its > main object is the tabarray class, a data structure for holding and > manipulating tabular data. By putting data into a tabarray object, you?ll > get a representation of the data that is more flexible and powerful than a > native Python representation. More specifically, tabarray provides: > > -- ultra-fast filtering, selection, and numerical analysis methods, using > convenient Matlab-style matrix operation syntax > -- spreadsheet-style operations, including row & column operations, 'sort', > 'replace', 'aggregate', 'pivot', and 'join' > -- flexible load and save methods for a variety of file formats, including > delimited text (CSV), binary, and HTML > -- helpful inference algorithms for determining formatting parameters and > data types of input files > -- support for hierarchical groupings of columns, both as data structures > and file formats > > You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or > alternatively clone our hg repository from bitbucket > (http://bitbucket.org/elaine/tabular/).? We also have posted tutorial-style > Sphinx documentation (http://www.parsemydata.com/tabular/). > > The tabarray object is based on the record array object from the Numerical > Python package (NumPy), and Tabular is built to interface well with NumPy in > general.? Our intended audience is two-fold: (1) Python users who, though > they may not be familiar with NumPy, are in need of a way to work with > tabular data, and (2) NumPy users who would like to do spreadsheet-style > operations on top of their more "numerical" work. > > We hope that some of you find Tabular useful! > > Best, > > Elaine and Dan I briefly looked at the sphinx docs and the code. Tabular looks pretty useful and the code can be partially read as recipes for working with recarrays or structured arrays. Thanks for the choice of license (it makes looking at the code "legal"). I didn't see any explicit nan handling. Are missing values allowed e.g. in the constructor? I looked a bit closer at function like tabular.fast.recarrayisin since I always have problems with these row operations. Are these function supposed to work with arbitrary structured arrays? The tests are only for a 1d integer arrays. With floats the default string representation doesn't sort correctly. Or am I misreading the function? 
>>> arr = np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2) >>> arr array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0), (2.0000000000000002e+025, 3.0), (0.0, 7.0)], dtype=[('f0', '>> np.sort([str(l) for l in arr]) array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)', '(5e-015, 1.0)', '(6.0, 1.0)'], dtype='|S30') Being able to do a searchsorted on rows of an array would be a useful feature in numpy. Is there a sortable 1d representation of the rows of a 2d float or mixed type array? Thanks, Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Tue Oct 6 12:36:37 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 10:36:37 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: 2009/10/6 St?fan van der Walt > Hi all, > > The current SVN HEAD of NumPy is broken and should not be used. > Extensions compiled against this version may (will) segfault. > > Can you be more specific? I haven't had any problems running current svn with scipy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Oct 6 12:46:20 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 6 Oct 2009 18:46:20 +0200 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: <9457e7c80910060946y39cf3186r17839125d6b1d20a@mail.gmail.com> 2009/10/6 Charles R Harris : > 2009/10/6 St?fan van der Walt >> >> Hi all, >> >> The current SVN HEAD of NumPy is broken and should not be used. >> Extensions compiled against this version may (will) segfault. >> > > Can you be more specific? I haven't had any problems running current svn > with scipy. Both David and I had segfaults when running scipy compiled off the latest numpy. An example from Kiva: Program received signal SIGSEGV, Segmentation fault. PyArray_INCREF (mp=0x42) at build/scons/numpy/core/src/multiarray/refcount.c:103 103 if (!PyDataType_REFCHK(mp->descr)) { (gdb) bt #0 PyArray_INCREF (mp=0x42) at build/scons/numpy/core/src/multiarray/refcount.c:103 #1 0x00985f67 in agg::pixel_map_as_unowned_array (pix_map=...) at build/src.linux-i686-2.6/enthought/kiva/agg/src/x11/plat_support_wrap.cpp:2909 #2 0x0098795f in _wrap_pixel_map_as_unowned_array (args=0xb7ed032c) at build/src.linux-i686-2.6/enthought/kiva/agg/src/x11/plat_support_wrap.cpp:3341 Via bisection, the source of the problem has been localised to the merge of the datetime branch. Cheers St?fan From cournape at gmail.com Tue Oct 6 12:50:34 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 01:50:34 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris wrote: > > > 2009/10/6 St?fan van der Walt >> >> Hi all, >> >> The current SVN HEAD of NumPy is broken and should not be used. >> Extensions compiled against this version may (will) segfault. >> > > Can you be more specific? I haven't had any problems running current svn > with scipy. 
The version itself is fine, but the ABI has been changed in an incompatible way: if you have an extension built against say numpy 1.2.1, and then use a numpy built from sources after the datetime merge, it will segfault right away. It does so for scipy and several custom extensions. The abi breakage was found to be the datetime merge. David From josef.pktd at gmail.com Tue Oct 6 13:01:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 13:01:18 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> Message-ID: <1cd32cbb0910061001k52ca9024r6bf0f47a4909963b@mail.gmail.com> On Tue, Oct 6, 2009 at 12:31 PM, wrote: > On Mon, Oct 5, 2009 at 5:22 PM, Elaine Angelino > wrote: >> Hi there, >> >> We are writing to announce the release of "Tabular", a package of Python >> modules for working with tabular data. >> >> Tabular is a package of Python modules for working with tabular data. Its >> main object is the tabarray class, a data structure for holding and >> manipulating tabular data. By putting data into a tabarray object, you?ll >> get a representation of the data that is more flexible and powerful than a >> native Python representation. More specifically, tabarray provides: >> >> -- ultra-fast filtering, selection, and numerical analysis methods, using >> convenient Matlab-style matrix operation syntax >> -- spreadsheet-style operations, including row & column operations, 'sort', >> 'replace', 'aggregate', 'pivot', and 'join' >> -- flexible load and save methods for a variety of file formats, including >> delimited text (CSV), binary, and HTML >> -- helpful inference algorithms for determining formatting parameters and >> data types of input files >> -- support for hierarchical groupings of columns, both as data structures >> and file formats >> >> You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or >> alternatively clone our hg repository from bitbucket >> (http://bitbucket.org/elaine/tabular/).? We also have posted tutorial-style >> Sphinx documentation (http://www.parsemydata.com/tabular/). >> >> The tabarray object is based on the record array object from the Numerical >> Python package (NumPy), and Tabular is built to interface well with NumPy in >> general.? Our intended audience is two-fold: (1) Python users who, though >> they may not be familiar with NumPy, are in need of a way to work with >> tabular data, and (2) NumPy users who would like to do spreadsheet-style >> operations on top of their more "numerical" work. >> >> We hope that some of you find Tabular useful! >> >> Best, >> >> Elaine and Dan > > I briefly looked at the sphinx docs and the code. Tabular looks pretty > useful and > the code can be partially read as recipes for working with recarrays > or structured > arrays. Thanks for the choice of license (it makes looking at the code "legal"). > > I didn't see any explicit nan handling. Are missing values allowed > e.g. in the constructor? > > I looked a bit closer at function like tabular.fast.recarrayisin since > I always have problems > with these row operations. > Are these function supposed to work with arbitrary structured arrays? > The tests are only > for a 1d integer arrays. 
> With floats the default string representation doesn't sort correctly. > Or am I misreading the function? > >>>> arr = np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2) >>>> arr > array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0), > ? ? ? (2.0000000000000002e+025, 3.0), (0.0, 7.0)], > ? ? ?dtype=[('f0', '>>> np.sort([str(l) for l in arr]) > array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)', > ? ? ? '(5e-015, 1.0)', '(6.0, 1.0)'], > ? ? ?dtype='|S30') Maybe this doesn't matter for the purpose of this function. I will download and try the code before I make any more irrelevant comments. Josef > > Being able to do a searchsorted on rows of an array would be a useful feature > in numpy. Is there a sortable 1d representation of the rows of a 2d float or > mixed type array? > > Thanks, > > Josef > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > From charlesr.harris at gmail.com Tue Oct 6 13:04:02 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 11:04:02 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> Message-ID: On Tue, Oct 6, 2009 at 10:50 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris > wrote: > > > > > > 2009/10/6 St?fan van der Walt > >> > >> Hi all, > >> > >> The current SVN HEAD of NumPy is broken and should not be used. > >> Extensions compiled against this version may (will) segfault. > >> > > > > Can you be more specific? I haven't had any problems running current svn > > with scipy. > > The version itself is fine, but the ABI has been changed in an > incompatible way: if you have an extension built against say numpy > 1.2.1, and then use a numpy built from sources after the datetime > merge, it will segfault right away. It does so for scipy and several > custom extensions. The abi breakage was found to be the datetime > merge. > > Ah... That's a fine kettle of fish. Any idea what ABI calls are causing the problem? Maybe the dtype change wasn't made in a compatible way. IIRC, something was added to the dtype? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Oct 6 13:14:47 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 02:14:47 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> Message-ID: <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> On Wed, Oct 7, 2009 at 2:04 AM, Charles R Harris wrote: > > > On Tue, Oct 6, 2009 at 10:50 AM, David Cournapeau > wrote: >> >> On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris >> wrote: >> > >> > >> > 2009/10/6 St?fan van der Walt >> >> >> >> Hi all, >> >> >> >> The current SVN HEAD of NumPy is broken and should not be used. >> >> Extensions compiled against this version may (will) segfault. >> >> >> > >> > Can you be more specific? I haven't had any problems running current svn >> > with scipy. 
>> >> The version itself is fine, but the ABI has been changed in an >> incompatible way: if you have an extension built against say numpy >> 1.2.1, and then use a numpy built from sources after the datetime >> merge, it will segfault right away. It does so for scipy and several >> custom extensions. The abi breakage was found to be the datetime >> merge. >> > > Ah... That's a fine kettle of fish. Any idea what ABI calls are causing the > problem? Maybe the dtype change wasn't made in a compatible way. IIRC, > something was added to the dtype? Yes, but that should not cause trouble. Adding members to structure should be fine. I quickly look at the diff, and some changes in the code generators look suspicious, e.g.: types = ['Generic','Number','Integer','SignedInteger','UnsignedInteger', - 'Inexact', + 'Inexact', 'TimeInteger', 'Floating', 'ComplexFloating', 'Flexible', 'Character', 'Byte','Short','Int', 'Long', 'LongLong', 'UByte', 'UShort', 'UInt', 'ULong', 'ULongLong', 'Float', 'Double', 'LongDouble', 'CFloat', 'CDouble', 'CLongDouble', 'Object', 'String', 'Unicode', - 'Void'] + 'Void', 'Datetime', 'Timedelta'] As the list is used to initialize some values from the API function pointer array, inserts should be avoided. You can see the consequence on the generated files, e.g. part of __multiarray_api.h diff between datetimemerge and just before: < #define PyFloatingArrType_Type (*(PyTypeObject *)PyArray_API[16]) < #define PyComplexFloatingArrType_Type (*(PyTypeObject *)PyArray_API[17]) < #define PyFlexibleArrType_Type (*(PyTypeObject *)PyArray_API[18]) < #define PyCharacterArrType_Type (*(PyTypeObject *)PyArray_API[19]) < #define PyByteArrType_Type (*(PyTypeObject *)PyArray_API[20]) < #define PyShortArrType_Type (*(PyTypeObject *)PyArray_API[21]) < #define PyIntArrType_Type (*(PyTypeObject *)PyArray_API[22]) < #define PyLongArrType_Type (*(PyTypeObject *)PyArray_API[23]) < #define PyLongLongArrType_Type (*(PyTypeObject *)PyArray_API[24]) < #define PyUByteArrType_Type (*(PyTypeObject *)PyArray_API[25]) < #define PyUShortArrType_Type (*(PyTypeObject *)PyArray_API[26]) < #define PyUIntArrType_Type (*(PyTypeObject *)PyArray_API[27]) < #define PyULongArrType_Type (*(PyTypeObject *)PyArray_API[28]) < #define PyULongLongArrType_Type (*(PyTypeObject *)PyArray_API[29]) < #define PyFloatArrType_Type (*(PyTypeObject *)PyArray_API[30]) < #define PyDoubleArrType_Type (*(PyTypeObject *)PyArray_API[31]) < #define PyLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[32]) < #define PyCFloatArrType_Type (*(PyTypeObject *)PyArray_API[33]) < #define PyCDoubleArrType_Type (*(PyTypeObject *)PyArray_API[34]) < #define PyCLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[35]) < #define PyObjectArrType_Type (*(PyTypeObject *)PyArray_API[36]) < #define PyStringArrType_Type (*(PyTypeObject *)PyArray_API[37]) < #define PyUnicodeArrType_Type (*(PyTypeObject *)PyArray_API[38]) < #define PyVoidArrType_Type (*(PyTypeObject *)PyArray_API[39]) --- > #define PyTimeIntegerArrType_Type (*(PyTypeObject *)PyArray_API[16]) > #define PyFloatingArrType_Type (*(PyTypeObject *)PyArray_API[17]) > #define PyComplexFloatingArrType_Type (*(PyTypeObject *)PyArray_API[18]) > #define PyFlexibleArrType_Type (*(PyTypeObject *)PyArray_API[19]) > #define PyCharacterArrType_Type (*(PyTypeObject *)PyArray_API[20]) > #define PyByteArrType_Type (*(PyTypeObject *)PyArray_API[21]) > #define PyShortArrType_Type (*(PyTypeObject *)PyArray_API[22]) > #define PyIntArrType_Type (*(PyTypeObject *)PyArray_API[23]) > #define PyLongArrType_Type 
(*(PyTypeObject *)PyArray_API[24]) > #define PyLongLongArrType_Type (*(PyTypeObject *)PyArray_API[25]) > #define PyUByteArrType_Type (*(PyTypeObject *)PyArray_API[26]) > #define PyUShortArrType_Type (*(PyTypeObject *)PyArray_API[27]) > #define PyUIntArrType_Type (*(PyTypeObject *)PyArray_API[28]) > #define PyULongArrType_Type (*(PyTypeObject *)PyArray_API[29]) > #define PyULongLongArrType_Type (*(PyTypeObject *)PyArray_API[30]) > #define PyFloatArrType_Type (*(PyTypeObject *)PyArray_API[31]) > #define PyDoubleArrType_Type (*(PyTypeObject *)PyArray_API[32]) > #define PyLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[33]) > #define PyCFloatArrType_Type (*(PyTypeObject *)PyArray_API[34]) > #define PyCDoubleArrType_Type (*(PyTypeObject *)PyArray_API[35]) > #define PyCLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[36]) > #define PyObjectArrType_Type (*(PyTypeObject *)PyArray_API[37]) > #define PyStringArrType_Type (*(PyTypeObject *)PyArray_API[38]) > #define PyUnicodeArrType_Type (*(PyTypeObject *)PyArray_API[39]) > #define PyVoidArrType_Type (*(PyTypeObject *)PyArray_API[40]) > #define PyDatetimeArrType_Type (*(PyTypeObject *)PyArray_API[41]) > #define PyTimedeltaArrType_Type (*(PyTypeObject *)PyArray_API[42]) David From dwf at cs.toronto.edu Tue Oct 6 13:19:37 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 6 Oct 2009 13:19:37 -0400 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> Message-ID: On 6-Oct-09, at 12:50 PM, David Cournapeau wrote: > The version itself is fine, but the ABI has been changed in an > incompatible way: if you have an extension built against say numpy > 1.2.1, and then use a numpy built from sources after the datetime > merge, it will segfault right away. It does so for scipy and several > custom extensions. The abi breakage was found to be the datetime > merge. I experienced something similar recently with both ETS and pytables. Good to know finally what was going on. :) David From charlesr.harris at gmail.com Tue Oct 6 13:31:22 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 11:31:22 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> Message-ID: On Tue, Oct 6, 2009 at 11:14 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 2:04 AM, Charles R Harris > wrote: > > > > > > On Tue, Oct 6, 2009 at 10:50 AM, David Cournapeau > > wrote: > >> > >> On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > 2009/10/6 St?fan van der Walt > >> >> > >> >> Hi all, > >> >> > >> >> The current SVN HEAD of NumPy is broken and should not be used. > >> >> Extensions compiled against this version may (will) segfault. > >> >> > >> > > >> > Can you be more specific? I haven't had any problems running current > svn > >> > with scipy. > >> > >> The version itself is fine, but the ABI has been changed in an > >> incompatible way: if you have an extension built against say numpy > >> 1.2.1, and then use a numpy built from sources after the datetime > >> merge, it will segfault right away. 
It does so for scipy and several > >> custom extensions. The abi breakage was found to be the datetime > >> merge. > >> > > > > Ah... That's a fine kettle of fish. Any idea what ABI calls are causing > the > > problem? Maybe the dtype change wasn't made in a compatible way. IIRC, > > something was added to the dtype? > > Yes, but that should not cause trouble. Adding members to structure > should be fine. > > I quickly look at the diff, and some changes in the code generators > look suspicious, e.g.: > > types = ['Generic','Number','Integer','SignedInteger','UnsignedInteger', > - 'Inexact', > + 'Inexact', 'TimeInteger', > 'Floating', 'ComplexFloating', 'Flexible', 'Character', > 'Byte','Short','Int', 'Long', 'LongLong', 'UByte', 'UShort', > 'UInt', 'ULong', 'ULongLong', 'Float', 'Double', 'LongDouble', > 'CFloat', 'CDouble', 'CLongDouble', 'Object', 'String', 'Unicode', > - 'Void'] > + 'Void', 'Datetime', 'Timedelta'] > > As the list is used to initialize some values from the API function > pointer array, inserts should be avoided. You can see the consequence > on the generated files, e.g. part of __multiarray_api.h diff between > datetimemerge and just before: > > Looks like a clue ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 6 13:49:22 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 13:49:22 -0400 Subject: [Numpy-discussion] tostring() for array rows Message-ID: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> If I have a structured or a regular array, is the use of strides in the following always correct for the length of the row memory? I would like to do tostring() but on each row, by creating a string view of the memory in a 1d array. Thanks, Josef >>> tmp = np.random.randn(4,3) >>> tmp.ravel().view('S'+str(tmp.strides[0])) array(['j\x94gv\xa5\x80\xe6?=\xea\xa3\xcb\xb9W\x05 at 4.\xa2J3\xe2\xee?', '\xe3\x89\x973My\xf7\xbf\xc1\x17\x0f\xff\xe9\x19\xb8\xbf\xdb?\x00\xc9c\xf0\xf9?', '\x1f\xc3,B\x9dQ\xa1?F\x1e\x12\x0f\x02\xfc\xd4\xbfz\xe0\xa5_G.\xd0?', '$#T\x0e\xad\x85\xfb\xbf\xf3S\xa6`\x89\x87\xdc?7]\xd9lt\xb4\xf4?'], dtype='|S24') >>> tmp.tostring() 'j\x94gv\xa5\x80\xe6?=\xea\xa3\xcb\xb9W\x05 at 4.\xa2J3\xe2\xee?\xe3\x89\x973My\xf7\xbf\xc1\x17\x0f\xff\xe9\x19\xb8\xbf\xdb?\x00\xc9c\xf0\xf9?\x1f\xc3,B\x9dQ\xa1?F\x1e\x12\x0f\x02\xfc\xd4\xbfz\xe0\xa5_G.\xd0?$#T\x0e\xad\x85\xfb\xbf\xf3S\xa6`\x89\x87\xdc?7]\xd9lt\xb4\xf4?' >>> tmp array([(4.0, 0, 1), (1.0, 1, 3), (2.0, 2, 4), (4.0, 0, 1)], dtype=[('f0', '>> tmp.view('S'+str(tmp.strides[0])) array(['\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\xf0?\x01\x00\x00\x00\x03', '\x00\x00\x00\x00\x00\x00\x00@\x02\x00\x00\x00\x04', '\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x01'], dtype='|S16') From dyamins at gmail.com Tue Oct 6 14:09:30 2009 From: dyamins at gmail.com (Dan Yamins) Date: Tue, 6 Oct 2009 14:09:30 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> Message-ID: <15e4667e0910061109h7ecdc5a5wef94778de3f5cd48@mail.gmail.com> > > I didn't see any explicit nan handling. Are missing values allowed > e.g. in the constructor? > No, this is a valid point. We don't handle this as explicitly as we should. 
Are you mostly talking about nan handling in loading from delimited text files? (Or are you talking about something more general, like integration of masked arrays?) In loading from delimited text files, you can use the "linefixer" and "valuefixer" arguments, which are for more general purposes, and which will get the job done, but slowly. We should do something more specialized for missing values that would be faster. > Are these function supposed to work with arbitrary structured arrays? > Well, they're only really tested for working with strings, floats, and ints (tho only the int tests are included in the test module, we should expand that). I imagine it's possible they'd work with more sophisticated things but I'm not sure. > > >>> arr = > np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2) > >>> arr > array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0), > (2.0000000000000002e+025, 3.0), (0.0, 7.0)], > dtype=[('f0', ' >>> np.sort([str(l) for l in arr]) > array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)', > '(5e-015, 1.0)', '(6.0, 1.0)'], > dtype='|S30') > > Well on this example (as in tests that we did), fast.recarrayisin performed as spec'd. ... But definitely write back again if you think it's failing somewhere. In general, extending a number of the thigns in Tabular (e.g. the loadSV and saveSV) to arbitrary structured dtypes as opposed to more basic types would be great. Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Oct 6 14:42:59 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 06 Oct 2009 13:42:59 -0500 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: Message-ID: <4ACB8FB3.5040706@gmail.com> On 10/05/2009 02:13 PM, Pierre GM wrote: > All, > Could you try r7449 ? I introduced some mechanisms to keep track of > invalid lines (where the number of columns don't match what's > expected). By default, a warning is emitted and these lines are > skipped, but an optional argument gives the possibility to raise an > exception instead. > Now, I need more tests about wrong converters. I'm trying to optimize > the upgrade mechanism (there are too many intertwined loops for my > taste now), I'll keep you posted. > Meanwhile, if you could come with more cases of failure, please send > them my way. > Cheers > P. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Hi, Excellent as the changes appear to address incorrect number of delimiters. I think that the default invalid_raise should be True. One 'feature' is that there is no way to indicate multiple delimiters when the delimiter is whitespace. A B C D 1 2 3 4 1 4 5 Which I consider a user beware issue when using whitespace as the delimiter especially in Python. Bruce From pgmdevlist at gmail.com Tue Oct 6 15:33:53 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 15:33:53 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACB8FB3.5040706@gmail.com> References: <4ACB8FB3.5040706@gmail.com> Message-ID: <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> On Oct 6, 2009, at 2:42 PM, Bruce Southey wrote: >> > Hi, > Excellent as the changes appear to address incorrect number of > delimiters. They should also give some extra info if there's a problem w/ the converters. > I think that the default invalid_raise should be True. 
Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? > > One 'feature' is that there is no way to indicate multiple delimiters > when the delimiter is whitespace. > A B C D > 1 2 3 4 > 1 4 5 Have you tried using a sequence of integers for the delimiter ? Would you mind sending me some test ? From Chris.Barker at noaa.gov Tue Oct 6 16:39:34 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 06 Oct 2009 13:39:34 -0700 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> Message-ID: <4ACBAB06.4000008@noaa.gov> josef.pktd at gmail.com wrote: > If I have a structured or a regular array, is the use of strides in > the following always correct for the length of the row memory? > > I would like to do tostring() but on each row, by creating a string > view of the memory in a 1d array. Maybe I'm missing what you want, but why not just: In [15]: tmp Out[15]: array([[ 1.07810097, -1.74157351, 0.29740878], [-0.16786436, 0.45752272, -0.8038045 ], [-0.17195028, -1.16753882, 0.04329128], [ 0.45460137, -0.44584955, -0.77140505]]) In [16]: rows = [] In [17]: for r in range(tmp.shape[0]): rows.append(tmp[r,:].tostring()) ....: In [19]: rows Out[19]: ['?\xf1?\xe6\xce\x1f9\xce\xbf\xfb\xdd|.\xc85Z?\xd3\x08\xbe\xd6\xb7\xb6\xe8', '\xbf\xc5|\x94Sx\x92\x18?\xddH\r\\T\xfbT\xbf\xe9\xb8\xc45\xff\x92\xdf', '\xbf\xc6\x02w\x82\x18i\xaf\xbf\xf2\xae=/\xfe\xff\x0b?\xa6*FD\xae\xd1F', '?\xdd\x180Z\xcet\xa5\xbf\xdc\x88\xcc\x8a\x8c\x8b\xe7\xbf\xe8\xafY\xa2\xf8\xac '] in general, you can let numpy worry about the strides, etc. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gokhansever at gmail.com Tue Oct 6 16:42:51 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 15:42:51 -0500 Subject: [Numpy-discussion] Questions about masked arrays Message-ID: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> Hello, I have a sample masked array data as shown below. 1-) When I list the whole array I see the fill value correctly. However below that line, when I do access the 5th element, fill_value flies upto 1e+20. What might be wrong here? I[5]: c.data['Air_Temp'] O[5]: masked_array(data = [13.1509 13.1309 13.1278 13.1542 -- 13.1539 13.1387 -- -- -- 13.1107 13.1351 13.2073 13.2562 13.3533 13.3889 13.4067 13.2938 13.1962 13.1248 13.0411 12.9534 12.8354 12.7392 12.6725], mask = [False False False False True False False True True True False False False False False False False False False False False False False False False], fill_value = 999999.9999) I[6]: c.data['Air_Temp'][4] O[6]: masked_array(data = --, mask = True, fill_value = 1e+20) 2-) What is wrong with the arccos calculation? Should not that result the same as with cos(d) result? I[9]: d = c.data['Air_Temp'][4] I[11]: cos(d) O[11]: masked_array(data = --, mask = True, fill_value = 1e+20) I[12]: arccos(d) O[12]: masked_array(data = 1.57079632679, mask = False, fill_value = 1e+20) Any ideas? -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Chris.Barker at noaa.gov Tue Oct 6 16:43:58 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 06 Oct 2009 13:43:58 -0700 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> Message-ID: <4ACBAC0E.3070708@noaa.gov> Pierre GM wrote: >> I think that the default invalid_raise should be True. > > Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? yup -- make it +2 -- ignoring erreos and losing data by default is a "bad idea"! >> One 'feature' is that there is no way to indicate multiple delimiters >> when the delimiter is whitespace. >> A B C D >> 1 2 3 4 >> 1 4 5 I'd say someone has made a very poor choice of file formats! Unless this s a fixed width file, in which case it should be processes as such, rather than as a delimited one. I suppose it wouldn't hurt to add that feature to genfromtxt.. or is it there already. Perhaps that's what this means: > Have you tried using a sequence of integers for the delimiter ? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From george.trojan at noaa.gov Tue Oct 6 16:42:22 2009 From: george.trojan at noaa.gov (George Trojan) Date: Tue, 6 Oct 2009 20:42:22 +0000 (UTC) Subject: [Numpy-discussion] vectorize() broken on Python2.6 Message-ID: f2py generated wrappers cannot be vectorized with numpy1.3.0 and Python2.6.2. The reason is change to Python's getargs.c. Vectorize, or rather _get_nargs() defined in lib/function_base.py tries to determine the number of arguments from error message generated while the interpreter parses function invocation without any arguments. The messages (in getargs.c) have changed, for example: Required argument 'a' (pos 1) not found Since the message no longer contains information how many arguments a function takes, a fix is not obvious. Is there a solution coming soon? I posted a message on comp.lang.python few days ago. Sturda Molden generated bug report http://projects.scipy.org/numpy/ticket/1247. However the change he suggests does not fix the problem. I am tempted to apply a temporary workaround for my current needs: The __init__ method in vectorize would accept an additional argument, interface=None. That argument would be a Python code stub, prepared manually (though it could easily be generated by f2py). This stub would be used by _get_nargs() when the original object does not contain attribute 'func_code'. Example: Fortran code integer function f3(a, b, c) integer, intent(in) :: a, b integer, optional, intent(in) :: c if (present(c)) then f3 = a - b else f3 = a + b endif end function f3 Interface def f3_iface(a, b, c=None): pass Call vf3 = numpy.vectorize(ftest.f3, f3_iface) Are there any drawbacks to this approach? 
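For what it's worth, with an explicit interface stub the argument count
could come from plain introspection instead of from parsing an error
message. A rough sketch of the idea (the helper name is made up; this is
not how the current _get_nargs works):

import inspect

def nargs_from_stub(stub):
    # hypothetical helper: count the stub's arguments and how many of
    # them have defaults (i.e. are optional)
    args, varargs, varkw, defaults = inspect.getargspec(stub)
    ndefaults = len(defaults) if defaults is not None else 0
    return len(args), ndefaults

def f3_iface(a, b, c=None):
    pass

print(nargs_from_stub(f3_iface))    # (3, 1): three arguments, one optional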
George From josef.pktd at gmail.com Tue Oct 6 16:47:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 16:47:50 -0400 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <4ACBAB06.4000008@noaa.gov> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> <4ACBAB06.4000008@noaa.gov> Message-ID: <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> On Tue, Oct 6, 2009 at 4:39 PM, Christopher Barker wrote: > josef.pktd at gmail.com wrote: >> If I have a structured or a regular array, is the use of strides in >> the following always correct for the length of the row memory? >> >> I would like to do tostring() but on each row, by creating a string >> view of the memory in a 1d array. > > Maybe I'm missing what you want, but why not just: > > In [15]: tmp > Out[15]: > array([[ 1.07810097, -1.74157351, ?0.29740878], > ? ? ? ?[-0.16786436, ?0.45752272, -0.8038045 ], > ? ? ? ?[-0.17195028, -1.16753882, ?0.04329128], > ? ? ? ?[ 0.45460137, -0.44584955, -0.77140505]]) > > In [16]: rows = [] > > In [17]: for r in range(tmp.shape[0]): > ? ? ? ? ? ? ?rows.append(tmp[r,:].tostring()) > ? ?....: > > In [19]: rows > Out[19]: > ['?\xf1?\xe6\xce\x1f9\xce\xbf\xfb\xdd|.\xc85Z?\xd3\x08\xbe\xd6\xb7\xb6\xe8', > ?'\xbf\xc5|\x94Sx\x92\x18?\xddH\r\\T\xfbT\xbf\xe9\xb8\xc45\xff\x92\xdf', > ?'\xbf\xc6\x02w\x82\x18i\xaf\xbf\xf2\xae=/\xfe\xff\x0b?\xa6*FD\xae\xd1F', > > '?\xdd\x180Z\xcet\xa5\xbf\xdc\x88\xcc\x8a\x8c\x8b\xe7\xbf\xe8\xafY\xa2\xf8\xac > '] > > > in general, you can let numpy worry about the strides, etc. I wanted to avoid the python loop and thought creating the view will be faster with large arrays. But for this I need to know the memory length of a row of arbitrary types for the conversion to strings, strides was the only thing I could think of. > > -Chris > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Tue Oct 6 17:04:28 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 17:04:28 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACBAC0E.3070708@noaa.gov> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Oct 6, 2009, at 4:43 PM, Christopher Barker wrote: > Pierre GM wrote: >>> I think that the default invalid_raise should be True. >> >> Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? > > yup -- make it +2 -- ignoring erreos and losing data by default is a > "bad idea"! OK then, that's enough for me: I'll put invalid_raise as True by default. Note that a warning was emitted no matter what. > >>> One 'feature' is that there is no way to indicate multiple >>> delimiters >>> when the delimiter is whitespace. >>> A B C D >>> 1 2 3 4 >>> 1 4 5 > > I'd say someone has made a very poor choice of file formats! > > Unless this s a fixed width file, in which case it should be processes > as such, rather than as a delimited one. I suppose it wouldn't hurt to > add that feature to genfromtxt.. or is it there already. 
Perhaps > that's > what this means: > >> Have you tried using a sequence of integers for the delimiter ? Yes, if you give a sequence of integers as delimiter, it is interpreted as the length of each field. At least, should be. From pgmdevlist at gmail.com Tue Oct 6 17:28:16 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 17:28:16 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> Message-ID: <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> On Oct 6, 2009, at 4:42 PM, G?khan Sever wrote: > Hello, > > I have a sample masked array data as shown below. > > 1-) When I list the whole array I see the fill value correctly. > However below that line, when I do access the 5th element, > fill_value flies upto 1e+20. What might be wrong here? Nothing. Your 5th element is the special constant numpy.ma.masked, which has its own filling_value by default. I'll check whether it's worth inheriting the fill_value from the original array. If you could give me a test case where you'd need that value to keep the original filling_value, that'd help me make up my mind. > 2-) What is wrong with the arccos calculation? Should not that > result the same as with cos(d) result? Mmh, what numpy are you using ? When I try with a recent one, np.arccos does output ma.masked... From gokhansever at gmail.com Tue Oct 6 18:57:05 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 17:57:05 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> Message-ID: <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> On Tue, Oct 6, 2009 at 4:28 PM, Pierre GM wrote: > > On Oct 6, 2009, at 4:42 PM, G?khan Sever wrote: > > > Hello, > > > > I have a sample masked array data as shown below. > > > > 1-) When I list the whole array I see the fill value correctly. > > However below that line, when I do access the 5th element, > > fill_value flies upto 1e+20. What might be wrong here? > > Nothing. Your 5th element is the special constant numpy.ma.masked, > which has its own filling_value by default. I'll check whether it's > worth inheriting the fill_value from the original array. If you could > give me a test case where you'd need that value to keep the original > filling_value, that'd help me make up my mind. > Seeing a different filling value is causing confusion. Both for myself, and when I try to demonstrate the usage of masked array to other people. Also say, if I want to replace that one element back to its original state will it use fill_value as 1e+20 or 999999.9999? > > > 2-) What is wrong with the arccos calculation? Should not that > > result the same as with cos(d) result? > I first tested on 1.3.0, and later on my laptop using 1.4dev version which is about an old month built. Once again the results for each arc... 
function:

I[31]: d
O[31]: masked_array(data = --, mask = True, fill_value = 1e+20)

I[26]: arccos(d)
O[26]: masked_array(data = 1.57079632679, mask = False, fill_value = 1e+20)

I[28]: arccosh(d)
O[28]: masked_array(data = nan, mask = False, fill_value = 1e+20)

I[30]: arcsin(d)
O[30]: masked_array(data = 0.0, mask = False, fill_value = 1e+20)

I[32]: arcsinh(d)
O[32]: masked_array(data = --, mask = True, fill_value = 1e+20)

I[33]: arctan(d)
O[33]: masked_array(data = --, mask = True, fill_value = 1e+20)

I[35]: arctanh(d)
O[35]: masked_array(data = 0.0, mask = False, fill_value = 1e+20)

Only arcsinh and arctan return the correct (masked) result.

>
> Mmh, what numpy are you using ? When I try with a recent one,
> np.arccos does output ma.masked...
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Gökhan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pgmdevlist at gmail.com Tue Oct 6 20:38:23 2009
From: pgmdevlist at gmail.com (Pierre GM)
Date: Tue, 6 Oct 2009 20:38:23 -0400
Subject: [Numpy-discussion] Questions about masked arrays
In-Reply-To: <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com>
References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com>
Message-ID: <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com>

On Oct 6, 2009, at 6:57 PM, Gökhan Sever wrote:
> Seeing a different filling value is causing confusion. Both for
> myself, and when I try to demonstrate the usage of masked array to
> other people.

Fair enough.
I must admit that `fill_value` is a vestige from the > previous implementation (talking pre 1.2 here), that is no longer > really needed (cf below for more details). > > > Also say, if I want to replace that one element back to its original > > state will it use fill_value as 1e+20 or 999999.9999? > > What do you mean by 'replace back to its original state' ? Using > `filled`, you mean ? > Yes, in more properly stated fashion "filled" :) I[14]: c.data['Air_Temp'][4] O[14]: masked_array(data = --, mask = True, fill_value = 1e+20) I[15]: c.data['Air_Temp'][4].filled() O[15]: array(1e+20) Little buggy, isn't it? It properly fill the whole array: I[13]: c.data['Air_Temp'].filled() O[13]: array([ 1.31509000e+01, 1.31309000e+01, 1.31278000e+01, 1.31542000e+01, 1.00000000e+06, 1.31539000e+01, 1.31387000e+01, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.31107000e+01, 1.31351000e+01, 1.32073000e+01, 1.32562000e+01, 1.33533000e+01, 1.33889000e+01, 1.34067000e+01, 1.32938000e+01, 1.31962000e+01, 1.31248000e+01, 1.30411000e+01, 1.29534000e+01, 1.28354000e+01, 1.27392000e+01, 1.26725000e+01]) > > > > 2-) What is wrong with the arccos calculation? Should not that > > > result the same as with cos(d) result? > > > > I first tested on 1.3.0, and later on my laptop using 1.4dev version > > which is about an old month built. > > > > Once again the results for each arc... function > > Er, I assume it's np.arccos ? > Sorry too much time spent in ipython -pylab :) I[18]: arccos? Type: ufunc Base Class: String Form: Namespace: Interactive File: /home/gsever/Desktop/python-repo/numpy/numpy/__init__.py > Anyway, I'm puzzled. Works like a charm here (r7438 for numpy.ma). > Could it be that something went wrng with some ufuncs ? This I don't know :( > I didn't touch > ma since 09/08 (thanks, svn history), so I don't think it comes from > here... Yes, SVN is a very useful invention indeed. I[6]: numpy.__version__ O[6]: '1.4.0.dev' For some reason it doesn't list check-out revision. Doing an ls -l reveals that those are checked-out and installed after August 13 which was a preparation for the SciPy 09 :) Would you mind trying a more recent svn version ? > This is the last resort. I will eventually try this if I don't any other options left. I confirmed the same arccos weirdness in Sage Notebook (www.sagenb.org) where Numpy 1.3.0 is installed there. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Oct 6 22:06:28 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 22:06:28 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> Message-ID: On Oct 6, 2009, at 6:57 PM, G?khan Sever wrote: > > Seeing a different filling value is causing confusion. Both for > myself, and when I try to demonstrate the usage of masked array to > other people. Also say, if I want to replace that one element back > to its original state will it use fill_value as 1e+20 or 999999.9999? 
I knew I was missing something: when you display a masked entry, you actually display the `masked` constant: it's a 0-shaped float masked array with its own `fill_value`, but more importantly, it's a constant. You can use it to test whether one element is masked. Check this example:

>>> x = ma.array([1,2,3],mask=[0,1,0],dtype=int,fill_value=999)
>>> x
masked_array(data = [1 -- 3], mask = [False True False], fill_value = 999)
>>> x[1] is masked
True
>>> x[1]
masked_array(data = --, mask = True, fill_value = 1e+20)

Now, you can change the fill_value of the masked element to whatever you want, but it'll be propagated:

>>> ma.masked.fill_value = -999.
>>> x[1]
masked_array(data = --, mask = True, fill_value = -999.0)
>>> y = ma.array([3,2,1],mask=[1,0,1])
>>> y[0]
masked_array(data = --, mask = True, fill_value = -999.0)

See ? Now, I understand this behavior is a bit confusing. Unfortunately, we need to keep being able to use (element is masked), which implies that we need to keep this apparent inconsistency. What we could do is to define some specific display for the `masked` constant like `masked`. I'm open to suggestions.

From bsouthey at gmail.com Tue Oct 6 22:08:58 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 6 Oct 2009 21:08:58 -0500
Subject: [Numpy-discussion] genfromtxt - the return
In-Reply-To: 
References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov>
Message-ID: 

On Tue, Oct 6, 2009 at 4:04 PM, Pierre GM wrote:
>
> On Oct 6, 2009, at 4:43 PM, Christopher Barker wrote:
>
>> Pierre GM wrote:
>>>> I think that the default invalid_raise should be True.
>>>
>>> Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ?
>>
>> yup -- make it +2 -- ignoring errors and losing data by default is a
>> "bad idea"!
>
> OK then, that's enough for me: I'll put invalid_raise as True by
> default. Note that a warning was emitted no matter what.
>
>>>> One 'feature' is that there is no way to indicate multiple
>>>> delimiters when the delimiter is whitespace.
>>>> A B C D
>>>> 1 2 3 4
>>>> 1     4 5
>>
>> I'd say someone has made a very poor choice of file formats!

No, just seeing what sort of problems I can create. This case is partly based on the fact that if someone is using tab-delimited data, they need to set delimiter='\t', otherwise it gives an error. Also, I often parse text files, so yes, you have to be careful with the delimiters. It also arises because certain programs, like spreadsheets, offer the option to merge delimiters - actually in SAS it is the default (you need to specify the DSD option).

>> Unless this is a fixed width file, in which case it should be processed
>> as such, rather than as a delimited one. I suppose it wouldn't hurt to
>> add that feature to genfromtxt.. or is it there already. Perhaps that's
>> what this means:
>>
>>> Have you tried using a sequence of integers for the delimiter ?
>
> Yes, if you give a sequence of integers as delimiter, it is
> interpreted as the length of each field. At least, should be.

More to learn and test. Anyhow, I am really impressed with how this function works.
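For the record, a small illustration of that fixed-width behaviour (column widths made up, Python 2 StringIO used just to keep it self-contained):

    from StringIO import StringIO
    import numpy as np

    data = StringIO("  1  2  3\n  4  5 67\n")
    # a sequence of integers is interpreted as the width of each field
    np.genfromtxt(data, delimiter=(3, 3, 3))
    # -> array([[  1.,   2.,   3.],
    #           [  4.,   5.,  67.]])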
Bruce From pgmdevlist at gmail.com Tue Oct 6 22:22:26 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 22:22:26 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> Message-ID: <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> On Oct 6, 2009, at 9:54 PM, G?khan Sever wrote: > > > Also say, if I want to replace that one element back to its original > > state will it use fill_value as 1e+20 or 999999.9999? > > What do you mean by 'replace back to its original state' ? Using > `filled`, you mean ? > > Yes, in more properly stated fashion "filled" :) > I[14]: c.data['Air_Temp'][4] > O[14]: > masked_array(data = --, > mask = True, > fill_value = 1e+20) > > > I[15]: c.data['Air_Temp'][4].filled() > O[15]: array(1e+20) > > Little buggy, isn't it? It properly fill the whole array: > > I[13]: c.data['Air_Temp'].filled() > O[13]: > array([ 1.31509000e+01, 1.31309000e+01, 1.31278000e+01, > 1.31542000e+01, 1.00000000e+06, 1.31539000e+01, > 1.31387000e+01, 1.00000000e+06, 1.00000000e+06, > 1.00000000e+06, 1.31107000e+01, 1.31351000e+01, > 1.32073000e+01, 1.32562000e+01, 1.33533000e+01, > 1.33889000e+01, 1.34067000e+01, 1.32938000e+01, > 1.31962000e+01, 1.31248000e+01, 1.30411000e+01, > 1.29534000e+01, 1.28354000e+01, 1.27392000e+01, > 1.26725000e+01]) Once again, when you access your 5th element, you get the special `masked` constant. If you fill this constant, you'll get something which is probably not what you want. And I would need a *REALLY* compelling reason to change this behavior, as it's gonna break a lot of things (the masked constant has been around for a while) > > > 2-) What is wrong with the arccos calculation? Should not that > > Er, I assume it's np.arccos ? > > Sorry too much time spent in ipython -pylab :) Well, i use ipython -pylab regularly as well, but still have the reflex of using np. ;) > > I[18]: arccos? > Type: ufunc > Base Class: > String Form: > Namespace: Interactive > File: /home/gsever/Desktop/python-repo/numpy/numpy/ > __init__.py > > > Anyway, I'm puzzled. Works like a charm here (r7438 for numpy.ma). > Could it be that something went wrng with some ufuncs ? > > This I don't know :( > > I didn't touch > ma since 09/08 (thanks, svn history), so I don't think it comes from > here... > > Yes, SVN is a very useful invention indeed. > > I[6]: numpy.__version__ > O[6]: '1.4.0.dev' > > For some reason it doesn't list check-out revision. I know, and it's bugging me as well. if you have a build directory somewhere, check numpy/core/__svn_version__.py > This is the last resort. I will eventually try this if I don't any > other options left. I gonna have difficulties fixing something that I don't see broken... Now, there might be something wrong in my installation. I gonna try to install 1.3.0 somwehere. say, what Python are you using ? 
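Coming back to the filled() point above: the practical rule is to fill first and index afterwards, so that the array's own fill_value is used; indexing first hands you the masked constant and its default. A small sketch (the fill_value here is assumed):

    import numpy.ma as ma

    x = ma.array([13.15, 13.13, 13.12], mask=[0, 1, 0], fill_value=999999.9999)
    x.filled()[1]   # 999999.9999 -- the array's own fill_value
    x[1].filled()   # array(1e+20) -- x[1] is the masked constant, with its default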
From pgmdevlist at gmail.com Tue Oct 6 22:27:12 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 22:27:12 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Oct 6, 2009, at 10:08 PM, Bruce Southey wrote: > No, just seeing what sort of problems I can create. This case is > partly based on if someone is using tab-delimited then they need to > set the delimiter='\t' otherwise it gives an error. Also I often parse > text files so, yes, you have to be careful of the delimiters. It is > also arises because certain programs like spreadsheets there is the > option to merge delimiters - actually in SAS it is default (you need > to specify the DSD option). Ahah! I get it. Well, I remmbr that we discussed something like that a few months ago when I started working on np.genfromtxt, and the default of *not* merging whitespaces was requested. I gonna check whether we can't put this option somewhere now... > Anyhow, I am really impressed on how this function works. Thx. I hope things haven't been slowed down too much. From jsseabold at gmail.com Tue Oct 6 22:40:50 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Oct 2009 22:40:50 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Tue, Oct 6, 2009 at 10:08 PM, Bruce Southey wrote: > On Tue, Oct 6, 2009 at 4:04 PM, Pierre GM wrote: >> >> On Oct 6, 2009, at 4:43 PM, Christopher Barker wrote: >> >>> Pierre GM wrote: >>>>> I think that the default invalid_raise should be True. >>>> >>>> Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? >>> >>> yup -- make it +2 -- ignoring erreos and losing data by default is a >>> "bad idea"! >> >> OK then, that's enough for me: I'll put invalid_raise as True by >> default. Note that a warning was emitted no matter what. >> >> >>> >>>>> One 'feature' is that there is no way to indicate multiple >>>>> delimiters >>>>> when the delimiter is whitespace. >>>>> A B C D >>>>> 1 2 3 4 >>>>> 1 ? ? 4 5 >>> >>> I'd say someone has made a very poor choice of file formats! > > No, just seeing what sort of problems I can create. This case is > partly based on if someone is using tab-delimited then they need to > set the delimiter='\t' otherwise it gives an error. Also I often parse > text files so, yes, you have to be careful of the delimiters. It is > also arises because certain programs like spreadsheets there is the > option to merge delimiters - actually in SAS it is default (you need > to specify the DSD option). > >>> >>> Unless this s a fixed width file, in which case it should be processes >>> as such, rather than as a delimited one. I suppose it wouldn't hurt to >>> add that feature to genfromtxt.. or is it there already. Perhaps >>> that's >>> what this means: >>> >>>> Have you tried using a sequence of integers for the delimiter ? >> >> Yes, if you give a sequence of integers as delimiter, it is >> interpreted as the length of each field. At least, should be. > > More to learn and test. > There's an example on using the fixed-width delimiter here: http://docs.scipy.org/numpy/docs/numpy.lib.io.genfromtxt/ As far as I know, it works fine. > Anyhow, I am really impressed on how this function works. > Agreed. Genfromtxt and the derived are very useful. 
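On the whitespace-merging point earlier in the thread, the difference only becomes visible once the delimiter is explicit; a hedged sketch (expected output, not checked against current SVN):

    from StringIO import StringIO
    import numpy as np

    txt = "1\t2\t3\n4\t\t6\n"
    np.genfromtxt(StringIO(txt), delimiter="\t")
    # -> array([[  1.,   2.,   3.],
    #           [  4.,  nan,   6.]])
    # with the default whitespace delimiter the empty field is swallowed,
    # so the second row no longer has the expected number of columns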
Skipper From gokhansever at gmail.com Tue Oct 6 22:58:26 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 21:58:26 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> Message-ID: <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> On Tue, Oct 6, 2009 at 9:22 PM, Pierre GM wrote: > > On Oct 6, 2009, at 9:54 PM, G?khan Sever wrote: > > > > > Also say, if I want to replace that one element back to its original > > > state will it use fill_value as 1e+20 or 999999.9999? > > > > What do you mean by 'replace back to its original state' ? Using > > `filled`, you mean ? > > > > Yes, in more properly stated fashion "filled" :) > > > I[14]: c.data['Air_Temp'][4] > > O[14]: > > masked_array(data = --, > > mask = True, > > fill_value = 1e+20) > > > > > > I[15]: c.data['Air_Temp'][4].filled() > > O[15]: array(1e+20) > > > > Little buggy, isn't it? It properly fill the whole array: > > > > I[13]: c.data['Air_Temp'].filled() > > O[13]: > > array([ 1.31509000e+01, 1.31309000e+01, 1.31278000e+01, > > 1.31542000e+01, 1.00000000e+06, 1.31539000e+01, > > 1.31387000e+01, 1.00000000e+06, 1.00000000e+06, > > 1.00000000e+06, 1.31107000e+01, 1.31351000e+01, > > 1.32073000e+01, 1.32562000e+01, 1.33533000e+01, > > 1.33889000e+01, 1.34067000e+01, 1.32938000e+01, > > 1.31962000e+01, 1.31248000e+01, 1.30411000e+01, > > 1.29534000e+01, 1.28354000e+01, 1.27392000e+01, > > 1.26725000e+01]) > > Once again, when you access your 5th element, you get the special > `masked` constant. If you fill this constant, you'll get something > which is probably not what you want. And I would need a *REALLY* > compelling reason to change this behavior, as it's gonna break a lot > of things (the masked constant has been around for a while) > > I see your points. I don't want to give you extra work, don't worry :) It just seem a bit bizarre: I[27]: c.data['Air_Temp'].fill_value O[27]: 999999.99990000005 I[28]: c.data['Air_Temp'][4].fill_value O[28]: 1e+20 As you see, it just returns two different fill_values. I know eventually you will be the one handling this :) it might be good to add this issue to the tracker. > > > > 2-) What is wrong with the arccos calculation? Should not that > > > > Er, I assume it's np.arccos ? > > > > Sorry too much time spent in ipython -pylab :) > > Well, i use ipython -pylab regularly as well, but still have the > reflex of using np. ;) > > > Good reflex. Saves you from making extra explanations. But it works with just typing array why should I type np.array (Ohh my namespacess :) It is just an IPython magic. > > > > > I[18]: arccos? > > Type: ufunc > > Base Class: > > String Form: > > Namespace: Interactive > > File: /home/gsever/Desktop/python-repo/numpy/numpy/ > > __init__.py > > > > > > Anyway, I'm puzzled. Works like a charm here (r7438 for numpy.ma). > > Could it be that something went wrng with some ufuncs ? > > > > This I don't know :( > > > > I didn't touch > > ma since 09/08 (thanks, svn history), so I don't think it comes from > > here... > > > > Yes, SVN is a very useful invention indeed. 
> > > > I[6]: numpy.__version__ > > O[6]: '1.4.0.dev' > > > > For some reason it doesn't list check-out revision. > > I know, and it's bugging me as well. if you have a build directory > somewhere, check numpy/core/__svn_version__.py > > There is build directory but no files that contains svn :( > > This is the last resort. I will eventually try this if I don't any > > other options left. > > I gonna have difficulties fixing something that I don't see broken... > Now, there might be something wrong in my installation. I gonna try to > install 1.3.0 somwehere. say, what Python are you using ? > OK, I use meld to diff my copy of ma/core.py with the latest trunk version. There are lots of differences :) So there is a possibility that I might have built my local numpy before 09/08. I should renew my copy. Do you know the link of svn browser for the numpy? I don't know how you are making separate installations without overriding other package? I either use Sage (if I have extra time) or SPD. They are both shipped with numpy 1.3.0. Let see how it will result with a new build... > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Tue Oct 6 23:01:45 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Oct 2009 23:01:45 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Tue, Oct 6, 2009 at 10:27 PM, Pierre GM wrote: >> Anyhow, I am really impressed on how this function works. > > Thx. I hope things haven't been slowed down too much. In keeping with the making some work for you theme, I filed an enhancement ticket for one change that we discussed and another IMO useful addition. http://projects.scipy.org/numpy/ticket/1238 I think it would be nice if we could do data = np.genfromtxt(SomeFile, dtype=float, names = ['var1', 'var2', 'var3' ...]) So that float is paired with each variable name. Also, the one that came up earlier of data = np.genfromtxt(SomeFile, dtype=(int, int, float), names = ['var1','var2','var3'] I'm not completely convinced on this one though, since dtype = "i8,i8,f8" works. I don't want know how much confusion it would add to have the dtype argument accept a non-valid dtype construction. Skipper PS. Is it bad form for me to go ahead and assign these kinds of tickets to you if you're going to be working on them, or do you get pinged when any ticket is filed? From pgmdevlist at gmail.com Tue Oct 6 23:15:40 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 23:15:40 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> Message-ID: <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> On Oct 6, 2009, at 10:58 PM, G?khan Sever wrote: > > I see your points. 
I don't want to give you extra work, don't > worry :) It just seem a bit bizarre: > > I[27]: c.data['Air_Temp'].fill_value > O[27]: 999999.99990000005 > > I[28]: c.data['Air_Temp'][4].fill_value > O[28]: 1e+20 > > As you see, it just returns two different fill_values. I know, but I hope you see the difference : in the first line, you access the `fill_value` of the array. In the second, you access the `fill_value` of the `masked` constant. Each time you access a masked element of an array with __getitem__, you get the masked constant. We could force the constant to inherit the fill_value of the array that calls __getitem__, but it'd be propagated. > I know eventually you will be the one handling this :) it might be > good to add this issue to the tracker. Go for it, but don't expect anything before the release of 1.4.0 (in the next few months) > > > This is the last resort. I will eventually try this if I don't any > > other options left. > > I gonna have difficulties fixing something that I don't see broken... > Now, there might be something wrong in my installation. I gonna try to > install 1.3.0 somwehere. say, what Python are you using ? > > OK, I use meld to diff my copy of ma/core.py with the latest trunk > version. There are lots of differences :) So there is a possibility > that I might have built my local numpy before 09/08. I should renew > my copy. Do you know the link of svn browser for the numpy? I don't > know how you are making separate installations without overriding > other package? I either use Sage (if I have extra time) or SPD. They > are both shipped with numpy 1.3.0. Make yourself a favor and install virtualenv and virtualenvwrapper. That way, several versions of the same package can coexist without interference. Oh, and install pip till you're at it: http://pypi.python.org/pypi/virtualenv http://www.doughellmann.com/projects/virtualenvwrapper/ http://pypi.python.org/pypi/pip From pgmdevlist at gmail.com Tue Oct 6 23:23:57 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 23:23:57 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Oct 6, 2009, at 11:01 PM, Skipper Seabold wrote: > > In keeping with the making some work for you theme, I filed an > enhancement ticket for one change that we discussed and another IMO > useful addition. http://projects.scipy.org/numpy/ticket/1238 > > I think it would be nice if we could do > > data = np.genfromtxt(SomeFile, dtype=float, names = ['var1', 'var2', > 'var3' ...]) > > So that float is paired with each variable name. Also, the one that > came up earlier of > > data = np.genfromtxt(SomeFile, dtype=(int, int, float), names = > ['var1','var2','var3'] > > I'm not completely convinced on this one though, since dtype = > "i8,i8,f8" works. I don't want know how much confusion it would add > to have the dtype argument accept a non-valid dtype construction. Actually, it's rather straightforward. I already have something that supports dtype=(int,int,float) (far easier to handle than "i4,i4,f8"), I need to tweak a couple of things when the names don't match before posting. Pairing the names with the dtype is pretty neat, that would be quite easy to implement > PS. Is it bad form for me to go ahead and assign these kinds of > tickets to you if you're going to be working on them, or do you get > pinged when any ticket is filed? Go for it. 
I'm only notified when a ticket is assigned to me directly. From gokhansever at gmail.com Tue Oct 6 23:47:11 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 22:47:11 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> Message-ID: <49d6b3500910062047u445af52ata01ec68c2e14744f@mail.gmail.com> On Tue, Oct 6, 2009 at 10:15 PM, Pierre GM wrote: > > On Oct 6, 2009, at 10:58 PM, G?khan Sever wrote: > > > > I see your points. I don't want to give you extra work, don't > > worry :) It just seem a bit bizarre: > > > > I[27]: c.data['Air_Temp'].fill_value > > O[27]: 999999.99990000005 > > > > I[28]: c.data['Air_Temp'][4].fill_value > > O[28]: 1e+20 > > > > As you see, it just returns two different fill_values. > > I know, but I hope you see the difference : in the first line, you > access the `fill_value` of the array. In the second, you access the > `fill_value` of the `masked` constant. Each time you access a masked > element of an array with __getitem__, you get the masked constant. We > could force the constant to inherit the fill_value of the array that > calls __getitem__, but it'd be propagated. > > Got these points. Thanks It took a while I had to re-built matplotlib to use ipython -pylab :) I built the numpy again source from the trunk and arccos (as well as other arc functions) problem has disappeared. It all started with trying to calculate great circle navigation equations using masked arrays, and seeing this range_calc function returning some weird results where it was not supposed to do. Further tracing down the error to arccos. def range_calc(lat_r, lat_t, long_r, long_t): range = degrees(arccos(sin(radians(lat_r)) * sin(radians(lat_t)) + cos(radians(lat_r)) * cos(radians(lat_t)) * cos(radians(long_t - long_r)))) * F azimuth = degrees(arccos((sin(radians(lat_t)) - cos(radians(range / F)) * sin(radians(lat_r))) / (sin(radians(range / F)) * cos(radians(lat_r))))) if long_t - long_r < 0: azimuth = 360 - azimuth return range, azimuth Happy now ;) > > I know eventually you will be the one handling this :) it might be > > good to add this issue to the tracker. > > Go for it, but don't expect anything before the release of 1.4.0 (in > the next few months) > > I will do this shortly. > > > > > This is the last resort. I will eventually try this if I don't any > > > other options left. > > > > I gonna have difficulties fixing something that I don't see broken... > > Now, there might be something wrong in my installation. I gonna try to > > install 1.3.0 somwehere. say, what Python are you using ? > > > > OK, I use meld to diff my copy of ma/core.py with the latest trunk > > version. There are lots of differences :) So there is a possibility > > that I might have built my local numpy before 09/08. I should renew > > my copy. Do you know the link of svn browser for the numpy? I don't > > know how you are making separate installations without overriding > > other package? I either use Sage (if I have extra time) or SPD. 
They > > are both shipped with numpy 1.3.0. > > Make yourself a favor and install virtualenv and virtualenvwrapper. > That way, several versions of the same package can coexist without > interference. Oh, and install pip till you're at it: > > http://pypi.python.org/pypi/virtualenv > http://www.doughellmann.com/projects/virtualenvwrapper/ > http://pypi.python.org/pypi/pip > > > "pip" this is the first time I am hearing. Will give these tools a try probably this weekend. Thanks again for your clarifications. Now, I have to update my advisor's numpy to make his code running correctly. In the first place his code was running properly by using manually created masks for numpy arrays. Using the masked arrays we broke it. Now we know what causing the error. It feels good :) > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Wed Oct 7 00:10:55 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 23:10:55 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> Message-ID: <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> Created the ticket http://projects.scipy.org/numpy/ticket/1253 Could you tell me briefly what was the source of leak in arccos case? And how do you write a test code for these cases? On Tue, Oct 6, 2009 at 10:15 PM, Pierre GM wrote: > > On Oct 6, 2009, at 10:58 PM, G?khan Sever wrote: > > > > I see your points. I don't want to give you extra work, don't > > worry :) It just seem a bit bizarre: > > > > I[27]: c.data['Air_Temp'].fill_value > > O[27]: 999999.99990000005 > > > > I[28]: c.data['Air_Temp'][4].fill_value > > O[28]: 1e+20 > > > > As you see, it just returns two different fill_values. > > I know, but I hope you see the difference : in the first line, you > access the `fill_value` of the array. In the second, you access the > `fill_value` of the `masked` constant. Each time you access a masked > element of an array with __getitem__, you get the masked constant. We > could force the constant to inherit the fill_value of the array that > calls __getitem__, but it'd be propagated. > > > I know eventually you will be the one handling this :) it might be > > good to add this issue to the tracker. > > Go for it, but don't expect anything before the release of 1.4.0 (in > the next few months) > > > > > > This is the last resort. I will eventually try this if I don't any > > > other options left. > > > > I gonna have difficulties fixing something that I don't see broken... > > Now, there might be something wrong in my installation. I gonna try to > > install 1.3.0 somwehere. say, what Python are you using ? > > > > OK, I use meld to diff my copy of ma/core.py with the latest trunk > > version. 
There are lots of differences :) So there is a possibility > > that I might have built my local numpy before 09/08. I should renew > > my copy. Do you know the link of svn browser for the numpy? I don't > > know how you are making separate installations without overriding > > other package? I either use Sage (if I have extra time) or SPD. They > > are both shipped with numpy 1.3.0. > > Make yourself a favor and install virtualenv and virtualenvwrapper. > That way, several versions of the same package can coexist without > interference. Oh, and install pip till you're at it: > > http://pypi.python.org/pypi/virtualenv > http://www.doughellmann.com/projects/virtualenvwrapper/ > http://pypi.python.org/pypi/pip > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Oct 7 00:33:00 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 00:33:00 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> Message-ID: <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> On Oct 7, 2009, at 12:10 AM, G?khan Sever wrote: > Created the ticket http://projects.scipy.org/numpy/ticket/1253 Want even more confusion ? >>> x = ma.array([1,2,3],mask=[0,1,0], dtype=int) >>> x[0].dtype dtype('int64') >>> x[1].dtype dtype('float64') >>> x[2].dtype dtype('int64') Yet another illustration of the masked constant... The more I think about it, the more I think we should have a specific object ("MaskedConstant") that would do nothing but tell us that it is masked. > Could you tell me briefly what was the source of leak in arccos case? No idea, as I still haven't figured why you were having the problem in the first place > And how do you write a test code for these cases? assert(np.arccos(ma.masked), ma.masked) would be the simplest. 
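A minimal test sketch along those lines (the test that actually lands in numpy.ma.tests may well look different):

    import numpy as np
    import numpy.ma as ma

    def test_arccos_preserves_masked():
        # the masked constant should pass through untouched
        assert np.arccos(ma.masked) is ma.masked
        # and a masked element of an array should stay masked
        x = ma.array([0.5, 0.5], mask=[False, True])
        assert np.arccos(x).mask[1]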
From gokhansever at gmail.com Wed Oct 7 01:12:19 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 00:12:19 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> Message-ID: <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> On Tue, Oct 6, 2009 at 11:33 PM, Pierre GM wrote: > > On Oct 7, 2009, at 12:10 AM, G?khan Sever wrote: > > > Created the ticket http://projects.scipy.org/numpy/ticket/1253 > > Want even more confusion ? > >>> x = ma.array([1,2,3],mask=[0,1,0], dtype=int) > >>> x[0].dtype > dtype('int64') > >>> x[1].dtype > dtype('float64') > >>> x[2].dtype > dtype('int64') > > Yet another illustration of the masked constant... The more I think > about it, the more I think we should have a specific object > ("MaskedConstant") that would do nothing but tell us that it is masked. > Confusing indeed. One more from me: I[1]: a = np.arange(5) I[2]: mask = 999 I[6]: a[3] = 999 I[7]: am = ma.masked_equal(a, mask) I[8]: am O[8]: masked_array(data = [0 1 2 -- 4], mask = [False False False True False], fill_value = 999999) Where does this fill_value come from? To me it is little confusing having a "value" and "fill_value" in masked array method arguments. > > > > Could you tell me briefly what was the source of leak in arccos case? > > No idea, as I still haven't figured why you were having the problem in > the first place > Probably you can pin-point the error by testing a 1.3.0 version numpy. Not too many arc function with masked array users around I guess :) > > > And how do you write a test code for these cases? > > assert(np.arccos(ma.masked), ma.masked) would be the simplest. > Good to know this. The more I spend time with numpy the more I understand the importance of testing the code automatically. This said, I still find the test-driven-development approach somewhat bizarre. Start only by writing test code and keep implementing your code until all the tests are satisfied. Very interesting...These software engineers... > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pgmdevlist at gmail.com Wed Oct 7 01:47:53 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 01:47:53 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> Message-ID: <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> On Oct 7, 2009, at 1:12 AM, G?khan Sever wrote: > One more from me: > I[1]: a = np.arange(5) > I[2]: mask = 999 > I[6]: a[3] = 999 > I[7]: am = ma.masked_equal(a, mask) > > I[8]: am > O[8]: > masked_array(data = [0 1 2 -- 4], > mask = [False False False True False], > fill_value = 999999) > > Where does this fill_value come from? To me it is little confusing > having a "value" and "fill_value" in masked array method arguments. Because the two are unrelated. The `fill_value` is the value used to fill the masked elements (that is, the missing entries). When you create a masked array, you get a `fill_value`, whose actual value is defined by default from the dtype of the array: for int, it's 999999, for float, 1e+20, you get the idea. The value you used for masking is different, it's just whatver value you consider invalid. Now, if I follow you, you would expect the value in `masked_equal(array, value)` to be the `fill_value` of the output. That's an idea, would you mind fiilling a ticket/enhancement and assign it to me? So that I don't forget. > Probably you can pin-point the error by testing a 1.3.0 version > numpy. Not too many arc function with masked array users around I > guess :) Will try, but "if it ain't broken, don't fix it"... > assert(np.arccos(ma.masked), ma.masked) would be the simplest. (and in fact, it'd be assert(np.arccos(ma.masked) is ma.masked) in this case). > Good to know this. The more I spend time with numpy the more I > understand the importance of testing the code automatically. This > said, I still find the test-driven-development approach somewhat > bizarre. Start only by writing test code and keep implementing your > code until all the tests are satisfied. Very interesting...These > software engineers... Bah, it's not a rule cast in iron... You can start writing your code but do write the tests at the same time. It's the best way to make sure you're not breaking something later on. 
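In the meantime, the workaround for the masked_equal case discussed above is simply to adopt the sentinel as the fill_value by hand (a sketch):

    import numpy as np
    import numpy.ma as ma

    a = np.arange(5)
    a[3] = 999
    am = ma.masked_equal(a, 999)
    am.fill_value = 999   # what masked_values already does for floats
    am.filled()           # -> array([  0,   1,   2, 999,   4])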
> From gokhansever at gmail.com Wed Oct 7 02:57:01 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 01:57:01 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> Message-ID: <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> On Wed, Oct 7, 2009 at 12:47 AM, Pierre GM wrote: > > On Oct 7, 2009, at 1:12 AM, G?khan Sever wrote: > > One more from me: > > I[1]: a = np.arange(5) > > I[2]: mask = 999 > > I[6]: a[3] = 999 > > I[7]: am = ma.masked_equal(a, mask) > > > > I[8]: am > > O[8]: > > masked_array(data = [0 1 2 -- 4], > > mask = [False False False True False], > > fill_value = 999999) > > > > Where does this fill_value come from? To me it is little confusing > > having a "value" and "fill_value" in masked array method arguments. > > Because the two are unrelated. The `fill_value` is the value used to > fill the masked elements (that is, the missing entries). > When you create a masked array, you get a `fill_value`, whose actual > value is defined by default from the dtype of the array: for int, it's > 999999, for float, 1e+20, you get the idea. > The value you used for masking is different, it's just whatver value > you consider invalid. Now, if I follow you, you would expect the value > in `masked_equal(array, value)` to be the `fill_value` of the output. > That's an idea, would you mind fiilling a ticket/enhancement and > assign it to me? So that I don't forget. > One more example. (I still think the behaviour of fill_value is inconsistent) See below: I[6]: f = np.arange(5, dtype=float) I[7]: mask = 9999.9999 I[8]: f[3] = mask I[9]: fm = ma.masked_equal(f, mask) I[10]: fm O[10]: masked_array(data = [0.0 1.0 2.0 -- 4.0], mask = [False False False True False], fill_value = 1e+20) I[22]: fm2 = ma.masked_values(f, mask) I[23]: fm2 O[23]: masked_array(data = [0.0 1.0 2.0 -- 4.0], mask = [False False False True False], fill_value = 9999.9999) ma.masked_equal(x, value, copy=True) ma.masked_values(x, value, rtol=1.0000000000000001e-05, atol=1e-08, copy=True, shrink=True) Similar function definitions, but different fill_values... Ok, it is almost 2 AM here my understanding might be crawling on the ground. Probably I will re-read your comments and file an issue on the trac. > > > > Probably you can pin-point the error by testing a 1.3.0 version > > numpy. Not too many arc function with masked array users around I > > guess :) > > Will try, but "if it ain't broken, don't fix it"... > Also if it is working don't update (This applies to Fedora updates :) especially if you have an Nvidia display card) > > > assert(np.arccos(ma.masked), ma.masked) would be the simplest. > > (and in fact, it'd be assert(np.arccos(ma.masked) is ma.masked) in > this case). > > > > Good to know this. The more I spend time with numpy the more I > > understand the importance of testing the code automatically. 
This > > said, I still find the test-driven-development approach somewhat > > bizarre. Start only by writing test code and keep implementing your > > code until all the tests are satisfied. Very interesting...These > > software engineers... > > Bah, it's not a rule cast in iron... You can start writing your code > but do write the tests at the same time. It's the best way to make > sure you're not breaking something later on. > > > > That's what I have been thinking, a more reasonable way. The other is way too a reverse thinking. Thanks for the long hours discussion. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.gronqvist at gmail.com Wed Oct 7 03:30:22 2009 From: johan.gronqvist at gmail.com (=?ISO-8859-1?Q?Johan_Gr=F6nqvist?=) Date: Wed, 07 Oct 2009 09:30:22 +0200 Subject: [Numpy-discussion] numpy.linalg.eig memory issue with libatlas? Message-ID: [I am resending this as the previous attempt seems to have failed] Hello List, I am looking at memory errors when using numpy.linalg.eig(). Short version: I had memory errors in numpy.linalg.eig(), and I have reasons (valgrind) to believe these are due to writing to incorrect memory addresses in the diagonalization routine zgeev, called by numpy.linalg.eig(). I realized that I had recently installed atlas, and now had several lapack-like libraries, so I uninstalled atlas, and the issues seemed to go away. My question is: Could it be that some lapack/blas/atlas package I use is incompatible with the numpy I use, and if so, is there a method to diagnose this in a more reliable way? Longer version: The system used is an updated debian testing (squeeze), on amd64. My program uses numpy, matplotlib, and a module compiled using cython. I started getting errors from my program this week. Pdb and print-statements tell me that the errors arise around the point where I call numpy.linalg.eig(), but not every time. The type of error varies. Most frequently a segmentation fault, but sometimes a matrix dimension mismatch, and sometimes a message related to the python GC. Valgrind tells me that something "impossible" happened, and that this is probably due to invalid writes earlier during the program execution. There seems to be two invalid writes after each program crash, and the log looks like this (it only contains two invalid writes): [...] 
==6508== Invalid write of size 8 ==6508== at 0x92D2597: zunmhr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x920A42B: zlaqr3_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x9205D11: zlaqr0_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x91B0C4D: zhseqr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x911CA15: zgeev_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x881B81B: lapack_lite_zgeev (lapack_litemodule.c:590) ==6508== by 0x4911D4: PyEval_EvalFrameEx (ceval.c:3612) ==6508== by 0x491CE1: PyEval_EvalFrameEx (ceval.c:3698) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x490F17: PyEval_EvalFrameEx (ceval.c:3708) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x4DC991: function_call (funcobject.c:517) ==6508== Address 0x67ab118 is not stack'd, malloc'd or (recently) free'd ==6508== ==6508== Invalid write of size 8 ==6508== at 0x92D25A8: zunmhr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x920A42B: zlaqr3_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x9205D11: zlaqr0_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x91B0C4D: zhseqr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x911CA15: zgeev_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x881B81B: lapack_lite_zgeev (lapack_litemodule.c:590) ==6508== by 0x4911D4: PyEval_EvalFrameEx (ceval.c:3612) ==6508== by 0x491CE1: PyEval_EvalFrameEx (ceval.c:3698) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x490F17: PyEval_EvalFrameEx (ceval.c:3708) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x4DC991: function_call (funcobject.c:517) ==6508== Address 0x67ab110 is not stack'd, malloc'd or (recently) free'd [...] valgrind: m_mallocfree.c:248 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed. valgrind: Heap block lo/hi size mismatch: lo = 96, hi = 0. This is probably caused by your program erroneously writing past the end of a heap block and corrupting heap metadata. If you fix any invalid writes reported by Memcheck, this assertion failure will probably go away. Please try that before reporting this as a bug. [...] Today I looked in my package installation logs to see what had changed recently, and I noticed that I installed atlas (debian package libatlas3gf-common) recently. I uninstalled that package, and now the same program seems to have no memory errors. The packages I removed from the system today were libarpack2 libfltk1.1 libftgl2 libgraphicsmagick++3 libgraphicsmagick3 libibverbs1 libopenmpi1.3 libqrupdate1 octave3.2-common octave3.2-emacsen libatlas3gf-base octave3.2 My interpretation is that I had several packages available containing the diagonalization functionality, but that they differed subtly in their interfaces. My recent installation of atlas made numpy use (the incompatible) atlas instead of its previous choice, and removal of atlas restored the situation to the state of last week. Now for the questions: Is this a reasonable hypothesis? Is it known? Can it be investigated more precisely by comparing versions somehow? 
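One way to start the version comparison (a sketch; the lapack_lite.so path is system dependent):

    import numpy
    print numpy.__version__
    numpy.show_config()   # shows which blas/lapack numpy was built against

    # and from a shell, check which shared library actually gets loaded, e.g.:
    #   ldd /usr/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so | grep -i lapack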
Regards / johan From pgmdevlist at gmail.com Wed Oct 7 04:05:07 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 04:05:07 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> Message-ID: <79D48A64-EAB1-429B-BB73-B0759D1BA060@gmail.com> On Oct 7, 2009, at 2:57 AM, G?khan Sever wrote: > One more example. (I still think the behaviour of fill_value is > inconsistent) Well, ma.masked_values use `value` to define fill_value, ma.masked_equal does not. So yes, there's an inconsistency here. Once again, please fill an enhancement request ticket. I should be able to deal with this one quite soon. From cournape at gmail.com Wed Oct 7 04:06:09 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 17:06:09 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> Message-ID: <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris wrote: > > > Looks like a clue ;) Ok, I fixed it here: http://github.com/cournape/numpy/tree/fix_abi But that's an ugly hack. I think we should consider rewriting how we generate the API: instead of automatically growing the API array of fptr, we should explicitly mark which function name has which index, and hardcode it. It would help quite a bit to avoid changing the ABI unvoluntary. cheers, David From pgmdevlist at gmail.com Wed Oct 7 07:38:54 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 07:38:54 -0400 Subject: [Numpy-discussion] SVN + Python 2.5.4 (32b) + MacOS 10.6.1 Message-ID: All, I need to test the numpy SVN on a 10.6.1 mac, but using Python 2.5.4 (32b) instead of the 2.6.1 (64b). The sources get compiled OK (apparently, find the build here: http://pastebin.com/m147a2909 ) but numpy fails to import: File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ __init__.py", line 130, in import add_newdocs File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ add_newdocs.py", line 9, in from lib import add_newdoc File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ lib/__init__.py", line 4, in from type_check import * File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ lib/type_check.py", line 8, in import numpy.core.numeric as _nx File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ core/__init__.py", line 8, in import numerictypes as nt File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ core/numerictypes.py", line 737, in _typestr[key] = empty((1,),key).dtype.str[1:] ValueError: array is too big. Obviously, I'm messing between 32b and 64b, but can't figure where. 
Any help/hint will be deeply appreciated Cheers P. FYI: Python 2.5.4 (r254:67916, Jul 7 2009, 23:51:24) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin] CFLAGS="-arch i386 -arch x86_64" FFLAGS="-arch i386 -arch x86_64" From charlesr.harris at gmail.com Wed Oct 7 08:31:25 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 06:31:25 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > wrote: > > > > > > Looks like a clue ;) > > Ok, I fixed it here: > > http://github.com/cournape/numpy/tree/fix_abi > > But that's an ugly hack. I think we should consider rewriting how we > generate the API: instead of automatically growing the API array of > fptr, we should explicitly mark which function name has which index, > and hardcode it. It would help quite a bit to avoid changing the ABI > unvoluntary. > > I'm thinking the safest thing to do is to move the new type to the end of the list. I'm not sure what all the ramifications are for compatibility to having it stuck in the middle like that, does it change the type numbers for all the types after? I wonder what the type numbers are internally? No doubt putting it at the end makes the logic for casting more difficult, but that is something that needs fixing anyway. Question - if the new type is simply removed from the list does anything break? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Wed Oct 7 08:37:16 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 21:37:16 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> Message-ID: <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris wrote: > > > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau wrote: >> >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris >> wrote: >> > >> > >> > Looks like a clue ;) >> >> Ok, I fixed it here: >> >> http://github.com/cournape/numpy/tree/fix_abi >> >> But that's an ugly hack. I think we should consider rewriting how we >> generate the API: instead of automatically growing the API array of >> fptr, we should explicitly mark which function name has which index, >> and hardcode it. It would help quite a bit to avoid changing the ABI >> unvoluntary. >> > > I'm thinking the safest thing to do is to move the new type to the end of > the list. That's what the above branch does. > I'm not sure what all the ramifications are for compatibility to > having it stuck in the middle like that, does it change the type numbers for > all the types after? Yes, there is no space left between the types declarations and the first functions. Currently, I just put things at the end manually, but that's really error prone. 
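For illustration only -- the names and slot numbers here are invented, not numpy's real API table -- the dict-with-hardcoded-indexes idea mentioned below could look roughly like this:

# Every exported C-API entry gets a hand-assigned slot that is frozen once
# published; new entries may only claim previously unused slots at the end.
multiarray_api_index = {
    'PyArray_Type':          0,
    'PyArrayDescr_Type':     1,
    'PyArray_SetNumericOps': 2,
    # new types/functions are appended with fresh indexes
}

def check_api_index(index):
    slots = sorted(index.values())
    if len(slots) != len(set(slots)):
        raise ValueError("two API entries share the same slot")
    if slots != list(range(len(slots))):
        raise ValueError("hole in the API table")

check_api_index(multiarray_api_index)

With the duplicate and hole checks, an accidental insertion in the middle of the table fails loudly at generation time instead of silently renumbering everything after it.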
I am a bit lazy to fix this for real (I was thinking about using a python dict with hardcoded indexes as an entry instead of the current .txt files, but this requires several changes in the code generator, which is already not the greatest code to begin with). David From charlesr.harris at gmail.com Wed Oct 7 08:59:01 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 06:59:01 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 6:37 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris > wrote: > > > > > > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau > wrote: > >> > >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > Looks like a clue ;) > >> > >> Ok, I fixed it here: > >> > >> http://github.com/cournape/numpy/tree/fix_abi > >> > >> But that's an ugly hack. I think we should consider rewriting how we > >> generate the API: instead of automatically growing the API array of > >> fptr, we should explicitly mark which function name has which index, > >> and hardcode it. It would help quite a bit to avoid changing the ABI > >> unvoluntary. > >> > > > > I'm thinking the safest thing to do is to move the new type to the end of > > the list. > > That's what the above branch does. > > > I'm not sure what all the ramifications are for compatibility to > > having it stuck in the middle like that, does it change the type numbers > for > > all the types after? > > Yes, there is no space left between the types declarations and the > first functions. Currently, I just put things at the end manually, but > that's really error prone. > > I am a bit lazy to fix this for real (I was thinking about using a > python dict with hardcoded indexes as an entry instead of the current > .txt files, but this requires several changes in the code generator, > which is already not the greatest code to begin with). > > What I'm concerned about is that, IIRC, types in the c-code can be referenced by their index in a list of types and that internal mechanism might be exposed to the outside somewhere. That is, what has happened to the order of the enumerated types? If that has changed, and if external code references a type by a hard-wired number, then there is a problem that goes beyond the code generator. The safe(r) thing to do in that case is add the new type to the end of the enumerated types and fix the promotion code so it doesn't try to rely on a linear order. I expect Robert can give the fastest answer to that question. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Oct 7 09:07:18 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 07:07:18 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 6:59 AM, Charles R Harris wrote: > > > On Wed, Oct 7, 2009 at 6:37 AM, David Cournapeau wrote: > >> On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris >> wrote: >> > >> > >> > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau >> wrote: >> >> >> >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > Looks like a clue ;) >> >> >> >> Ok, I fixed it here: >> >> >> >> http://github.com/cournape/numpy/tree/fix_abi >> >> >> >> But that's an ugly hack. I think we should consider rewriting how we >> >> generate the API: instead of automatically growing the API array of >> >> fptr, we should explicitly mark which function name has which index, >> >> and hardcode it. It would help quite a bit to avoid changing the ABI >> >> unvoluntary. >> >> >> > >> > I'm thinking the safest thing to do is to move the new type to the end >> of >> > the list. >> >> That's what the above branch does. >> >> > I'm not sure what all the ramifications are for compatibility to >> > having it stuck in the middle like that, does it change the type numbers >> for >> > all the types after? >> >> Yes, there is no space left between the types declarations and the >> first functions. Currently, I just put things at the end manually, but >> that's really error prone. >> >> I am a bit lazy to fix this for real (I was thinking about using a >> python dict with hardcoded indexes as an entry instead of the current >> .txt files, but this requires several changes in the code generator, >> which is already not the greatest code to begin with). >> >> > What I'm concerned about is that, IIRC, types in the c-code can be > referenced by their index in a list of types and that internal mechanism > might be exposed to the outside somewhere. That is, what has happened to the > order of the enumerated types? If that has changed, and if external code > references a type by a hard-wired number, then there is a problem that goes > beyond the code generator. The safe(r) thing to do in that case is add the > new type to the end of the enumerated types and fix the promotion code so it > doesn't try to rely on a linear order. > > Here, for instance: "The various character codes indicating certain types are also part of an enumerated list. References to type characters (should they be needed at all) should always use these enumerations. The form of them is NPY LTR where " So those macros will generate a hard-coded number at compile time, and number that might have changed with the addition of the new types. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Wed Oct 7 10:55:33 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 09:55:33 -0500 Subject: [Numpy-discussion] Building a new copy of NumPy Message-ID: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> Hello, I checked-out the latest trunk and make a new installation of NumPy. 
My question: Is it a known behaviour that this action will result with re-building other packages that are dependent on NumPy. In my case, I had to re-built matplotlib, and now scipy. Here is the error message that I am getting while I try to import a scipy module: I[1]: run lab4.py --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) RuntimeError: FATAL: module compiled aslittle endian, but detected different endianness at runtime --------------------------------------------------------------------------- ImportError Traceback (most recent call last) /home/gsever/AtSc450/labs/04_thermals/lab4.py in () 2 3 import numpy as np ----> 4 from scipy import stats 5 6 /home/gsever/Desktop/python-repo/scipy/scipy/stats/__init__.py in () 5 from info import __doc__ 6 ----> 7 from stats import * 8 from distributions import * 9 from rv import * /home/gsever/Desktop/python-repo/scipy/scipy/stats/stats.py in () 196 # Scipy imports. 197 from numpy import array, asarray, dot, ma, zeros, sum --> 198 import scipy.special as special 199 import scipy.linalg as linalg 200 import numpy as np /home/gsever/Desktop/python-repo/scipy/scipy/special/__init__.py in () 6 #from special_version import special_version as __version__ 7 ----> 8 from basic import * 9 import specfun 10 import orthogonal /home/gsever/Desktop/python-repo/scipy/scipy/special/basic.py in () 6 7 from numpy import * ----> 8 from _cephes import * 9 import types 10 import specfun ImportError: numpy.core.multiarray failed to import WARNING: Failure executing file: -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Oct 7 11:10:28 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 10:10:28 -0500 Subject: [Numpy-discussion] Building a new copy of NumPy In-Reply-To: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> References: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> Message-ID: <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> On Wed, Oct 7, 2009 at 09:55, G?khan Sever wrote: > Hello, > > I checked-out the latest trunk and make a new installation of NumPy. My > question: Is it a known behaviour that this action will result with > re-building other packages that are dependent on NumPy. In my case, I had to > re-built matplotlib, and now scipy. Known issue. See the thread "Numpy SVN broken". -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mdroe at stsci.edu Wed Oct 7 11:28:59 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 07 Oct 2009 11:28:59 -0400 Subject: [Numpy-discussion] byteswapping a complex scalar Message-ID: <4ACCB3BB.8010805@stsci.edu> I'm noticing an inconsistency as to how complex numbers are byteswapped as arrays vs. scalars, and wondering if I'm doing something wrong. 
>>> x = np.array([-1j], '>> x.tostring().encode('hex') '00000000000080bf' # This is a little-endian representation, in the order (real, imag) # When I swap the whole array, it swaps each of the (real, imag) parts separately >>> y = x.byteswap() >>> y.tostring().encode('hex') '00000000bf800000' # and this round-trips fine >>> z = np.fromstring(y.tostring(), dtype='>c8') >>> assert z[0] == -1j >>> # When I swap the scalar, it seems to swap the entire 8 bytes >>> y = x[0].byteswap() >>> y.tostring().encode('hex') 'bf80000000000000' # ...and this doesn't round-trip >>> z = np.fromstring(y.tostring(), dtype='>c8') >>> assert z[0] == -1j Traceback (most recent call last): File "", line 1, in AssertionError >>> Any thoughts? Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From gokhansever at gmail.com Wed Oct 7 13:14:30 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 12:14:30 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <79D48A64-EAB1-429B-BB73-B0759D1BA060@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> <79D48A64-EAB1-429B-BB73-B0759D1BA060@gmail.com> Message-ID: <49d6b3500910071014n4ca7eed8j5865b46864befa6c@mail.gmail.com> Added as comment in the same entry: http://projects.scipy.org/numpy/ticket/1253#comment:1 Guessing that this one should be easy to fix :) On Wed, Oct 7, 2009 at 3:05 AM, Pierre GM wrote: > > On Oct 7, 2009, at 2:57 AM, G?khan Sever wrote: > > One more example. (I still think the behaviour of fill_value is > > inconsistent) > > Well, ma.masked_values use `value` to define fill_value, > ma.masked_equal does not. So yes, there's an inconsistency here. Once > again, please fill an enhancement request ticket. I should be able to > deal with this one quite soon. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Oct 7 13:20:32 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 07 Oct 2009 10:20:32 -0700 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> <4ACBAB06.4000008@noaa.gov> <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> Message-ID: <4ACCCDE0.4090002@noaa.gov> josef.pktd at gmail.com wrote: > I wanted to avoid the python loop and thought creating the view will be faster > with large arrays. But for this I need to know the memory length of a > row of arbitrary types for the conversion to strings, ndarray.itemsize might do it. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Oct 7 13:21:40 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 07 Oct 2009 10:21:40 -0700 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> Message-ID: <4ACCCE24.5000001@noaa.gov> G?khan Sever wrote: > > Sorry too much time spent in ipython -pylab :) > Good reflex. Saves you from making extra explanations. But it works with > just typing array why should I type np.array (Ohh my namespacess :) Because it shouldn't work that way! I use -pylab, but I've added: o.pylab_import_all = 0 to my ipy_user_conf.py file, so I don't get the namespace pollution. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gokhansever at gmail.com Wed Oct 7 13:35:41 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 12:35:41 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <4ACCCE24.5000001@noaa.gov> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <4ACCCE24.5000001@noaa.gov> Message-ID: <49d6b3500910071035l1ee82ea3n7a63f7ad7ad61832@mail.gmail.com> On Wed, Oct 7, 2009 at 12:21 PM, Christopher Barker wrote: > G?khan Sever wrote: > > > Sorry too much time spent in ipython -pylab :) > > > Good reflex. Saves you from making extra explanations. But it works with > > just typing array why should I type np.array (Ohh my namespacess :) > > Because it shouldn't work that way! I use -pylab, but I've added: > > o.pylab_import_all = 0 > > to my ipy_user_conf.py file, so I don't get the namespace pollution. > > -Chris > > Yes, I am aware of this fact. Still either from laziness or practicality I prefer typing plot to plt.plot and arange to np.arange while I have write them so many times in one day. Do you know what shortcut name is used for scipy package itself? > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Wed Oct 7 13:38:44 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 12:38:44 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910071035l1ee82ea3n7a63f7ad7ad61832@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <4ACCCE24.5000001@noaa.gov> <49d6b3500910071035l1ee82ea3n7a63f7ad7ad61832@mail.gmail.com> Message-ID: <3d375d730910071038x6485f3c5k114b0f30f2a7954f@mail.gmail.com> On Wed, Oct 7, 2009 at 12:35, G?khan Sever wrote: > Do you know what shortcut name is used for scipy package itself? I do not recommend using "import scipy" or "import scipy as ...". Import the subpackages directly (e.g. "from scipy import linalg"). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Wed Oct 7 13:39:57 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 12:39:57 -0500 Subject: [Numpy-discussion] Building a new copy of NumPy In-Reply-To: <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> References: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> Message-ID: <49d6b3500910071039j76649696p51ed7293ef9ffed2@mail.gmail.com> I have seen that message, but I wasn't sure these errors were directly connected since he mentions of getting segfaults whereas in my case only gives import errors. Building a new copy of scipy fixed this error. On Wed, Oct 7, 2009 at 10:10 AM, Robert Kern wrote: > On Wed, Oct 7, 2009 at 09:55, G?khan Sever wrote: > > Hello, > > > > I checked-out the latest trunk and make a new installation of NumPy. My > > question: Is it a known behaviour that this action will result with > > re-building other packages that are dependent on NumPy. In my case, I had > to > > re-built matplotlib, and now scipy. > > Known issue. See the thread "Numpy SVN broken". > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Oct 7 13:48:45 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 11:48:45 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 7:07 AM, Charles R Harris wrote: > > > On Wed, Oct 7, 2009 at 6:59 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Oct 7, 2009 at 6:37 AM, David Cournapeau wrote: >> >>> On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau >>> wrote: >>> >> >>> >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris >>> >> wrote: >>> >> > >>> >> > >>> >> > Looks like a clue ;) >>> >> >>> >> Ok, I fixed it here: >>> >> >>> >> http://github.com/cournape/numpy/tree/fix_abi >>> >> >>> >> But that's an ugly hack. I think we should consider rewriting how we >>> >> generate the API: instead of automatically growing the API array of >>> >> fptr, we should explicitly mark which function name has which index, >>> >> and hardcode it. It would help quite a bit to avoid changing the ABI >>> >> unvoluntary. >>> >> >>> > >>> > I'm thinking the safest thing to do is to move the new type to the end >>> of >>> > the list. >>> >>> That's what the above branch does. >>> >>> > I'm not sure what all the ramifications are for compatibility to >>> > having it stuck in the middle like that, does it change the type >>> numbers for >>> > all the types after? >>> >>> Yes, there is no space left between the types declarations and the >>> first functions. Currently, I just put things at the end manually, but >>> that's really error prone. >>> >>> I am a bit lazy to fix this for real (I was thinking about using a >>> python dict with hardcoded indexes as an entry instead of the current >>> .txt files, but this requires several changes in the code generator, >>> which is already not the greatest code to begin with). >>> >>> >> What I'm concerned about is that, IIRC, types in the c-code can be >> referenced by their index in a list of types and that internal mechanism >> might be exposed to the outside somewhere. That is, what has happened to the >> order of the enumerated types? If that has changed, and if external code >> references a type by a hard-wired number, then there is a problem that goes >> beyond the code generator. The safe(r) thing to do in that case is add the >> new type to the end of the enumerated types and fix the promotion code so it >> doesn't try to rely on a linear order. >> >> > Here, for instance: > > "The various character codes indicating certain types are also part of an > enumerated > list. References to type characters (should they be needed at all) should > always use > these enumerations. The form of them is NPY LTR where " > > So those macros will generate a hard-coded number at compile time, and > number that might have changed with the addition of the new types. > > Nevermind, it looks like the new type number is at the end as it should be. 
In [22]: typecodes Out[22]: {'All': '?bhilqpBHILQPfdgFDGSUVOMm', 'AllFloat': 'fdgFDG', 'AllInteger': 'bBhHiIlLqQpP', 'Character': 'c', 'Complex': 'FDG', 'Datetime': 'Mm', 'Float': 'fdg', 'Integer': 'bhilqp', 'UnsignedInteger': 'BHILQP'} Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Oct 7 15:14:58 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 07 Oct 2009 12:14:58 -0700 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: <4ACCE8B2.1050306@noaa.gov> Pierre GM wrote: > On Oct 6, 2009, at 10:08 PM, Bruce Southey wrote: >> option to merge delimiters - actually in SAS it is default Wow! that sure strikes me as a bad choice. > Ahah! I get it. Well, I remember that we discussed something like that a > few months ago when I started working on np.genfromtxt, and the > default of *not* merging whitespaces was requested. I gonna check > whether we can't put this option somewhere now... I'd think you might want to have two options: either "whitespace" which would be any type or amount of whitespace, or a specific delimeter: say "\t" or " " or " " (two spaces), etc. In that case, it would mean "one and only one of these". Of course, this would fail in Bruce's example: >>>> A B C D >>>> 1 2 3 4 >>>> 1 4 5 as there is a space for the delimeter, and one for the data! This looks like fixed-format to me. if it were single-space delimited, it would look more like: when the delimiter is whitespace. A B C D E 1 2 3 4 5 1 4 5 which is the same as: A, B, C, D, E 1, 2, 3, 4, 5 1, , , 4, 5 If something like SAS actually does merge decimeters, which I interpret to mean that if there are a few empty fields and you call for tab-delimited , you only get one tab, then information as simply been lost -- there is no way to recover it! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From bsouthey at gmail.com Wed Oct 7 15:54:51 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 07 Oct 2009 14:54:51 -0500 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACCE8B2.1050306@noaa.gov> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> <4ACCE8B2.1050306@noaa.gov> Message-ID: <4ACCF20B.5040901@gmail.com> On 10/07/2009 02:14 PM, Christopher Barker wrote: > Pierre GM wrote: > >> On Oct 6, 2009, at 10:08 PM, Bruce Southey wrote: >> >>> option to merge delimiters - actually in SAS it is default >>> > Wow! that sure strikes me as a bad choice. > > >> Ahah! I get it. Well, I remember that we discussed something like that a >> few months ago when I started working on np.genfromtxt, and the >> default of *not* merging whitespaces was requested. I gonna check >> whether we can't put this option somewhere now... >> > I'd think you might want to have two options: either "whitespace" which > would be any type or amount of whitespace, or a specific delimeter: say > "\t" or " " or " " (two spaces), etc. In that case, it would mean "one > and only one of these". > > Of course, this would fail in Bruce's example: > > >>>> A B C D > >>>> 1 2 3 4 > >>>> 1 4 5 > > as there is a space for the delimeter, and one for the data! 
This looks > like fixed-format to me. if it were single-space delimited, it would > look more like: > > when the delimiter is whitespace. > A B C D E > 1 2 3 4 5 > 1 4 5 > > which is the same as: > > A, B, C, D, E > 1, 2, 3, 4, 5 > 1, , , 4, 5 > > > If something like SAS actually does merge decimeters, which I interpret > to mean that if there are a few empty fields and you call for > tab-delimited , you only get one tab, then information as simply been > lost -- there is no way to recover it! > > -Chris > > To use fixed length fields you really need nicely formatted data and I usually do not have that. As a default it does not always work for non-whitespace delimiters such as: A,B,C ,,1 1,2,3 There is an option to override that behavior. But it is very useful when you have extra whitespace especially reading in text strings that have different lengths or different levels of whitespace padding. The following is correct in that Python does merge whitespace delimiters by default. This is also what SAS does by default for any delimiter. But it is incorrect if each whitespace character is a delimiter: s = StringIO(''' 1 10 100\r\n 10 1 1000''') np.genfromtxt(s) array([[ 1., 10., 100.], [ 10., 1., 1000.]]) np.genfromtxt(s, delimiter=' ') Traceback (most recent call last): File "", line 1, in File "/usr/lib64/python2.6/site-packages/numpy/lib/io.py", line 1048, in genfromtxt raise IOError('End-of-file reached before encountering data.') IOError: End-of-file reached before encountering data. Anyhow, I do like what genfromtxt is doing so merging multiple delimiters of the same type is not really needed. Bruce From pgmdevlist at gmail.com Wed Oct 7 16:16:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 16:16:23 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACCF20B.5040901@gmail.com> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> <4ACCE8B2.1050306@noaa.gov> <4ACCF20B.5040901@gmail.com> Message-ID: On Oct 7, 2009, at 3:54 PM, Bruce Southey wrote: > > Anyhow, I do like what genfromtxt is doing so merging multiple > delimiters of the same type is not really needed. Thinking about it, merging multiple delimiters of the same type can be tricky: how do you distinguish between, say, "AAA\t\tCCC" where you expect 2 fields and "AAA\t\tCCC" where you expect 3 fields but the second one is missing ? I think 'genfromtxt' works consistently right now (but of course, as soon as I say that we'll find some counter-examples), so let's not break it. Yet. From stefan at sun.ac.za Wed Oct 7 18:35:07 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 8 Oct 2009 00:35:07 +0200 Subject: [Numpy-discussion] Building a new copy of NumPy In-Reply-To: <49d6b3500910071039j76649696p51ed7293ef9ffed2@mail.gmail.com> References: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> <49d6b3500910071039j76649696p51ed7293ef9ffed2@mail.gmail.com> Message-ID: <9457e7c80910071535k697315begd3d839d86ee2dcfc@mail.gmail.com> You can pull the patches from David's fix_abi branch: http://github.com/cournape/numpy/tree/fix_abi This branch has been hacked to be ABI compatible with previous versions. Cheers St?fan 2009/10/7 G?khan Sever : > I have seen that message, but I wasn't sure these errors were directly > connected since he mentions of getting segfaults whereas in my case only > gives import errors. 
Building a new copy of scipy fixed this error. From oliphant at enthought.com Wed Oct 7 22:39:33 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 7 Oct 2009 21:39:33 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> Message-ID: <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> On Oct 7, 2009, at 3:06 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > wrote: >> >> >> Looks like a clue ;) > > Ok, I fixed it here: > > http://github.com/cournape/numpy/tree/fix_abi > > But that's an ugly hack. I think we should consider rewriting how we > generate the API: instead of automatically growing the API array of > fptr, we should explicitly mark which function name has which index, > and hardcode it. It would help quite a bit to avoid changing the ABI > unvoluntary. I apologize for the mis communication that has occurred here. I did not understand that there was a desire to keep ABI compatibility with NumPy 1.3 when NumPy 1.4 was released. The datetime merge was made under that presumption. I had assumed that people would be fine with recompilation of extension modules that depend on the NumPy C-API. There are several things that needed to be done to merge in new fundamental data-types. Why don't we call the next release NumPy 2.0 if that helps things? Personally, I'd prefer that over hacks to keep ABI compatibility. It feels like we are working very hard to track ABI issues that can also be handled with dependency checking and good package management. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Wed Oct 7 22:51:11 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Oct 2009 11:51:11 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> Message-ID: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> On Thu, Oct 8, 2009 at 11:39 AM, Travis Oliphant wrote: > > I apologize for the mis communication that has occurred here. No problem >? I did not > understand that there was a desire to keep ABI compatibility with NumPy 1.3 > when NumPy 1.4 was released. ? ?The datetime merge was made under that > presumption. > I had assumed that people would be fine with recompilation of extension > modules that depend on the NumPy C-API. ? ?There are several things that > needed to be done to merge in new fundamental data-types. > Why don't we call the next release NumPy 2.0 if that helps things? > ?Personally, I'd prefer that over hacks to keep ABI compatibility. Keeping ABI compatibility by itself is not an hack - the current workaround is an hack, but that's only because the current way of doing things in code generator is a bit ugly, and I did not want to spend too much time on it. It is purely an implementation issue, the fundamental idea is straightforward. 
If you want a cleaner solution, I can work on it. I think the hour or so that it would take is worth it compared to breaking many people's code. > ? It > feels like we are working very hard to track ABI issues that can also be > handled with dependency checking and good package management. I think ABI issues are mostly orthogonal to versioning - generally, versions are related to API changes (API changes is what should drive ABI changes, at least for projects like numpy). I would prefer passing to "numpy 2.0" when we really need to break ABI and API - at that point, I think we should also think hard about changing our structures and all to make them more robust to those changes (using pimp-like strategies in particular). David From cournape at gmail.com Wed Oct 7 22:55:28 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Oct 2009 11:55:28 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> Message-ID: <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> On Thu, Oct 8, 2009 at 11:51 AM, David Cournapeau wrote: > I would prefer passing to "numpy 2.0" when we really need to break ABI > and API - at that point, I think we should also think hard about > changing our structures and all to make them more robust to those > changes (using pimp-like strategies in particular). Sorry, I mean pimple, not pimp (makes you wonder what goes in my head): David From robert.kern at gmail.com Wed Oct 7 22:57:44 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 21:57:44 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> Message-ID: <3d375d730910071957u5ea0d6f9o40fde161cde3590@mail.gmail.com> On Wed, Oct 7, 2009 at 21:55, David Cournapeau wrote: > On Thu, Oct 8, 2009 at 11:51 AM, David Cournapeau wrote: > >> I would prefer passing to "numpy 2.0" when we really need to break ABI >> and API - at that point, I think we should also think hard about >> changing our structures and all to make them more robust to those >> changes (using pimp-like strategies in particular). > > Sorry, I mean pimple, not pimp (makes you wonder what goes in my head): Indeed! (And it's "pimpl".) :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From aisaac at american.edu Wed Oct 7 23:04:28 2009 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 07 Oct 2009 23:04:28 -0400 Subject: [Numpy-discussion] robustness strategies In-Reply-To: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> Message-ID: <4ACD56BC.9080302@american.edu> On 10/7/2009 10:51 PM, David Cournapeau wrote: > pimp-like strategies Which means ... ? Alan From robert.kern at gmail.com Wed Oct 7 23:08:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 22:08:33 -0500 Subject: [Numpy-discussion] robustness strategies In-Reply-To: <4ACD56BC.9080302@american.edu> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <4ACD56BC.9080302@american.edu> Message-ID: <3d375d730910072008r7a22f661hbee943a57c085839@mail.gmail.com> On Wed, Oct 7, 2009 at 22:04, Alan G Isaac wrote: > On 10/7/2009 10:51 PM, David Cournapeau wrote: >> pimp-like strategies > > > Which means ... ? He meant "pimpl-like". http://en.wikipedia.org/wiki/Opaque_pointer -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From aisaac at american.edu Wed Oct 7 23:09:08 2009 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 07 Oct 2009 23:09:08 -0400 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <3d375d730910071957u5ea0d6f9o40fde161cde3590@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> <3d375d730910071957u5ea0d6f9o40fde161cde3590@mail.gmail.com> Message-ID: <4ACD57D4.7020607@american.edu> On 10/7/2009 10:57 PM, Robert Kern wrote: > it's "pimpl" OK: http://en.wikipedia.org/wiki/Opaque_pointer Thanks, Alan Isaac From david at ar.media.kyoto-u.ac.jp Wed Oct 7 23:08:42 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 08 Oct 2009 12:08:42 +0900 Subject: [Numpy-discussion] robustness strategies In-Reply-To: <4ACD56BC.9080302@american.edu> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <4ACD56BC.9080302@american.edu> Message-ID: <4ACD57BA.4010607@ar.media.kyoto-u.ac.jp> Alan G Isaac wrote: > On 10/7/2009 10:51 PM, David Cournapeau wrote: > >> pimp-like strategies >> > > > Which means ... ? > The idea is to put one pointer in you struct instead of all members - it is a form of encapsulation, and it is enforced at compile time. I think part of the problem with changing API/ABI in numpy is that the headers show way too much information. I would really like to improve this, but this would clearly break the ABI (and API - a lot of macros would have to go). There is a performance cost of one more indirection (if you have a pointer to a struct, you need to dereference both the struct and the D pointer inside), but for most purpose, that's likely to be negligeable, except for a few special cases (like iterators). cheers, David From charlesr.harris at gmail.com Thu Oct 8 00:01:05 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 22:01:05 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> Message-ID: On Wed, Oct 7, 2009 at 8:39 PM, Travis Oliphant wrote: > > On Oct 7, 2009, at 3:06 AM, David Cournapeau wrote: > > On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > wrote: > > > > Looks like a clue ;) > > > Ok, I fixed it here: > > http://github.com/cournape/numpy/tree/fix_abi > > But that's an ugly hack. I think we should consider rewriting how we > generate the API: instead of automatically growing the API array of > fptr, we should explicitly mark which function name has which index, > and hardcode it. 
It would help quite a bit to avoid changing the ABI > unvoluntary. > > > I apologize for the mis communication that has occurred here. I did not > understand that there was a desire to keep ABI compatibility with NumPy 1.3 > when NumPy 1.4 was released. The datetime merge was made under that > presumption. > > I had assumed that people would be fine with recompilation of extension > modules that depend on the NumPy C-API. There are several things that > needed to be done to merge in new fundamental data-types. > > Why don't we call the next release NumPy 2.0 if that helps things? > Personally, I'd prefer that over hacks to keep ABI compatibility. It > feels like we are working very hard to track ABI issues that can also be > handled with dependency checking and good package management. > > I was that the code generator shifted the API order because it was inserting the new types after the old types but before the other API functions. It's a code generator problem and doesn't call for a jump in major version. We hope ;) I think David's hack, which looks to have been committed by Stefan, should fix things up. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Thu Oct 8 02:37:01 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 8 Oct 2009 08:37:01 +0200 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> Message-ID: <9457e7c80910072337n37399ad2l2f16b47d18b3ca98@mail.gmail.com> 2009/10/8 Charles R Harris : > code generator problem and doesn't call for a jump in major version. We hope > ;) I think David's hack, which looks to have been committed by Stefan, > should fix things up. I accidentally committed some of David's patches, but I reverted them back out. I think David's idea of generating an API from dictionary is much cleaner. We can work on implementing that today. Cheers St?fan From david at ar.media.kyoto-u.ac.jp Thu Oct 8 02:48:47 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 08 Oct 2009 15:48:47 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <9457e7c80910072337n37399ad2l2f16b47d18b3ca98@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <9457e7c80910072337n37399ad2l2f16b47d18b3ca98@mail.gmail.com> Message-ID: <4ACD8B4F.8000303@ar.media.kyoto-u.ac.jp> St?fan van der Walt wrote: > We can work on implementing that today. > I am working on it ATM - it is taking me longer than expected, though. 
David From oliphant at enthought.com Thu Oct 8 07:55:10 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 06:55:10 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> Message-ID: <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> On Oct 7, 2009, at 9:51 PM, David Cournapeau wrote: > On Thu, Oct 8, 2009 at 11:39 AM, Travis Oliphant > wrote: >> >> I apologize for the mis communication that has occurred here. > > No problem > >> I did not >> understand that there was a desire to keep ABI compatibility with >> NumPy 1.3 >> when NumPy 1.4 was released. The datetime merge was made under >> that >> presumption. >> I had assumed that people would be fine with recompilation of >> extension >> modules that depend on the NumPy C-API. There are several things >> that >> needed to be done to merge in new fundamental data-types. >> Why don't we call the next release NumPy 2.0 if that helps things? >> Personally, I'd prefer that over hacks to keep ABI compatibility. > > Keeping ABI compatibility by itself is not an hack - the current > workaround is an hack, but that's only because the current way of > doing things in code generator is a bit ugly, and I did not want to > spend too much time on it. It is purely an implementation issue, the > fundamental idea is straightforward. > > If you want a cleaner solution, I can work on it. I think the hour or > so that it would take is worth it compared to breaking many people's > code. If that's all it would take, then definitely go for it. I'm not sure "breaking people's code" is the right image, though. It's more like "forcing people to upgrade" to take advantage of new features. Improvements to the encapsulation of the numpy C-API are definitely welcome. They have come a long way from their beginnings in Numeric already due to the efforts of you and David Cooke (and I'm sure others I'm not as aware of). The problem I have with spending time on it though is that there is still more implementation work to finish on the datetime functionality to complete the NEP implementation. Naturally, I'd like to see those improvements made first. But, time-spent is usually a function of how much time it takes to "get-in" to the code, so I won't try to distract you if you have a clear idea about how to proceed. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Thu Oct 8 08:09:44 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 07:09:44 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <4ACCB3BB.8010805@stsci.edu> References: <4ACCB3BB.8010805@stsci.edu> Message-ID: <26A93513-988C-4EDB-A717-326BC01DE2EB@enthought.com> On Oct 7, 2009, at 10:28 AM, Michael Droettboom wrote: > I'm noticing an inconsistency as to how complex numbers are > byteswapped > as arrays vs. scalars, and wondering if I'm doing something wrong. 
> >>>> x = np.array([-1j], '>>> x.tostring().encode('hex') > '00000000000080bf' > # This is a little-endian representation, in the order (real, imag) > > # When I swap the whole array, it swaps each of the (real, imag) parts > separately >>>> y = x.byteswap() >>>> y.tostring().encode('hex') > '00000000bf800000' > # and this round-trips fine >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j >>>> > > # When I swap the scalar, it seems to swap the entire 8 bytes >>>> y = x[0].byteswap() >>>> y.tostring().encode('hex') > 'bf80000000000000' > # ...and this doesn't round-trip >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j > Traceback (most recent call last): > File "", line 1, in > AssertionError >>>> > > Any thoughts? I think this is a bug. You should file a ticket and mark it critical. As I look at the scalar implementation (in gentype_byteswap in scalartypes.c.src), it looks like it's basing it just on the size (Hmm.... I don't know why it's not using the copyswap in the descr field....). This works for many types, but not complex numbers which should have real and imaginary parts handled separately. There are two ways to fix this that I can see: 1) fix the gentype implementation to use the copyswap function pointer from the datatype object 2) over-ride the byteswap in the complex scalar Python type (there is a base-class complex scalar type where it could be placed) to do the right thing. I would probably do #1 if I get a chance to work on it (because strings shouldn't be byteswapped either and they currently are, I see...) x = np.array(['abcd']) Compare: x.byteswap()[0] x[0].byteswap() The work around is to byteswap before extraction: x.byteswap()[0] Thanks for the bug-report. -Travis From oliphant at enthought.com Thu Oct 8 08:19:14 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 07:19:14 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <4ACCB3BB.8010805@stsci.edu> References: <4ACCB3BB.8010805@stsci.edu> Message-ID: On Oct 7, 2009, at 10:28 AM, Michael Droettboom wrote: > I'm noticing an inconsistency as to how complex numbers are > byteswapped > as arrays vs. scalars, and wondering if I'm doing something wrong. > >>>> x = np.array([-1j], '>>> x.tostring().encode('hex') > '00000000000080bf' > # This is a little-endian representation, in the order (real, imag) > > # When I swap the whole array, it swaps each of the (real, imag) parts > separately >>>> y = x.byteswap() >>>> y.tostring().encode('hex') > '00000000bf800000' > # and this round-trips fine >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j >>>> > > # When I swap the scalar, it seems to swap the entire 8 bytes >>>> y = x[0].byteswap() >>>> y.tostring().encode('hex') > 'bf80000000000000' > # ...and this doesn't round-trip >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j > Traceback (most recent call last): > File "", line 1, in > AssertionError >>>> > > Any thoughts? I just checked a fix for this into SVN (tests still need to be added though...) I can't currently build SVN on my Mac for some reason (I don't know if it has to do with recent changes or not, but I don't have time to track it down right now....the error I'm getting is something about Datetime array scalar types not being defined which seems related to the work Dave and Stefan have been discussing). It's a small change, though, and should work. 
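The still-missing test could be something along these lines (just a sketch of the round-trip check from the original report, not the test that will actually be committed):

import numpy as np

def test_complex_scalar_byteswap():
    # scalar and array byteswapping of a complex value should give the same
    # bytes, and the swapped scalar should round-trip through fromstring
    x = np.array([-1j], '<c8')
    assert x.byteswap()[0].tostring() == x[0].byteswap().tostring()
    z = np.fromstring(x[0].byteswap().tostring(), dtype='>c8')
    assert z[0] == -1j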
-Travis From oliphant at enthought.com Thu Oct 8 08:25:26 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 07:25:26 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: References: <4ACCB3BB.8010805@stsci.edu> Message-ID: <295EECAD-5381-491E-8641-9DF53683FB70@enthought.com> On Oct 8, 2009, at 7:19 AM, Travis Oliphant wrote: > > I just checked a fix for this into SVN (tests still need to be added > though...) > > I can't currently build SVN on my Mac for some reason (I don't know if > it has to do with recent changes or not, but I don't have time to > track it down right now....the error I'm getting is something about > Datetime array scalar types not being defined which seems related to > the work Dave and Stefan have been discussing). I can build from SVN. The problem is I had to check-out again from SVN (and get rid of the old code-generated files --- sure would be nice if there were the equivalent of "make clean" -Travis From cournape at gmail.com Thu Oct 8 09:47:21 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Oct 2009 22:47:21 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> Message-ID: <5b8d13220910080647r731dba31t62e3ff2f5212af50@mail.gmail.com> On Thu, Oct 8, 2009 at 8:55 PM, Travis Oliphant wrote: > > On Oct 7, 2009, at 9:51 PM, David Cournapeau wrote: > > On Thu, Oct 8, 2009 at 11:39 AM, Travis Oliphant > wrote: > > I apologize for the mis communication that has occurred here. > > No problem > > ? I did not > > understand that there was a desire to keep ABI compatibility with NumPy 1.3 > > when NumPy 1.4 was released. ? ?The datetime merge was made under that > > presumption. > > I had assumed that people would be fine with recompilation of extension > > modules that depend on the NumPy C-API. ? ?There are several things that > > needed to be done to merge in new fundamental data-types. > > Why don't we call the next release NumPy 2.0 if that helps things? > > ?Personally, I'd prefer that over hacks to keep ABI compatibility. > > Keeping ABI compatibility by itself is not an hack - the current > workaround is an hack, but that's only because the current way of > doing things in code generator is a bit ugly, and I did not want to > spend too much time on it. It is purely an implementation issue, the > fundamental idea is straightforward. > > If you want a cleaner solution, I can work on it. I think the hour or > so that it would take is worth it compared to breaking many people's > code. > > If that's all it would take, then definitely go for it. ? ?I'm not sure > "breaking people's code" is the right image, though. ? It's more like > "forcing people to upgrade" to take advantage of new features. We got several people complaining about segfaults and the like - granted, those could have been avoided by updating the ABI accordingly. > The problem I have with spending time on it though is that there is still > more implementation work to finish on the datetime functionality to complete > the NEP implementation. ? ? 
?Naturally, I'd like to see those improvements > made first. ?But, time-spent is usually a function of how much time it takes > to "get-in" to the code, so I won't try to distract you if you have a clear > idea about how to proceed. I am applying my changes as we speak - it took me much more time than I wished because I tried hard to make sure the ABI was not changed. But at least, the current scheme should be much more robust: the ordering is fixed at one single place, and there are a few checks which ensure we don't screw things up (by putting 'holes' in the api array, or by using twice the same index). cheers, David From cournape at gmail.com Thu Oct 8 11:01:37 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 9 Oct 2009 00:01:37 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> Message-ID: <5b8d13220910080801m41ada777nd74fbb6cfc921070@mail.gmail.com> On Thu, Oct 8, 2009 at 8:55 PM, Travis Oliphant wrote: > > The problem I have with spending time on it though is that there is still > more implementation work to finish on the datetime functionality to complete > the NEP implementation. ? ? ?Naturally, I'd like to see those improvements > made first. ?But, time-spent is usually a function of how much time it takes > to "get-in" to the code, so I won't try to distract you if you have a clear > idea about how to proceed. Would it be possible to include next changes in small self-contained commits ? It really makes the review easier to follow for me, and tracking regressions is easier as well. Git-svn makes this easy. David From mdroe at stsci.edu Thu Oct 8 13:08:42 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 08 Oct 2009 13:08:42 -0400 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: References: <4ACCB3BB.8010805@stsci.edu> Message-ID: <4ACE1C9A.6030001@stsci.edu> Thanks! I guess I won't file a bug then ;) Mike Travis Oliphant wrote: > On Oct 7, 2009, at 10:28 AM, Michael Droettboom wrote: > > >> I'm noticing an inconsistency as to how complex numbers are >> byteswapped >> as arrays vs. scalars, and wondering if I'm doing something wrong. >> >> >>>>> x = np.array([-1j], '>>>> x.tostring().encode('hex') >>>>> >> '00000000000080bf' >> # This is a little-endian representation, in the order (real, imag) >> >> # When I swap the whole array, it swaps each of the (real, imag) parts >> separately >> >>>>> y = x.byteswap() >>>>> y.tostring().encode('hex') >>>>> >> '00000000bf800000' >> # and this round-trips fine >> >>>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>>> assert z[0] == -1j >>>>> >>>>> >> # When I swap the scalar, it seems to swap the entire 8 bytes >> >>>>> y = x[0].byteswap() >>>>> y.tostring().encode('hex') >>>>> >> 'bf80000000000000' >> # ...and this doesn't round-trip >> >>>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>>> assert z[0] == -1j >>>>> >> Traceback (most recent call last): >> File "", line 1, in >> AssertionError >> >> Any thoughts? >> > > > I just checked a fix for this into SVN (tests still need to be added > though...) 
> > I can't currently build SVN on my Mac for some reason (I don't know if > it has to do with recent changes or not, but I don't have time to > track it down right now....the error I'm getting is something about > Datetime array scalar types not being defined which seems related to > the work Dave and Stefan have been discussing). > > It's a small change, though, and should work. > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From oliphant at enthought.com Thu Oct 8 18:01:53 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 17:01:53 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910080801m41ada777nd74fbb6cfc921070@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> <5b8d13220910080801m41ada777nd74fbb6cfc921070@mail.gmail.com> Message-ID: On Oct 8, 2009, at 10:01 AM, David Cournapeau wrote: > On Thu, Oct 8, 2009 at 8:55 PM, Travis Oliphant > wrote: >> >> The problem I have with spending time on it though is that there is >> still >> more implementation work to finish on the datetime functionality to >> complete >> the NEP implementation. Naturally, I'd like to see those >> improvements >> made first. But, time-spent is usually a function of how much time >> it takes >> to "get-in" to the code, so I won't try to distract you if you have >> a clear >> idea about how to proceed. > > Would it be possible to include next changes in small self-contained > commits ? It really makes the review easier to follow for me, and > tracking regressions is easier as well. Git-svn makes this easy. > That was the reason for merging to the trunk rather than continuing to work in the branch. I expect that the next changes will be more incremental. -Travis From oliphant at enthought.com Thu Oct 8 18:02:45 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 17:02:45 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <4ACE1C9A.6030001@stsci.edu> References: <4ACCB3BB.8010805@stsci.edu> <4ACE1C9A.6030001@stsci.edu> Message-ID: On Oct 8, 2009, at 12:08 PM, Michael Droettboom wrote: > Thanks! I guess I won't file a bug then ;) Probably still should, actually: Until the tests get committed, the bug is not really "fixed" -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Thu Oct 8 18:28:32 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 8 Oct 2009 18:28:32 -0400 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython Message-ID: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> I'm trying to use PyArray_FROM_OF from Cython and the generated C code keeps crashing. Dag said on the Cython list that he wasn't sure what was going on, so maybe someone here will have an idea. 
The line that gdb says is crashing is: #0 0x00e48287 in __pyx_pf_3_vq_vq (__pyx_self=0x0, __pyx_args=0xca2d8, __pyx_kwds=0x0) at _vq_rewrite.c:1025 1025 __pyx_t_1 = PyArray_FROM_OF(((PyObject *)__pyx_v_obs), __pyx_v_flags); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 90; __pyx_clineno = __LINE__; goto __pyx_L1_error;} obs and obs_a are both cdef'd np.ndarrays, and the former (obs) is passed in as an argument. I define flags as cdef int flags = np.NPY_CONTIGUOUS | np.NPY_ALIGNED | np.NPY_NOTSWAPPED and then the line that crashes is obs_a = np.PyArray_FROM_OF(obs, flags) Does anyone know what I'm doing wrong? (I know I could use np.ascontiguous, but as far as I can tell this _should_ work) David From robert.kern at gmail.com Thu Oct 8 18:47:31 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 8 Oct 2009 17:47:31 -0500 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> Message-ID: <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> On Thu, Oct 8, 2009 at 17:28, David Warde-Farley wrote: > I'm trying to use PyArray_FROM_OF from Cython and the generated C code > keeps crashing. ?Dag said on the Cython list that he wasn't sure what > was going on, so maybe someone here will have an idea. You must call import_array() at the top level before you can use any numpy C API functions. http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dwf at cs.toronto.edu Thu Oct 8 20:32:14 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 8 Oct 2009 20:32:14 -0400 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> Message-ID: <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> On 8-Oct-09, at 6:47 PM, Robert Kern wrote: > On Thu, Oct 8, 2009 at 17:28, David Warde-Farley > wrote: >> I'm trying to use PyArray_FROM_OF from Cython and the generated C >> code >> keeps crashing. Dag said on the Cython list that he wasn't sure what >> was going on, so maybe someone here will have an idea. > > You must call import_array() at the top level before you can use any > numpy C API functions. > > http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI Thanks. One more thing: calling Py_DECREF on arrays that I have acquired from PyArray_FROM_OF seems to cause crashes, am I correct in assuming that Cython is somehow tracking all the PyObjects in the scope (even ones acquired via the NumPy C API) and DECREF'ing it for me? 
David From robert.kern at gmail.com Thu Oct 8 20:59:37 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 8 Oct 2009 19:59:37 -0500 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> Message-ID: <3d375d730910081759j46fb68eeic1748d5c5c44862@mail.gmail.com> On Thu, Oct 8, 2009 at 19:32, David Warde-Farley wrote: > > On 8-Oct-09, at 6:47 PM, Robert Kern wrote: > >> On Thu, Oct 8, 2009 at 17:28, David Warde-Farley >> wrote: >>> I'm trying to use PyArray_FROM_OF from Cython and the generated C >>> code >>> keeps crashing. ?Dag said on the Cython list that he wasn't sure what >>> was going on, so maybe someone here will have an idea. >> >> You must call import_array() at the top level before you can use any >> numpy C API functions. >> >> http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI > > Thanks. One more thing: calling Py_DECREF on arrays that I have > acquired from PyArray_FROM_OF seems to cause crashes, am I correct in > assuming that Cython is somehow tracking all the PyObjects in the > scope (even ones acquired via the NumPy C API) and DECREF'ing it for me? It usually does, yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Fri Oct 9 01:13:44 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 9 Oct 2009 14:13:44 +0900 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <295EECAD-5381-491E-8641-9DF53683FB70@enthought.com> References: <4ACCB3BB.8010805@stsci.edu> <295EECAD-5381-491E-8641-9DF53683FB70@enthought.com> Message-ID: <5b8d13220910082213q759d0e74xaa53e232735497c4@mail.gmail.com> On Thu, Oct 8, 2009 at 9:25 PM, Travis Oliphant wrote: > > On Oct 8, 2009, at 7:19 AM, Travis Oliphant wrote: > >> >> I just checked a fix for this into SVN (tests still need to be added >> though...) >> >> I can't currently build SVN on my Mac for some reason (I don't know if >> it has to do with recent changes or not, but I don't have time to >> track it down right now....the error I'm getting is something about >> Datetime array scalar types not being defined which seems related to >> the work Dave and Stefan have been discussing). > > I can build from SVN. ?The problem is I had to check-out again from > SVN (and get rid of the old code-generated files --- sure would be > nice if there were the equivalent of "make clean" Note that git clean will clean your working tree, and numscons generates less junk in the working tree. It can be a time saver and quite convenient, David From david at ar.media.kyoto-u.ac.jp Fri Oct 9 00:56:28 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 09 Oct 2009 13:56:28 +0900 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % Message-ID: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> Hi, This branch improves numpy import times quite significantly on slow machines: http://github.com/cournape/numpy/tree/noinspect One of the main culprit is ma, because of inspect (inspect is extremely slow to import; as a data point, python -c "import inspect" takes 67 ms vs python -c "" taking 22 ms, and python -c "import numpy" taking 158 ms on my machine). 
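A rough way to reproduce those numbers on your own machine (the helper below is only an illustration, not part of the branch; absolute timings will of course differ):

import subprocess
import sys
import time

def startup_time(stmt, repeats=5):
    # best wall-clock time for running `python -c stmt` in a fresh process
    best = None
    for _ in range(repeats):
        t0 = time.time()
        subprocess.call([sys.executable, '-c', stmt])
        t = time.time() - t0
        best = t if best is None else min(best, t)
    return best

for stmt in ('pass', 'import inspect', 'import numpy'):
    print('%-16s %6.1f ms' % (stmt, 1000 * startup_time(stmt)))
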
Since inspect is used in quite a few places, and that we only use it to extract arguments from a function, I added a small numpy.lib.inspect module, and change the import in numpy.ma. I copied the inspect module of python 2.4.4 to ensure maximum compatibility. This speed up the import times from 158 ms to 108 ms ~ 30 % speed improvement. On recent machines, the speedup is less impressive, but still in the 20 % range. I think it largely worths it, and will integrate this unless someone is strongly against it or see a problem with the approach, cheers, David From dagss at student.matnat.uio.no Fri Oct 9 04:47:43 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 09 Oct 2009 10:47:43 +0200 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> Message-ID: <4ACEF8AF.6060709@student.matnat.uio.no> David Warde-Farley wrote: > On 8-Oct-09, at 6:47 PM, Robert Kern wrote: > >> On Thu, Oct 8, 2009 at 17:28, David Warde-Farley >> wrote: >>> I'm trying to use PyArray_FROM_OF from Cython and the generated C >>> code >>> keeps crashing. Dag said on the Cython list that he wasn't sure what >>> was going on, so maybe someone here will have an idea. >> You must call import_array() at the top level before you can use any >> numpy C API functions. >> >> http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI > > Thanks. One more thing: calling Py_DECREF on arrays that I have > acquired from PyArray_FROM_OF seems to cause crashes, am I correct in > assuming that Cython is somehow tracking all the PyObjects in the > scope (even ones acquired via the NumPy C API) and DECREF'ing it for me? If the function is declared with "object" as return type (or nothing which defaults to the same thing), then Cython will interpret that as the function handing away the reference of the object and make sure it is decref-ed. -- Dag Sverre From Chris.Barker at noaa.gov Fri Oct 9 12:08:59 2009 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri, 09 Oct 2009 09:08:59 -0700 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % In-Reply-To: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> References: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> Message-ID: <4ACF601B.4070903@noaa.gov> David Cournapeau wrote: > This branch improves numpy import times quite significantly on slow > machines: > I think it largely worths it, and will integrate this unless someone is > strongly against it or see a problem with the approach, +1 -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From numpy-discussion at maubp.freeserve.co.uk Fri Oct 9 12:24:56 2009 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Fri, 9 Oct 2009 17:24:56 +0100 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % In-Reply-To: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> References: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> Message-ID: <320fb6e00910090924p3c6af39dvead2d5d8502874e6@mail.gmail.com> On Fri, Oct 9, 2009 at 5:56 AM, David Cournapeau wrote: > > Since inspect is used in quite a few places, and that we only use it to > extract arguments from a function, I added a small numpy.lib.inspect > module, and ... Is numpy.lib intended as a public API? How about numpy.lib._inspect instead of numpy.lib.inspect to make it clear this new module is private? Peter From josef.pktd at gmail.com Fri Oct 9 15:05:36 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 9 Oct 2009 15:05:36 -0400 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <4ACCCDE0.4090002@noaa.gov> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> <4ACBAB06.4000008@noaa.gov> <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> <4ACCCDE0.4090002@noaa.gov> Message-ID: <1cd32cbb0910091205y26c6850fy568373a4c35f6d60@mail.gmail.com> On Wed, Oct 7, 2009 at 1:20 PM, Christopher Barker wrote: > josef.pktd at gmail.com wrote: > >> I wanted to avoid the python loop and thought creating the view will be faster >> with large arrays. But for this I need to know the memory length of a >> row of arbitrary types for the conversion to strings, > > ndarray.itemsize > > might do it. > > -Chris Thanks, (I forgot to reply), it works and feels less low level than strides. Josef >>> tmps2[0].itemsize * np.size(tmps2[0]) 16 >>> tmp[0].itemsize * np.size(tmp[0]) 24 >>> tmps2.strides[0] 16 >>> tmp.strides[0] 24 >>> tmp array([[-1.414, -1.019, -1.171], [-1.273, 1.639, -0.854], [-1.795, -0.699, 0.595], [-0.865, -1.439, -0.275]]) >>> tmps2 array([(4.0, 0, 1), (1.0, 1, 3), (2.0, 2, 4), (4.0, 0, 1)], dtype=[('f0', '>> > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From amenity at enthought.com Fri Oct 9 17:59:55 2009 From: amenity at enthought.com (Amenity Applewhite) Date: Fri, 9 Oct 2009 16:59:55 -0500 Subject: [Numpy-discussion] October 16 Scientific Computing with Python Webinar: Traits References: <1874882496.1255125323830.JavaMail.root@p2-ws606.ad.prodcc.net> Message-ID: Having trouble viewing this email? Click here Friday, October 16: Traits SCIENTIFIC COMPUTING WITH PYTHON WEBINAR Hello! It's already time for our October Scientific Computing with Python webinar! This month we'll be handling Traits, one of our most popular training topics. Traits: Expanding the Power of Attributes An essential component of the open source Enthought Tool Suite, The Traits package is at the center of all development we do at Enthought. 
In fact, it has changed the mental model we use for programming in the already extremely efficient Python programming language. Briefly, a trait is a type definition that can be used for normal Python object attributes, giving the attributes some additional characteristics: initialization, validation, delegation, notification, and (optionally) visualization (GUIs). In this webinar we will provide an introduction to Traits by walking through several examples that show what you can do with Traits. Scientific Computing With Python Webinar: Traits October 16 1pm CDT/6pm UTC Register at GoToMeeting We hope to see you there! Also, don't forget that this free event is open to the public. Use the link at the bottom of this email to forward an invitation to your friends and colleagues. As always, feel free to contact us with questions, concerns, or suggestions for future webinar topics. Have a great weekend, The Enthought Team Enthought, Inc. Quick Links www.enthought.com code.enthought.com Facebook Blog Forward email This email was sent to leah at enthought.com by amenity at enthought.com. Update Profile/Email Address | Instant removal with SafeUnsubscribe? | Privacy Policy. Enthought, Inc. | 515 Congress Ave. | Suite 2100 | Austin | TX | 78701 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vs at it.uu.se Mon Oct 12 03:53:14 2009 From: vs at it.uu.se (Virgil Stokes) Date: Mon, 12 Oct 2009 09:53:14 +0200 Subject: [Numpy-discussion] October 16 Scientific Computing with Python Webinar: Traits In-Reply-To: References: <1874882496.1255125323830.JavaMail.root@p2-ws606.ad.prodcc.net> Message-ID: <4AD2E06A.3070807@it.uu.se> An HTML attachment was scrubbed... URL: From perfreem at gmail.com Mon Oct 12 10:18:44 2009 From: perfreem at gmail.com (per freem) Date: Mon, 12 Oct 2009 10:18:44 -0400 Subject: [Numpy-discussion] finding nonzero elements in list Message-ID: hi all, i'm trying to find nonzero elements in an array, as follows: a = array([[1, 0], [1, 1], [1, 1], [0, 1]]) i want to find all elements that are [1,1]. i tried: nonzero(a == [1,0]) but i cannot interpret the output. the output i get is: (array([0, 0, 1, 2]), array([0, 1, 0, 0])) i simply want to find the indices of the elements that equal [1,0]. how can i do this? thanks. From josef.pktd at gmail.com Mon Oct 12 10:36:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 12 Oct 2009 10:36:09 -0400 Subject: [Numpy-discussion] finding nonzero elements in list In-Reply-To: References: Message-ID: <1cd32cbb0910120736s392280f6r85b2d55eee422335@mail.gmail.com> On Mon, Oct 12, 2009 at 10:18 AM, per freem wrote: > hi all, > > i'm trying to find nonzero elements in an array, as follows: > > a = array([[1, 0], > ? ? ? [1, 1], > ? ? ? [1, 1], > ? ? ? [0, 1]]) > > i want to find all elements that are [1,1]. i tried: nonzero(a == > [1,0]) but i cannot interpret the output. the output i get is: > (array([0, 0, 1, 2]), array([0, 1, 0, 0])) > > i simply want to find the indices of the elements that equal [1,0]. > how can i do this? thanks. 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > a == [1,0] does elementwise comparison, you need to aggregate condition for all elements of row >>> a = np.array([[1, 0], [1, 1], [1, 1], [0, 1]]) >>> np.nonzero((a==[1,0]).all(1)) (array([0]),) >>> np.where((a==[1,0]).all(1)) (array([0]),) >>> np.nonzero((a==[1,1]).all(1)) (array([1, 2]),) Josef From gokhansever at gmail.com Mon Oct 12 10:39:59 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 12 Oct 2009 09:39:59 -0500 Subject: [Numpy-discussion] finding nonzero elements in list In-Reply-To: References: Message-ID: <49d6b3500910120739k21dcb120p21525134249489f8@mail.gmail.com> On Mon, Oct 12, 2009 at 9:18 AM, per freem wrote: > hi all, > > i'm trying to find nonzero elements in an array, as follows: > > a = array([[1, 0], > [1, 1], > [1, 1], > [0, 1]]) > > i want to find all elements that are [1,1]. i tried: nonzero(a == > [1,0]) but i cannot interpret the output. the output i get is: > (array([0, 0, 1, 2]), array([0, 1, 0, 0])) > > i simply want to find the indices of the elements that equal [1,0]. > how can i do this? thanks. > You might simply apply a mask to your array satisfying the condition: I[1]: a = array([[1, 0], ...: [1, 1], ...: [1, 1], ...: [0, 1]]) I[2]: a == [1,0] O[2]: array([[ True, True], [ True, False], [ True, False], [False, False]], dtype=bool) I[3]: a[a==[1,0]] O[3]: array([1, 0, 1, 1]) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Mon Oct 12 10:44:04 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 12 Oct 2009 09:44:04 -0500 Subject: [Numpy-discussion] finding nonzero elements in list In-Reply-To: <49d6b3500910120739k21dcb120p21525134249489f8@mail.gmail.com> References: <49d6b3500910120739k21dcb120p21525134249489f8@mail.gmail.com> Message-ID: <49d6b3500910120744u576bd79fi9b2e2904429356e2@mail.gmail.com> On Mon, Oct 12, 2009 at 9:39 AM, G?khan Sever wrote: > > > On Mon, Oct 12, 2009 at 9:18 AM, per freem wrote: > >> hi all, >> >> i'm trying to find nonzero elements in an array, as follows: >> >> a = array([[1, 0], >> [1, 1], >> [1, 1], >> [0, 1]]) >> >> i want to find all elements that are [1,1]. i tried: nonzero(a == >> [1,0]) but i cannot interpret the output. the output i get is: >> (array([0, 0, 1, 2]), array([0, 1, 0, 0])) >> >> i simply want to find the indices of the elements that equal [1,0]. >> how can i do this? thanks. >> > > > You might simply apply a mask to your array satisfying the condition: > > I[1]: a = array([[1, 0], > ...: [1, 1], > ...: [1, 1], > ...: [0, 1]]) > > I[2]: a == [1,0] > O[2]: > array([[ True, True], > [ True, False], > [ True, False], > [False, False]], dtype=bool) > > I[3]: a[a==[1,0]] > O[3]: array([1, 0, 1, 1]) > Addendum; This might work better since you are looking non-zero elements I[19]: a[a==[1,0]] & a[a==[0,1]] O[19]: array([1, 0, 0, 1]) > > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > G?khan > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
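(A side note on the two approaches above: the element-wise mask picks out matching elements, flattened, while the row-wise reduction is what gives the row indices the original question asked for. A quick comparison, assuming the same array a:)

import numpy as np

a = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]])

# element-wise mask: matching *elements*, flattened
print(a[a == [1, 0]])                             # [1 0 1 1]

# row-wise reduction: indices of rows equal to [1, 0] or [1, 1]
print(np.nonzero((a == [1, 0]).all(axis=1))[0])   # [0]
print(np.nonzero((a == [1, 1]).all(axis=1))[0])   # [1 2]
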
URL: From perfreem at gmail.com Mon Oct 12 17:25:39 2009 From: perfreem at gmail.com (per freem) Date: Mon, 12 Oct 2009 17:25:39 -0400 Subject: [Numpy-discussion] performance of scipy: potential inefficiency in logsumexp and sampling from multinomial Message-ID: hi all, i have a piece of code that relies heavily on sampling from multinomial distributions and using their results to compute log probabilities. my code makes heavy use of 'multinomial' from scipy, and of 'logsumexp'. my code is unusually slow, and profiling it with Python's "cPickle" module reveals that most of the time is spent in the following functions: 479.524 0.000 code.py:211(my_func) 122.682 0.000 /Library/Python/2.5/site-packages/scipy/maxentropy/maxentutils.py:27(logsumexp) 40.645 0.000 /Library/Python/2.5/site-packages/numpy/core/numeric.py:180(asarray) 20.374 0.000 {method 'max' of 'numpy.ndarray' objects} (the first column represents cumulative time, the second is percall time.) my code (listed as 'my_func' above) essentially computes a list of log probabilities, exponentiates them and renormalizes them (using 'logsumexp') and then samples from a multinomial distribution using those probabilities as a parameter. i then check to see which object came up true from the multinomial sample. here's a sketch of the code: def my_func(my_list, n_items) final_list = [] for n in xrange(n_items): prob = my_dict[(my_list(n), n)] final_list.append(prob) final_list = final_list - logsumexp(final_list) sample = multinomial(1, exp(final_list)) sample_index = list(sampled_reassignment).index(1) return sample_index the list 'my_list' usually has around 3 to 5 elements in it, and 'my_dict' has about 500-1000 keys. this function gets called about 1.5 million times in my code, and it takes about 5 minutes, which seems very long relative to these operations. (i'd like to scale this up to a case where the function is called about 10-120 million times.) are there known efficiency issues with logsumexp? it seems like it should be a very cheap operation. also, 'multinomial' ought to be relatively cheap, i believe. does anyone have any ideas on how this can be optimized? any input will be greatly appreciated. i am also open to using cython if that is likely to make a significant improvement in this case. also, what is likely to be the origin of the call to "asarray"? (i am not explicitly calling that function, it must be indirectly via some other function.) thanks very much. From charlesr.harris at gmail.com Mon Oct 12 17:48:51 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Oct 2009 15:48:51 -0600 Subject: [Numpy-discussion] performance of scipy: potential inefficiency in logsumexp and sampling from multinomial In-Reply-To: References: Message-ID: On Mon, Oct 12, 2009 at 3:25 PM, per freem wrote: > hi all, > > i have a piece of code that relies heavily on sampling from > multinomial distributions and using their results to compute log > probabilities. my code makes heavy use of 'multinomial' from scipy, > and of 'logsumexp'. 
> > my code is unusually slow, and profiling it with Python's "cPickle" > module reveals that most of the time is spent in the following > functions: > > 479.524 0.000 code.py:211(my_func) > 122.682 0.000 > > /Library/Python/2.5/site-packages/scipy/maxentropy/maxentutils.py:27(logsumexp) > 40.645 0.000 > /Library/Python/2.5/site-packages/numpy/core/numeric.py:180(asarray) > 20.374 0.000 {method 'max' of 'numpy.ndarray' objects} > > (the first column represents cumulative time, the second is percall time.) > > my code (listed as 'my_func' above) essentially computes a list of log > probabilities, exponentiates them and renormalizes them (using > 'logsumexp') and then samples from a multinomial distribution using > those probabilities as a parameter. i then check to see which object > came up true from the multinomial sample. here's a sketch of the code: > > def my_func(my_list, n_items) > final_list = [] > for n in xrange(n_items): > prob = my_dict[(my_list(n), n)] > final_list.append(prob) > final_list = final_list - logsumexp(final_list) > sample = multinomial(1, exp(final_list)) > sample_index = list(sampled_reassignment).index(1) > return sample_index > > the list 'my_list' usually has around 3 to 5 elements in it, and > 'my_dict' has about 500-1000 keys. > > this function gets called about 1.5 million times in my code, and it > takes about 5 minutes, which seems very long relative to these > operations. (i'd like to scale this up to a case where the function is > called about 10-120 million times.) > > are there known efficiency issues with logsumexp? it seems like it > should be a very cheap operation. also, 'multinomial' ought to be > relatively cheap, i believe. does anyone have any ideas on how this > can be optimized? any input will be greatly appreciated. i am also > open to using cython if that is likely to make a significant > improvement in this case. > > also, what is likely to be the origin of the call to "asarray"? (i am > not explicitly calling that function, it must be indirectly via some > other function.) > > You are going back and forth between lists and ndarrays of pretty small sequences of items of variable size. That is bound to be inefficient and isn't going to get you the benefits of vectorization. Is there any way you can do what you want using the rows in a single big array? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From perfreem at gmail.com Mon Oct 12 23:53:05 2009 From: perfreem at gmail.com (per freem) Date: Mon, 12 Oct 2009 23:53:05 -0400 Subject: [Numpy-discussion] simple array multiplication question Message-ID: hi all, i am trying to write a simple product of 3 arrays (as vectorized code) but am having some difficulty. i have three arrays, one is a list containing several lists: p = array([[ 0.2, 0.8], [ 0.5, 0.5], [ 0.3, 0.7]]) each list in the array 'p' is of size N -- in this case N = 2. i have a second array containing a set of numbers, each between 0 and N-1: l = array([0, 0, 1]) and finally an array of the same size as l: s = array([10, 20, 30]) what i want to do is pick the columns l of p, and multiply each one by the numbers in s. the first step, picking columns l of p, is simply: cols = p[arange(3), l] then i want to multiply each one by the numbers in s, and i do it like this: cols * s.reshape(3,1) this seems to work, but i am concerned that it might be inefficient. is there a cleaner way of doing this? is 'arange' operation necessary to reference all the 'l' columns of p? 
also, is the reshape operation expensive? thanks very much. From cournape at gmail.com Tue Oct 13 00:00:35 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 13 Oct 2009 13:00:35 +0900 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % In-Reply-To: <5b8d13220910092249w1aa3e7d7i81740a96e44f2e37@mail.gmail.com> References: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> <320fb6e00910090924p3c6af39dvead2d5d8502874e6@mail.gmail.com> <5b8d13220910092249w1aa3e7d7i81740a96e44f2e37@mail.gmail.com> Message-ID: <5b8d13220910122100y2f83d807wbec2ba75b9356a08@mail.gmail.com> On Sat, Oct 10, 2009 at 2:49 PM, David Cournapeau wrote: > On Sat, Oct 10, 2009 at 1:24 AM, Peter > wrote: >> On Fri, Oct 9, 2009 at 5:56 AM, David Cournapeau >> wrote: >>> >>> Since inspect is used in quite a few places, and that we only use it to >>> extract arguments from a function, I added a small numpy.lib.inspect >>> module, and ... >> >> Is numpy.lib intended as a public API? How about numpy.lib._inspect >> instead of numpy.lib.inspect to make it clear this new module is private? > > I could see it being used by other packages like scipy for example. > Instead of numpy.lib.inspect, we may choose to have something like > numpy.lib.compat or something, where we could put several potential > cases similar to this one. Ok, I created a new numpy subpackage numpy.compat, and numpy.compat will contain the public API. The implementation is in numpy.compat._inspect. Unless someones objects to it, I will include this within the next day or so, cheers, David From dwf at cs.toronto.edu Tue Oct 13 00:40:18 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 13 Oct 2009 00:40:18 -0400 Subject: [Numpy-discussion] simple array multiplication question In-Reply-To: References: Message-ID: <14427D13-1307-4E8A-B689-78CBB21F773E@cs.toronto.edu> On 12-Oct-09, at 11:53 PM, per freem wrote: > what i want to do is pick the columns l of p, and multiply each one by > the numbers in s. the first step, picking columns l of p, is simply: > > cols = p[arange(3), l] This isn't picking columns of p, this is picking the times at (0, 0), (1, 0), and (2, 1). Is this what you meant? In [36]: p[arange(3), [0,0,1]] Out[36]: array([ 0.2, 0.5, 0.7]) In [37]: p[:, [0,0,1]] Out[37]: array([[ 0.2, 0.2, 0.8], [ 0.5, 0.5, 0.5], [ 0.3, 0.3, 0.7]]) In [38]: p[arange(3), [0,0,1]] * s.reshape(3,1) Out[38]: array([[ 2., 5., 7.], [ 4., 10., 14.], [ 6., 15., 21.]]) In [41]: p[:, [0,0,1]] * s.reshape(3,1) Out[41]: array([[ 2., 2., 8.], [ 10., 10., 10.], [ 9., 9., 21.]]) Notice the difference. > then i want to multiply each one by the numbers in s, and i do it > like this: > > cols * s.reshape(3,1) > > this seems to work, but i am concerned that it might be inefficient. > is there a cleaner way of doing this? is 'arange' operation necessary > to reference all the 'l' columns of p? That's about as efficient as it gets, I think. > also, is the reshape operation > expensive? No. It will return a view, rather than make a copy. You could also do cols * s[:, np.newaxis], equivalently. 
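Putting the two selections side by side may make the distinction clearer (a small sketch; which one you want depends on what "columns l of p" is supposed to mean):

import numpy as np

p = np.array([[0.2, 0.8],
              [0.5, 0.5],
              [0.3, 0.7]])
l = np.array([0, 0, 1])
s = np.array([10, 20, 30])

# one entry per row, (i, l[i]) -- what p[arange(3), l] selects
picked = p[np.arange(3), l]          # [0.2, 0.5, 0.7]

# whole columns l of p -- a different result
columns = p[:, l]                    # shape (3, 3)

# s[:, np.newaxis] is a view, so broadcasting it costs no copy of s
print(picked * s[:, np.newaxis])
print(columns * s[:, np.newaxis])
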
David From dpeterson at enthought.com Tue Oct 13 12:43:53 2009 From: dpeterson at enthought.com (Dave Peterson) Date: Tue, 13 Oct 2009 11:43:53 -0500 Subject: [Numpy-discussion] October 16 Scientific Computing with Python Webinar: Traits In-Reply-To: <4AD2E06A.3070807@it.uu.se> References: <1874882496.1255125323830.JavaMail.root@p2-ws606.ad.prodcc.net> <4AD2E06A.3070807@it.uu.se> Message-ID: <4AD4AE49.5020201@enthought.com> Virgil Stokes wrote: > Amenity Applewhite wrote: >> >> Having trouble viewing this email? Click here >> >> >> SCP Banner October >> >> Friday, October 16: Traits >> SCIENTIFIC COMPUTING WITH PYTHON WEBINAR >> >> Hello! >> >> It's already time for our October Scientific Computing with Python >> webinar! This month we'll be handling Traits >> , >> one of our most popular training topics. >> >> >> >> Traits: Expanding the Power of Attributes >> Enthought Tool Suite >> >> An essential component of the open source Enthought Tool Suite >> , >> The Traits package is at the center of all development we do at >> Enthought. In fact, it has changed the mental model we use for >> programming in the already extremely efficient Python programming >> language. >> >> Briefly, a trait is a type definition that can be used for normal >> Python object attributes, giving the attributes some additional >> characteristics: initialization, validation, delegation, >> notification, and (optionally) visualization (GUIs). In this >> webinar we will provide an introduction to Traits by walking through >> several examples that show what you can do with Traits. >> >> >> >> Scientific Computing With Python Webinar: Traits >> October 16 >> 1pm CDT/6pm UTC >> Register at GoToMeeting >> >> >> >> >> We hope to see you there! Also, don't forget that this free event is >> open to the public. Use the link at the bottom of this email to >> forward an invitation to your friends and colleagues. >> >> As always, feel free to contact us >> with questions, concerns, or suggestions for future webinar topics. >> >> Have a great weekend, >> >> The Enthought Team >> Enthought, Inc. >> >> >> Quick Links >> www.enthought.com >> >> code.enthought.com >> >> Facebook >> >> Blog >> >> >> Enthought Header >> >> >> >> Forward email >> >> Safe Unsubscribe >> >> >> This email was sent to leah at enthought.com >> by amenity at enthought.com . >> Update Profile/Email Address >> >> | Instant removal with SafeUnsubscribe >> ? >> | Privacy Policy >> . >> >> >> Enthought, Inc. | 515 Congress Ave. | Suite 2100 | Austin | TX | 78701 >> >> >> ------------------------------------------------------------------------ >> >> > Do participants in this Webinar need to have GoToMeeting software > installed and if yes, do they need to purchase this software? GoToMeeting does require installing some local software, but it's an applet that should be quickly installed when you request to join the meeting. There is no purchase required, the applet is free. -- Dave -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From perfreem at gmail.com Wed Oct 14 00:27:31 2009 From: perfreem at gmail.com (per freem) Date: Wed, 14 Oct 2009 00:27:31 -0400 Subject: [Numpy-discussion] [SciPy-User] vectorized version of 'multinomial' sampling function In-Reply-To: <6E9F4234-F3FD-4E78-BDC4-D0960FE52242@cs.toronto.edu> References: <6E9F4234-F3FD-4E78-BDC4-D0960FE52242@cs.toronto.edu> Message-ID: On Tue, Oct 13, 2009 at 7:59 PM, David Warde-Farley wrote: > On 13-Oct-09, at 5:01 PM, per freem wrote: > >> hi all, >> >> i have a series of probability vector that i'd like to feed into >> multinomial to get an array of vector outcomes back. for example, >> given: >> >> p = array([[ 0.9 , ?0.05, ?0.05], >> ? ? ? [ 0.05, ?0.05, ?0.9 ]]) >> >> i'd like to call multinomial like this: >> >> multinomial(1, p) >> >> to get a vector of multinomial samplers, each using the nth list in >> 'p'. something like: >> >> array([[1, 0, 0], [0, 0 1]]) in this case. is this possible? it seems >> like 'multinomial' takes only a one dimensional array. i could write >> this as a "for" loop of course but i prefer a vectorized version since >> speed is crucial for me here. >> >> thanks very much. > > Your best bet is probably to copy the pyrex/Cython code for > multinomial in numpy/random/mtrand/mtrand.pyx, and add the > functionality you want there. ?If you do it right (i.e. type your loop > indices) then it should be fast. > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Hi David thanks for your reply. i am not sure how to do this though -- is the vectorized version i would write in pyrex/cython simply going to iterate through this vector of vectors and do the operation? will that really be efficient? is there some other library that can do vectorized multinomial like i described? i really am not sure how to write this cython. From robert.kern at gmail.com Wed Oct 14 00:30:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 13 Oct 2009 23:30:33 -0500 Subject: [Numpy-discussion] [SciPy-User] vectorized version of 'multinomial' sampling function In-Reply-To: References: <6E9F4234-F3FD-4E78-BDC4-D0960FE52242@cs.toronto.edu> Message-ID: <3d375d730910132130y204064a9vcd9e4117a7f6baba@mail.gmail.com> On Tue, Oct 13, 2009 at 23:27, per freem wrote: > thanks for your reply. i am not sure how to do this though -- is the > vectorized version i would write in pyrex/cython simply going to > iterate through this vector of vectors and do the operation? will that > really be efficient? Yes because the iteration, if written correctly, will be in C. This is all that "vectorization" means in this context. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From thomas.robitaille at gmail.com Wed Oct 14 09:52:29 2009 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Wed, 14 Oct 2009 09:52:29 -0400 Subject: [Numpy-discussion] rec_append_fields and n-dimensional fields Message-ID: <08437E25-B5F0-46EB-B9E4-DE9DF4FEA700@gmail.com> Hi, I'm interested in constructing a recarray with fields that have two or more dimensions. This can be done from scratch like this: r = np.recarray((10,),dtype=[('c1',float,(3,))]) However, I am interested in appending a field to an existing recarray. 
Rather than repeating existing code I would like to use the numpy.lib.recfunctions.rec_append_fields method, but I am not sure how to specify the dimension of each field, since it doesn't seem to be possible to specify the dtype as a tuple as above. Thanks for any advice, Thomas From gael.varoquaux at normalesup.org Wed Oct 14 11:32:07 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 14 Oct 2009 17:32:07 +0200 Subject: [Numpy-discussion] Speed of np.array versus np.vstack Message-ID: <20091014153207.GI15987@phare.normalesup.org> I tend to use np.array to stack arrays rather than np.vstack, as I find it does what I want with higher dimensional arrays. However, I was quite surprised to see a large speed difference: In [1]: import numpy as np In [2]: N = 1e6 In [3]: M = 10 In [4]: l = [np.random.random(N) for _ in range(M)] In [5]: %timeit np.vstack(l) 10 loops, best of 3: 82.7 ms per loop In [6]: %timeit np.array(l) 10 loops, best of 3: 822 ms per loop I can't find the reasons for this speed difference. Also, I don't see what is the correct way to get the behavior I want without paying the extra speed cost. Cheers, Ga?l From nwagner at iam.uni-stuttgart.de Wed Oct 14 12:52:26 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 14 Oct 2009 18:52:26 +0200 Subject: [Numpy-discussion] TypeError: 'bool' object is not callable Message-ID: >>> numpy.__version__ '1.4.0.dev7528' ====================================================================== ERROR: test_from_unicode (test_defchararray.TestBasic) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/tests/test_defchararray.py", line 68, in test_from_unicode A = np.char.array(u'\u03a3') File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/defchararray.py", line 2453, in array obj = unicode(obj) TypeError: 'bool' object is not callable ---------------------------------------------------------------------- Ran 2277 tests in 18.933s FAILED (KNOWNFAIL=1, errors=1) From mdroe at stsci.edu Wed Oct 14 12:59:28 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 14 Oct 2009 12:59:28 -0400 Subject: [Numpy-discussion] TypeError: 'bool' object is not callable In-Reply-To: References: Message-ID: <4AD60370.6040800@stsci.edu> That's my bad. I will commit a fix to SVN shortly. 
Mike Nils Wagner wrote: > >>> numpy.__version__ > '1.4.0.dev7528' > > ====================================================================== > ERROR: test_from_unicode (test_defchararray.TestBasic) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/tests/test_defchararray.py", > line 68, in test_from_unicode > A = np.char.array(u'\u03a3') > File > "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/defchararray.py", > line 2453, in array > obj = unicode(obj) > TypeError: 'bool' object is not callable > > ---------------------------------------------------------------------- > Ran 2277 tests in 18.933s > > FAILED (KNOWNFAIL=1, errors=1) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From mdroe at stsci.edu Wed Oct 14 13:02:25 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 14 Oct 2009 13:02:25 -0400 Subject: [Numpy-discussion] TypeError: 'bool' object is not callable In-Reply-To: <4AD60370.6040800@stsci.edu> References: <4AD60370.6040800@stsci.edu> Message-ID: <4AD60421.9040701@stsci.edu> The fix is in SVN r7530. Mike Michael Droettboom wrote: > That's my bad. I will commit a fix to SVN shortly. > > Mike > > Nils Wagner wrote: > >> >>> numpy.__version__ >> '1.4.0.dev7528' >> >> ====================================================================== >> ERROR: test_from_unicode (test_defchararray.TestBasic) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/tests/test_defchararray.py", >> line 68, in test_from_unicode >> A = np.char.array(u'\u03a3') >> File >> "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/defchararray.py", >> line 2453, in array >> obj = unicode(obj) >> TypeError: 'bool' object is not callable >> >> ---------------------------------------------------------------------- >> Ran 2277 tests in 18.933s >> >> FAILED (KNOWNFAIL=1, errors=1) >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From Ashwin.Kashyap at thomson.net Wed Oct 14 19:04:01 2009 From: Ashwin.Kashyap at thomson.net (Kashyap Ashwin) Date: Wed, 14 Oct 2009 19:04:01 -0400 Subject: [Numpy-discussion] MKL with 64bit crashes Message-ID: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> Hello, I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with MKL. 
This is my site.cfg: [mkl] # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t include_dirs = /opt/intel/mkl/10.2.2.025/include lapack_libs = mkl_lapack #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, iomp5, mkl_vml_mc3 mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, mkl_mc3, mkl_def In [2]: numpy.test() Running unit tests for numpy NumPy version 1.3.0 NumPy is installed in /opt/********************* Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] nose version 0.11.0 .......... MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV .. MKL ERROR: Parameter 4 was incorrect on entry to DGESV FSegmentation fault I am using gcc: gcc -v Using built-in specs. Target: x86_64-linux-gnu Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4) Anyone having the same issues? Do I have the mkl_libs correctly configured (this seems like a black art!) Thanks, Ashwin From cournape at gmail.com Wed Oct 14 20:30:37 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 15 Oct 2009 09:30:37 +0900 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> Message-ID: <5b8d13220910141730j7294de20tca97402e7dd24f9b@mail.gmail.com> On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin wrote: > Hello, > I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with > MKL. > This is my site.cfg: > [mkl] > # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ > library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t > include_dirs = /opt/intel/mkl/10.2.2.025/include > lapack_libs = mkl_lapack > #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, > iomp5, mkl_vml_mc3 > mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, > mkl_mc3, mkl_def The order does not look right - I don't know the exact order (each version of the MKL changes the libraries), but you should respect the order as given in the MKL manual. > MKL ERROR: Parameter 4 was incorrect on entry to DGESV This suggests an error when passing argument to MKL - I believe your version of MKL uses the gfortran ABI by default, and hardy uses g77 as the default fortran compiler. You should either recompile everything with gfortran, or regenerate the MKL interface libraries with g77 (as indicated in the manual). 
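Once you have rebuilt, a quick smoke test along these lines should tell you whether the expected libraries were picked up and whether the DGESV call is healthy (just a sketch):

import numpy as np

np.show_config()               # shows which BLAS/LAPACK numpy was built against

a = np.random.rand(4, 4)
b = np.random.rand(4)
x = np.linalg.solve(a, b)      # goes through LAPACK *gesv, the routine that failed above
print(np.allclose(np.dot(a, x), b))
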
cheers, David From robince at gmail.com Thu Oct 15 06:37:13 2009 From: robince at gmail.com (Robin) Date: Thu, 15 Oct 2009 11:37:13 +0100 Subject: [Numpy-discussion] extension questions: f2py and cython Message-ID: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> Hi, Sent this last week but checking the archives it appears not to have got through. Hopefully this will work... I am looking at moving some of my code to fortran to use with f2py. To get started I used this simple example: SUBROUTINE bincount (x,c,n,m) IMPLICIT NONE INTEGER, INTENT(IN) :: n,m INTEGER, DIMENSION(n), INTENT(IN) :: x INTEGER, DIMENSION(0:m-1), INTENT(OUT) :: c INTEGER :: i DO i = 1, n c(x(i)) = c(x(i)) + 1 END DO END It performs well: In [1]: x = np.random.random_integers(0,1023,1000000).astype(int) In [4]: timeit test.bincount(x,1024) 1000 loops, best of 3: 1.16 ms per loop In [5]: timeit np.bincount(x) 100 loops, best of 3: 4 ms per loop I'm guessing most of the benefit comes from less checking + not having to find the maximum value (which I pass as parameter m). But I have some questions. It seems to work as is, but I don't set c to zeros anywhere. Can I assume arrays created by f2py are zero? Is this the recommended way to use f2py with arrays? (I initially tried using assumed arrays with DIMENSION(:) but it couldn't get it to work). Also I'm quite new to fortran - what would be the advantages, if any, of using a FORALL instead of DO in a situation like this? I guess with 1D arrays it doesn't make a copy since ordering is not a problem, but if it was 2D arrays am I right in thinking that if I passed in a C order array it would automatically make a copy to F order. What about the return - will I get a number array in F order, or will it automatically be copied to C order? (I guess I will see but I haven't got onto trying that yet). What if I wanted to keep all the array creation in numpy - ie call it as the fortran subroutine bincount(x,c) and have c modified in place? Should I be using !f2py comments? I wasn't clear if these are needed - it seems to work as is but could they give any improvement? For comparison I tried the same thing in cython - after a couple of iterations with not typing things properly I ended up with: import numpy as np cimport numpy as np cimport cython @cython.boundscheck(False) def bincount(np.ndarray[np.int_t, ndim=1] x not None,int m): cdef int n = x.shape[0] cdef unsigned int i cdef np.ndarray[np.int_t, ndim=1] c = np.zeros(m,dtype=np.int) for i from 0 <= i < n: c[x[i]] += 1 return c which now performs a bit better than np.bincount, but still significantly slower than the fortran. Is this to be expected or am I missing something in the cython? In [14]: timeit ctest.bincount(x,1024) 100 loops, best of 3: 3.31 ms per loop Cheers Robin From robince at gmail.com Thu Oct 15 08:53:48 2009 From: robince at gmail.com (Robin) Date: Thu, 15 Oct 2009 13:53:48 +0100 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> Message-ID: <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> Hi, I have another question about distributing a Python extension which uses f2py wrapped code. Ideally I'd like to keep pure Python/Numpy alternatives and just use fortran version if available - but I think that should be OK. 
I'm more worried about distributing binaries on Windows - I think on Mac/Linux it would be ok to have a fortran compiler required and build it - but on Windows I guess one should really distribute binaries. What is the recommended (free) fortran 95 compiler for use with f2py on windows (gfortan with cygwin?) Is it possible to get f2py to build a static library on windows so I can just distribute that? Or will I need to include library files from the compiler? How many different binary versions does one need to support common recent windows setups? I guess I need a different binary for each major python version and 32/64 bits (ie 2.5 32bit, 2.6 32bit, 2.5 64bit, 2.6 64bit). Is this right, or would different binaries be required for XP, Vista, 7 etc. ? Can anyone point me to a smallish Python package that includes fortran code in this way that I could look to for inspiration? Cheers Robin From josef.pktd at gmail.com Thu Oct 15 09:19:12 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 15 Oct 2009 09:19:12 -0400 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> Message-ID: <1cd32cbb0910150619u7aa85e34le22b7c08237584db@mail.gmail.com> On Thu, Oct 15, 2009 at 8:53 AM, Robin wrote: > Hi, > > I have another question about distributing a Python extension which > uses f2py wrapped code. Ideally I'd like to keep pure Python/Numpy > alternatives and just use fortran version if available - but I think > that should be OK. > > I'm more worried about distributing binaries on Windows - I think on > Mac/Linux it would be ok to have a fortran compiler required and build > it - but on Windows I guess one should really distribute binaries. > > What is the recommended (free) fortran 95 compiler for use with f2py > on windows (gfortan with cygwin?) > Is it possible to get f2py to build a static library on windows so I > can just distribute that? Or will I need to include library files from > the compiler? > How many different binary versions does one need to support common > recent windows setups? I guess I need a different binary for each > major python version and 32/64 bits (ie 2.5 32bit, 2.6 32bit, 2.5 > 64bit, 2.6 64bit). Is this right, or would different binaries be > required for XP, Vista, 7 etc. ? The same binaries should work on both XP and Vista. > > Can anyone point me to a smallish Python package that includes fortran > code in this way that I could look to for inspiration? I don't know if you can pymc smallish, but it is using a considerable amount of fortran, and distributes only win32-py2.5 binaries. http://code.google.com/p/pymc/ for the rest I have no idea. 
(for numpy/scipy, I'm still using g77 with the official mingw for windows xp, win32 only) Josef > > Cheers > > Robin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Ashwin.Kashyap at thomson.net Thu Oct 15 11:00:40 2009 From: Ashwin.Kashyap at thomson.net (Kashyap Ashwin) Date: Thu, 15 Oct 2009 11:00:40 -0400 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> Message-ID: <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> I followed the advice given by the Intel MKL link adviser (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) This is my new site.cfg: mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. Now I get these errors on import: Running unit tests for numpy NumPy version 1.3.0 NumPy is installed in /opt/Personalization/lib/python2.5/site-packages/numpy Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] nose version 0.11.0 *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp *** libmkl_def.so *** failed with error : libmkl_def.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so Any hints? Thanks, Ashwin Your message: On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin wrote: > Hello, > I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with > MKL. > This is my site.cfg: > [mkl] > # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ > library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t > include_dirs = /opt/intel/mkl/10.2.2.025/include > lapack_libs = mkl_lapack > #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, > iomp5, mkl_vml_mc3 > mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, > mkl_mc3, mkl_def The order does not look right - I don't know the exact order (each version of the MKL changes the libraries), but you should respect the order as given in the MKL manual. > MKL ERROR: Parameter 4 was incorrect on entry to DGESV This suggests an error when passing argument to MKL - I believe your version of MKL uses the gfortran ABI by default, and hardy uses g77 as the default fortran compiler. You should either recompile everything with gfortran, or regenerate the MKL interface libraries with g77 (as indicated in the manual). cheers, David From matthieu.brucher at gmail.com Thu Oct 15 11:06:05 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 15 Oct 2009 17:06:05 +0200 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> Message-ID: Hi, You need to use the static libraries, are you sure you currently do? Matthieu 2009/10/15 Kashyap Ashwin : > I followed the advice given by the Intel MKL link adviser > (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) > > This is my new site.cfg: > mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core > > I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. 
> Now I get these errors on import: > Running unit tests for numpy > NumPy version 1.3.0 > NumPy is installed in > /opt/Personalization/lib/python2.5/site-packages/numpy > Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 > (Ubuntu 4.2.4-1ubuntu3)] > nose version 0.11.0 > > *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > *** libmkl_def.so *** failed with error : libmkl_def.so: undefined > symbol: mkl_dft_commit_descriptor_s_c2c_md_omp > MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so > > > Any hints? > > Thanks, > Ashwin > > > > Your message: > > On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin > wrote: >> Hello, >> I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) > with >> MKL. >> This is my site.cfg: >> [mkl] >> # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ >> library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t >> include_dirs = /opt/intel/mkl/10.2.2.025/include >> lapack_libs = mkl_lapack >> #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, >> iomp5, mkl_vml_mc3 >> mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, >> mkl_mc3, mkl_def > > The order does not look right - I don't know the exact order (each > version of the MKL changes the libraries), but you should respect the > order as given in the MKL manual. > >> MKL ERROR: Parameter 4 was incorrect on entry to DGESV > > This suggests an error when passing argument to MKL - I believe your > version of MKL uses the gfortran ABI by default, and hardy uses g77 as > the default fortran compiler. You should either recompile everything > with gfortran, or regenerate the MKL interface libraries with g77 (as > indicated in the manual). > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From HAWRYLA at novachem.com Thu Oct 15 11:10:10 2009 From: HAWRYLA at novachem.com (Andrew Hawryluk) Date: Thu, 15 Oct 2009 09:10:10 -0600 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> Message-ID: <48C01AE7354EC240A26F19CEB995E943033AF2F8@CHMAILMBX01.novachem.com> > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- > bounces at scipy.org] On Behalf Of Robin > Sent: 15 Oct 2009 4:37 AM > To: numpy-discussion at scipy.org > Subject: [Numpy-discussion] extension questions: f2py and cython > > Hi, > > Sent this last week but checking the archives it appears not to have > got through. Hopefully this will work... > > I am looking at moving some of my code to fortran to use with f2py. To > get started I used this simple example: ... > But I have some questions. It seems to work as is, but I don't set c to > zeros anywhere. Can I assume arrays created by f2py are zero? As I understand it, uninitialized variables in Fortran are compiler/system-dependent. Some compilers initialize values to zero, many leave the previous contents of the memory in place. It is safest to never use the value of an uninitialized variable. 
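The same caveat holds on the NumPy side, where np.empty returns whatever happened to be in memory; only np.zeros guarantees zero-filled storage:

import numpy as np

c = np.empty(1024, dtype=np.intc)   # contents undefined, like an unset Fortran array
c = np.zeros(1024, dtype=np.intc)   # guaranteed zeros, safe to accumulate counts into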
Andrew From HAWRYLA at novachem.com Thu Oct 15 11:24:03 2009 From: HAWRYLA at novachem.com (Andrew Hawryluk) Date: Thu, 15 Oct 2009 09:24:03 -0600 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> Message-ID: <48C01AE7354EC240A26F19CEB995E943033AF2F9@CHMAILMBX01.novachem.com> > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- > bounces at scipy.org] On Behalf Of Robin > Sent: 15 Oct 2009 6:54 AM > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] extension questions: f2py and cython > > Hi, > > I have another question about distributing a Python extension which > uses f2py wrapped code. Ideally I'd like to keep pure Python/Numpy > alternatives and just use fortran version if available - but I think > that should be OK. > > I'm more worried about distributing binaries on Windows - I think on > Mac/Linux it would be ok to have a fortran compiler required and build > it - but on Windows I guess one should really distribute binaries. > > What is the recommended (free) fortran 95 compiler for use with f2py on > windows (gfortan with cygwin?) Is it possible to get f2py to build a > static library on windows so I can just distribute that? Or will I need > to include library files from the compiler? I am using gfortran, which has a native Windows installer: http://www.scipy.org/F2PY_Windows I have also successfully used g95 with f2py on Windows. When f2py runs this on Windows, it produces a *.pyd file that contains the compiled code. E.g. myfoo.f --> myfoo.pyd. This is imported into python with 'import myfoo'. The recipient of the windows binary needs only the *.pyd file (and your *.py files). Andrew From mdroe at stsci.edu Thu Oct 15 12:40:01 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 15 Oct 2009 12:40:01 -0400 Subject: [Numpy-discussion] object array alignment issues Message-ID: <4AD75061.2020908@stsci.edu> I recently committed a regression test and bugfix for object pointers in record arrays of unaligned size (meaning where each record is not a multiple of sizeof(PyObject **)). For example: a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) a2 = np.zeros((10,), 'S10') # This copying would segfault a1['o'] = a2 http://projects.scipy.org/numpy/ticket/1198 Unfortunately, this unit test has opened up a whole hornet's nest of alignment issues on Solaris. The various reference counting functions (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, for instance. Interestingly, there are comments in there saying "handles misaligned data" (eg. line 190), but in fact it doesn't, and doesn't look to me like it would. But I won't rule out a mistake in building it on my part. So, how to fix this? One obvious workaround is for users to pass "align=True" to the dtype constructor. This works if the dtype descriptor is a dictionary or comma-separated string. Is there a reason it couldn't be made to work with the string-of-tuples form that I'm missing? It would be marginally more convenient from my application, but that's just a finesse issue. However, perhaps we should try to fix the underlying alignment problems? Unfortunately, it's not clear to me how to resolve them without at least some performance penalty. 
You either do an alignment check of the pointer, and then memcpy if unaligned, or just always use memcpy. Not sure which is faster, as memcpy may have a fast path already. These are object arrays anyway, so there's plenty of overhead already, and I don't think this would affect regular numerical arrays. If we choose not to fix it, perhaps we should we try to warn when creating an unaligned recarray on platforms where it matters? I do worry about having something that works perfectly well on one platform fail on another. In the meantime, I'll just mark the new regression test to "skip on Solaris". Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From charlesr.harris at gmail.com Thu Oct 15 13:00:04 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 15 Oct 2009 11:00:04 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4AD75061.2020908@stsci.edu> References: <4AD75061.2020908@stsci.edu> Message-ID: On Thu, Oct 15, 2009 at 10:40 AM, Michael Droettboom wrote: > I recently committed a regression test and bugfix for object pointers in > record arrays of unaligned size (meaning where each record is not a > multiple of sizeof(PyObject **)). > > For example: > > a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > a2 = np.zeros((10,), 'S10') > # This copying would segfault > a1['o'] = a2 > > http://projects.scipy.org/numpy/ticket/1198 > > Unfortunately, this unit test has opened up a whole hornet's nest of > alignment issues on Solaris. No surprise there. Good unit tests seem to routinely uncover hornet's nests and Solaris is a platform that exercises the alignment part of the code. I think it is great that you are finding these problems. We folks working on Intel don't see them so much. > The various reference counting functions > (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, > for instance. Interestingly, there are comments in there saying > "handles misaligned data" (eg. line 190), but in fact it doesn't, and > doesn't look to me like it would. But I won't rule out a mistake in > building it on my part. > > So, how to fix this? > > One obvious workaround is for users to pass "align=True" to the dtype > constructor. This works if the dtype descriptor is a dictionary or > comma-separated string. Is there a reason it couldn't be made to work > with the string-of-tuples form that I'm missing? It would be marginally > more convenient from my application, but that's just a finesse issue. > > However, perhaps we should try to fix the underlying alignment > problems? Unfortunately, it's not clear to me how to resolve them > without at least some performance penalty. You either do an alignment > check of the pointer, and then memcpy if unaligned, or just always use > memcpy. Not sure which is faster, as memcpy may have a fast path > already. These are object arrays anyway, so there's plenty of overhead > already, and I don't think this would affect regular numerical arrays. > > I believe the memcpy approach is used for other unaligned parts of void types. There is an inherent performance penalty there, but I don't see how it can be avoided when using what are essentially packed structures. As to memcpy, it's performance seems to depend on the compiler/compiler version, old versions of gcc had *horrible* implementations of memcpy. I believe the situation has since improved. 
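For reference, the align=True workaround Michael mentions can be spelled with the comma-separated-string form of the dtype constructor; a small sketch (itemsize values are typical for a 64-bit build):

import numpy as np

# packed layout: an 8-byte object pointer followed by a 1-byte char, so each
# 9-byte record leaves the next record's pointer field misaligned
packed = np.dtype([('o', 'O'), ('c', 'c')])

# same fields via the comma-separated-string form with align=True: padding is
# inserted so every record is a multiple of the pointer size
aligned = np.dtype('O,c', align=True)

packed.itemsize, aligned.itemsize   # e.g. (9, 16) on a 64-bit build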
However, I'm not sure we should be coding to compiler issues unless it is unavoidable or the gain is huge. > If we choose not to fix it, perhaps we should we try to warn when > creating an unaligned recarray on platforms where it matters? I do > worry about having something that works perfectly well on one platform > fail on another. > > In the meantime, I'll just mark the new regression test to "skip on > Solaris". > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Thu Oct 15 14:01:43 2009 From: robince at gmail.com (Robin) Date: Thu, 15 Oct 2009 19:01:43 +0100 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <48C01AE7354EC240A26F19CEB995E943033AF2F8@CHMAILMBX01.novachem.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AF2F8@CHMAILMBX01.novachem.com> Message-ID: <2d5132a50910151101t58e27093j1d775cd68581eb4a@mail.gmail.com> Hi, Thanks On Thu, Oct 15, 2009 at 4:10 PM, Andrew Hawryluk wrote: >> But I have some questions. It seems to work as is, but I don't set c > to >> zeros anywhere. Can I assume arrays created by f2py are zero? > > As I understand it, uninitialized variables in Fortran are > compiler/system-dependent. Some compilers initialize values to zero, > many leave the previous contents of the memory in place. It is safest to > never use the value of an uninitialized variable. But in this case I understood it was initialised or created by the f2py wrapper first and then passed to the fortran subroutine - so I wondered how f2py creates it (I think I traced it to array_from_pyobj() but I couldn't really understand what it was doing or whether it would always be zeros). I guess as you say though it is always safer to initialize explicitly Cheers Robin From Ashwin.Kashyap at thomson.net Thu Oct 15 14:04:58 2009 From: Ashwin.Kashyap at thomson.net (Kashyap Ashwin) Date: Thu, 15 Oct 2009 14:04:58 -0400 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> Message-ID: <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> Matthieu, I am not sure what exactly you mean. I did pass in "static" to the link-adviser and this is the new setup.cfg mkl_libs = mkl_solver_ilp64, mkl_intel_ilp64, mkl_gnu_thread, mkl_core. On import, Numpy complains as usual about the mkl_def and mkl_mc. If I append these libs, then the crashes happen on test() (complains first about the DGES* functions). Also, I have made sure that g77 is not installed and only gfortran is available. I also put in the LD_LIBRARY_PATH=/opt/intel/mkl/10.2.2.025/lib/em64t. Thanks, Ashwin Your message: Hi, You need to use the static libraries, are you sure you currently do? Matthieu 2009/10/15 Kashyap Ashwin : > I followed the advice given by the Intel MKL link adviser > (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) > > This is my new site.cfg: > mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core > > I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. 
> Now I get these errors on import: > Running unit tests for numpy > NumPy version 1.3.0 > NumPy is installed in > /opt/Personalization/lib/python2.5/site-packages/numpy > Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 > (Ubuntu 4.2.4-1ubuntu3)] > nose version 0.11.0 > > *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > *** libmkl_def.so *** failed with error : libmkl_def.so: undefined > symbol: mkl_dft_commit_descriptor_s_c2c_md_omp > MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so > > > Any hints? > > Thanks, > Ashwin > > > > Your message: > > On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin > wrote: >> Hello, >> I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) > with >> MKL. >> This is my site.cfg: >> [mkl] >> # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ >> library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t >> include_dirs = /opt/intel/mkl/10.2.2.025/include >> lapack_libs = mkl_lapack >> #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, >> iomp5, mkl_vml_mc3 >> mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, >> mkl_mc3, mkl_def > > The order does not look right - I don't know the exact order (each > version of the MKL changes the libraries), but you should respect the > order as given in the MKL manual. > >> MKL ERROR: Parameter 4 was incorrect on entry to DGESV > > This suggests an error when passing argument to MKL - I believe your > version of MKL uses the gfortran ABI by default, and hardy uses g77 as > the default fortran compiler. You should either recompile everything > with gfortran, or regenerate the MKL interface libraries with g77 (as > indicated in the manual). > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -----Original Message----- > From: Kashyap Ashwin > Sent: Thursday, October 15, 2009 11:01 AM > To: 'numpy-discussion at scipy.org' > Subject: RE: MKL with 64bit crashes > > I followed the advice given by the Intel MKL link adviser (http://software.intel.com/en- > us/articles/intel-mkl-link-line-advisor/) > > This is my new site.cfg: > mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core > > I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. Now I get these errors on > import: > Running unit tests for numpy > NumPy version 1.3.0 > NumPy is installed in /opt/Personalization/lib/python2.5/site-packages/numpy > Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] > nose version 0.11.0 > > *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > *** libmkl_def.so *** failed with error : libmkl_def.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so > > > Any hints? > > Thanks, > Ashwin > > > > Your message: > > On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin > wrote: > > Hello, > > I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with > > MKL. 
> > This is my site.cfg: > > [mkl] > > # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ > > library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t > > include_dirs = /opt/intel/mkl/10.2.2.025/include > > lapack_libs = mkl_lapack > > #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, > > iomp5, mkl_vml_mc3 > > mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, > > mkl_mc3, mkl_def > > The order does not look right - I don't know the exact order (each > version of the MKL changes the libraries), but you should respect the > order as given in the MKL manual. > > > MKL ERROR: Parameter 4 was incorrect on entry to DGESV > > This suggests an error when passing argument to MKL - I believe your > version of MKL uses the gfortran ABI by default, and hardy uses g77 as > the default fortran compiler. You should either recompile everything > with gfortran, or regenerate the MKL interface libraries with g77 (as > indicated in the manual). > > cheers, > > David From pgmdevlist at gmail.com Thu Oct 15 19:08:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 15 Oct 2009 19:08:23 -0400 Subject: [Numpy-discussion] genfromtxt documentation : review needed Message-ID: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> All, Here's a first draft for the documentation of np.genfromtxt. It took me longer than I thought, but that way I uncovered and fix some bugs. Please send me your comments/reviews/etc I count especially on our documentation specialist to let me know where to put it. Thx in advance P. -------------- next part -------------- A non-text attachment was scrubbed... Name: doc_genfromtxt.rst Type: application/octet-stream Size: 20428 bytes Desc: not available URL: From numpy at mspacek.mm.st Thu Oct 15 20:44:42 2009 From: numpy at mspacek.mm.st (Martin Spacek) Date: Fri, 16 Oct 2009 00:44:42 +0000 (UTC) Subject: [Numpy-discussion] intersect1d for N input arrays Message-ID: I have a list of many arrays (in my case each is unique, ie has no repeated elements), and I'd like to extract the intersection of all of them, all in one go. I'm running numpy 1.3.0, but looking at today's rev of numpy.lib.arraysetops (http://svn.scipy.org/svn/numpy/trunk/numpy/lib/arraysetops.py), I see intersect1d has changed. Just a note: the example used in the docstring implies that the two arrays need to be the same length, which isn't the case. Maybe it would be good to change the example to two arrays of different lengths. intersect1d takes exactly 2 arrays. I've modified it a little to take the intersection of any number of 1D arrays (of any length), in a list or tuple. It seems to work fine, but could use more testing. Here it is with most of the docs stripped. Feel free to use it, although I suppose for symmetry, many of the other functions in arraysetops.py would also have to be modified to work with N arrays: def intersect1d(arrays, assume_unique=False): """Find the intersection of any number of 1D arrays. Return the sorted, unique values that are in all of the input arrays. 
Adapted from numpy.lib.arraysetops.intersect1d""" N = len(arrays) arrays = list(arrays) # allow assignment if not assume_unique: for i, arr in enumerate(arrays): arrays[i] = np.unique(arr) aux = np.concatenate(arrays) # one long 1D array aux.sort() # sorted shift = N-1 return aux[aux[shift:] == aux[:-shift]] From david at ar.media.kyoto-u.ac.jp Thu Oct 15 23:25:51 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 16 Oct 2009 12:25:51 +0900 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> Message-ID: <4AD7E7BF.2010509@ar.media.kyoto-u.ac.jp> Kashyap Ashwin wrote: > Matthieu, > I am not sure what exactly you mean. I did pass in "static" to the > link-adviser and this is the new setup.cfg > mkl_libs = mkl_solver_ilp64, mkl_intel_ilp64, mkl_gnu_thread, mkl_core. > > On import, Numpy complains as usual about the mkl_def and mkl_mc. If I > append these libs, then the crashes happen on test() (complains first > about the DGES* functions). > I remember now that I had the same problem recently - it is a fundamental incompatibility between MKL and Python way of loading shared libraries through dlopen. AFAIK, there is no solution to this problem, except for using the static libraries. David From dwf at cs.toronto.edu Fri Oct 16 00:14:01 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 16 Oct 2009 00:14:01 -0400 Subject: [Numpy-discussion] Google Groups archive? Message-ID: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> Does anyone know what happened to the Google Groups archive of this list? when I try to access it, I see: Cannot find numpy-discussion The group named numpy-discussion has been removed because it violated Google's Terms Of Service. This seems exceedingly odd. Does anyone know _how_ we violated the ToS? David From pgmdevlist at gmail.com Fri Oct 16 00:20:48 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 16 Oct 2009 00:20:48 -0400 Subject: [Numpy-discussion] Google Groups archive? In-Reply-To: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> References: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> Message-ID: On Oct 16, 2009, at 12:14 AM, David Warde-Farley wrote: > Does anyone know what happened to the Google Groups archive of this > list? when I try to access it, I see: > > Cannot find numpy-discussion > The group named numpy-discussion has been removed because it violated > Google's Terms Of Service. > > This seems exceedingly odd. Does anyone know _how_ we violated the > ToS? Hit by spam-bots, most likely. Was it actively used, actually ? From josef.pktd at gmail.com Fri Oct 16 00:23:34 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 16 Oct 2009 00:23:34 -0400 Subject: [Numpy-discussion] Google Groups archive? In-Reply-To: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> References: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> Message-ID: <1cd32cbb0910152123ia15db57y277ec4d0442a2ee6@mail.gmail.com> On Fri, Oct 16, 2009 at 12:14 AM, David Warde-Farley wrote: > Does anyone know what happened to the Google Groups archive of this > list? 
when I try to access it, I see: > > Cannot find numpy-discussion > The group named numpy-discussion has been removed because it violated > Google's Terms Of Service. same question on october 5th > > This seems exceedingly odd. Does anyone know _how_ we violated the ToS? adult material on front page Who's the owner? Creating a new group would require a different name, since the old name is blocked, I tried. Josef > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav+sp at iki.fi Fri Oct 16 03:48:19 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 16 Oct 2009 07:48:19 +0000 (UTC) Subject: [Numpy-discussion] Google Groups archive? References: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> <1cd32cbb0910152123ia15db57y277ec4d0442a2ee6@mail.gmail.com> Message-ID: Fri, 16 Oct 2009 00:23:34 -0400, josef.pktd wrote: [clip] >> This seems exceedingly odd. Does anyone know _how_ we violated the ToS? > > adult material on front page > > Who's the owner? Creating a new group would require a different name, > since the old name is blocked, I tried. Maybe it's best just not to use Google Groups. IMO, gmane.org offers an equivalent if not superior service. -- Pauli Virtanen From cimrman3 at ntc.zcu.cz Fri Oct 16 03:56:28 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Fri, 16 Oct 2009 09:56:28 +0200 Subject: [Numpy-discussion] intersect1d for N input arrays In-Reply-To: References: Message-ID: <4AD8272C.601@ntc.zcu.cz> Hi Martin, thanks for your ideas and contribution. A few notes: I would let intersect1d as it is, and created a new function with another name for that (any proposals?). Considering that most of arraysetops functions are based on sort, and in particular here that an intersection array is (usually) smaller than each of the input arrays, it might be better just to call intersect1d repeatedly for each array and the result of the previous call, accumulating the intersection. r. Martin Spacek wrote: > I have a list of many arrays (in my case each is unique, ie has no repeated > elements), and I'd like to extract the intersection of all of them, all in one > go. I'm running numpy 1.3.0, but looking at today's rev of numpy.lib.arraysetops > (http://svn.scipy.org/svn/numpy/trunk/numpy/lib/arraysetops.py), I see > intersect1d has changed. Just a note: the example used in the docstring implies > that the two arrays need to be the same length, which isn't the case. Maybe it > would be good to change the example to two arrays of different lengths. > > intersect1d takes exactly 2 arrays. I've modified it a little to take the > intersection of any number of 1D arrays (of any length), in a list or tuple. It > seems to work fine, but could use more testing. Here it is with most of the docs > stripped. Feel free to use it, although I suppose for symmetry, many of the > other functions in arraysetops.py would also have to be modified to work with N > arrays: > > > def intersect1d(arrays, assume_unique=False): > """Find the intersection of any number of 1D arrays. > Return the sorted, unique values that are in all of the input arrays. 
> Adapted from numpy.lib.arraysetops.intersect1d""" > N = len(arrays) > arrays = list(arrays) # allow assignment > if not assume_unique: > for i, arr in enumerate(arrays): > arrays[i] = np.unique(arr) > aux = np.concatenate(arrays) # one long 1D array > aux.sort() # sorted > shift = N-1 > return aux[aux[shift:] == aux[:-shift]] > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From faltet at pytables.org Fri Oct 16 06:07:10 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 16 Oct 2009 12:07:10 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> Message-ID: <200910161207.11024.faltet@pytables.org> A Thursday 15 October 2009 19:00:04 Charles R Harris escrigu?: > > So, how to fix this? > > > > One obvious workaround is for users to pass "align=True" to the dtype > > constructor. This works if the dtype descriptor is a dictionary or > > comma-separated string. Is there a reason it couldn't be made to work > > with the string-of-tuples form that I'm missing? It would be marginally > > more convenient from my application, but that's just a finesse issue. > > > > However, perhaps we should try to fix the underlying alignment > > problems? Unfortunately, it's not clear to me how to resolve them > > without at least some performance penalty. You either do an alignment > > check of the pointer, and then memcpy if unaligned, or just always use > > memcpy. Not sure which is faster, as memcpy may have a fast path > > already. These are object arrays anyway, so there's plenty of overhead > > already, and I don't think this would affect regular numerical arrays. The response is clear: avoid memcpy() if you can. It is true that memcpy() performance has improved quite a lot in latest gcc (it has been quite good in Win versions since many years ago), but working with data in-place (i.e. avoiding a memory copy) is always faster (and most specially for large arrays that don't fit in cache processors). My own experiments says that, with an Intel Core2 processor the typical speed- ups for avoiding memcpy() are 2x. And I've read somewhere that both AMD and Intel are trying to make unaligned operations to go even faster in next architectures (the goal is that there should be no speed difference in accessing aligned or unaligned data). > I believe the memcpy approach is used for other unaligned parts of void > types. There is an inherent performance penalty there, but I don't see how > it can be avoided when using what are essentially packed structures. As to > memcpy, it's performance seems to depend on the compiler/compiler version, > old versions of gcc had *horrible* implementations of memcpy. I believe the > situation has since improved. However, I'm not sure we should be coding to > compiler issues unless it is unavoidable or the gain is huge. IMO, NumPy can be improved for unaligned data handling. For example, Numexpr is using this small snippet: from cpuinfo import cpu if cpu.is_AMD() or cpu.is_Intel(): is_cpu_amd_intel = True else: is_cpu_amd_intel = False for detecting AMD/Intel architectures and allowing the code to avoid memcpy() calls for the unaligned arrays. The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, which is distributed under NumPy, so it should not be too difficult to take advantage of this for avoiding unnecessary copies in this scenario. 
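The unaligned views in question are easy to construct and check from Python; for instance, with a packed record type:

import numpy as np

# packed record: a 1-byte field followed by an 8-byte float, so the float
# field sits at offset 1 and every element of the view is misaligned
r = np.zeros(1000, dtype='i1,f8')
r['f1'].flags.aligned                                 # False

# with align=True the float field is padded out to a natural boundary
ra = np.zeros(1000, dtype=np.dtype('i1,f8', align=True))
ra['f1'].flags.aligned                                # True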
-- Francesc Alted From pav+sp at iki.fi Fri Oct 16 07:53:45 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 16 Oct 2009 11:53:45 +0000 (UTC) Subject: [Numpy-discussion] object array alignment issues References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: [clip] > IMO, NumPy can be improved for unaligned data handling. For example, > Numexpr is using this small snippet: > > from cpuinfo import cpu > if cpu.is_AMD() or cpu.is_Intel(): > is_cpu_amd_intel = True > else: > is_cpu_amd_intel = False > > for detecting AMD/Intel architectures and allowing the code to avoid > memcpy() calls for the unaligned arrays. > > The above code uses the excellent ``cpuinfo.py`` module from Pearu > Peterson, which is distributed under NumPy, so it should not be too > difficult to take advantage of this for avoiding unnecessary copies in > this scenario. I suppose this kind of check is easiest to do at compile-time, and defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for those architectures for which they are not necessary. -- Pauli Virtanen From cournape at gmail.com Fri Oct 16 08:02:03 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 16 Oct 2009 21:02:03 +0900 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: <5b8d13220910160502x1827f6cdi777e22badcfa975f@mail.gmail.com> On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen wrote: > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: > [clip] >> IMO, NumPy can be improved for unaligned data handling. ?For example, >> Numexpr is using this small snippet: >> >> from cpuinfo import cpu >> if cpu.is_AMD() or cpu.is_Intel(): >> ? ? is_cpu_amd_intel = True >> else: >> ? ? is_cpu_amd_intel = False >> >> for detecting AMD/Intel architectures and allowing the code to avoid >> memcpy() calls for the unaligned arrays. >> >> The above code uses the excellent ``cpuinfo.py`` module from Pearu >> Peterson, which is distributed under NumPy, so it should not be too >> difficult to take advantage of this for avoiding unnecessary copies in >> this scenario. > > I suppose this kind of check is easiest to do at compile-time, and > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for > those architectures for which they are not necessary. I wonder whether we could switch at runtime (import time) - it could be useful for testing. That being said, I agree that the cpu checks should be done at compile time - we had quite a few problems with cpuinfo in the past with new cpu/unhandled cpu, I think a compilation-based method is much more robust (and simpler) here. There are things where C is just much easier than python :) David From faltet at pytables.org Fri Oct 16 08:20:05 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 16 Oct 2009 14:20:05 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <5b8d13220910160502x1827f6cdi777e22badcfa975f@mail.gmail.com> References: <4AD75061.2020908@stsci.edu> <5b8d13220910160502x1827f6cdi777e22badcfa975f@mail.gmail.com> Message-ID: <200910161420.05552.faltet@pytables.org> A Friday 16 October 2009 14:02:03 David Cournapeau escrigu?: > On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen wrote: > > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: > > [clip] > > > >> IMO, NumPy can be improved for unaligned data handling. 
?For example, > >> Numexpr is using this small snippet: > >> > >> from cpuinfo import cpu > >> if cpu.is_AMD() or cpu.is_Intel(): > >> ? ? is_cpu_amd_intel = True > >> else: > >> ? ? is_cpu_amd_intel = False > >> > >> for detecting AMD/Intel architectures and allowing the code to avoid > >> memcpy() calls for the unaligned arrays. > >> > >> The above code uses the excellent ``cpuinfo.py`` module from Pearu > >> Peterson, which is distributed under NumPy, so it should not be too > >> difficult to take advantage of this for avoiding unnecessary copies in > >> this scenario. > > > > I suppose this kind of check is easiest to do at compile-time, and > > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for > > those architectures for which they are not necessary. > > I wonder whether we could switch at runtime (import time) - it could > be useful for testing. > > That being said, I agree that the cpu checks should be done at compile > time - we had quite a few problems with cpuinfo in the past with new > cpu/unhandled cpu, I think a compilation-based method is much more > robust (and simpler) here. There are things where C is just much > easier than python :) Agreed. I'm relaying in ``cpuinfo.py`` just because it provides what I need in an easy way. BTW, the detection of AMD/Intel (just the vendor) processors seems to work flawlessly for the platforms that I've checked (but I suppose that you are talking about other characteristics, like SSE version, etc). -- Francesc Alted From jsseabold at gmail.com Fri Oct 16 08:29:29 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 16 Oct 2009 08:29:29 -0400 Subject: [Numpy-discussion] genfromtxt documentation : review needed In-Reply-To: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> References: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> Message-ID: On Thu, Oct 15, 2009 at 7:08 PM, Pierre GM wrote: > All, > Here's a first draft for the documentation of np.genfromtxt. > It took me longer than I thought, but that way I uncovered and fix some > bugs. > Please send me your comments/reviews/etc > I count especially on our documentation specialist to let me know where to > put it. > Thx in advance > P. > Great work! I am especially glad to see the better documentation on missing values, as I didn't fully understand how to do this. A few small comments and a small attached diff with a few nitpicking grammatical changes and some of what's proposed below. On the actual function, I am wondering if white space shouldn't be stripped by default, or at least if we have fixed width columns. I ran into a problem recently, where I was reading in a lot of strings that were in a fixed width format and my 4 gb of memory were soon consumed. I also can't think of a case where I'd ever care about leading or trailing white space. I always get confused going back and forth from zero-indexed to non zero-indexed, which might not be a good enough reason to worry about this, but it might be helpful to explicitly say that skip_header is not zero-indexed, though it doesn't raise an exception if you try. data = "junk1,junk2,junk3\n1.2,1.5,1" from StringIO import StringIO import numpy as np d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=0) In [5]: d Out[5]: array([[ NaN, NaN, NaN], [ 1.2, 1.5, 1. ]]) d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=1) In [7]: d Out[7]: array([ 1.2, 1.5, 1. ]) d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=-1) In [9]: d Out[9]: array([[ NaN, NaN, NaN], [ 1.2, 1.5, 1. 
]]) Also, I don't know if this is even something that should be worried about in the io, but recarray names also can't start with a number to preserve attribute names look up, but I thought I would bring it up anyway, since I ran across this recently. data = "1var1,var2,var3\n1.2,1.5,1" d = np.recfromtxt(StringIO(data), dtype=float, delimiter=",", names=True) In [36]: d Out[36]: rec.array((1.2, 1.5, 1.0), dtype=[('1var1', '", line 1 d.1var1 ^ SyntaxError: invalid syntax In [38]: d.var2 Out[38]: array(1.5) In [39]: d['1var1'] Out[39]: array(1.2) I didn't know about being able to specify the dtype as a dict. That might be handy. Is there any way to cross-link to the dtype documentation in rst? I can't remember. That might be helpful to have. I never did figure out what the loose keyword did, but I guess it's not that important to me if I've never needed it. Cheers, Skipper -------------- next part -------------- 57c57 < By default, :func:`genfromtxt` assumes ``delimiter=None``, meaning that the line is splitted along white-spaces (including tabs) and that consecutive white-spaces are considered as a single white-space. --- > By default, :func:`genfromtxt` assumes ``delimiter=None``, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space. 76c76 < By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or tailing white spaces. --- > By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or trailing white spaces. 129c129 < The values of this argument must be an integer which corresponds to the number of lines to skip at the beginning of the file, before any other action is performed. --- > The values of this argument must be an integer which corresponds to the number of lines to skip at the beginning of the file, before any other action is performed. Note that this is not zero-indexed so that the first line is 1. 147c147 < Acceptable values for the argument are a single integer or a sequence of integers corresponding to the indices of the columns to import. --- > An acceptable values for the argument is a single integer or a sequence of integers corresponding to the indices of the columns to import. 195c195 < This behavior may be changed by modifying the default mapper of the :class:`~numpi.lib._iotools.StringConverter` class --- > This behavior may be changed by modifying the default mapper of the :class:`~numpy.lib._iotools.StringConverter` class 343c343 < .. However, user-defined converters may rapidly become cumbersome to manage when --- > .. However, user-defined converters may rapidly become cumbersome to manage. 389c389 < Each key can be a column index or a column name, and the corresponding value should eb a single object. --- > Each key can be a column index or a column name, and the corresponding value should be a single object. From mdroe at stsci.edu Fri Oct 16 08:31:08 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 16 Oct 2009 08:31:08 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: <4AD8678C.2030502@stsci.edu> On 10/16/2009 07:53 AM, Pauli Virtanen wrote: > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: > [clip] > >> IMO, NumPy can be improved for unaligned data handling. 
For example, >> Numexpr is using this small snippet: >> >> from cpuinfo import cpu >> if cpu.is_AMD() or cpu.is_Intel(): >> is_cpu_amd_intel = True >> else: >> is_cpu_amd_intel = False >> >> for detecting AMD/Intel architectures and allowing the code to avoid >> memcpy() calls for the unaligned arrays. >> >> The above code uses the excellent ``cpuinfo.py`` module from Pearu >> Peterson, which is distributed under NumPy, so it should not be too >> difficult to take advantage of this for avoiding unnecessary copies in >> this scenario. >> > I suppose this kind of check is easiest to do at compile-time, and > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for > those architectures for which they are not necessary. > > That's close to the solution I'm arriving at. I'm thinking of adding a macro "DEREF_UNALIGNED_PYOBJECT_PTR" which would do the right thing depending on the type of architecture. There should be no impact on architectures that handle unaligned pointers, and slightly slower (but correct) performance on other architectures. Mike From sturla at molden.no Fri Oct 16 12:05:05 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 16 Oct 2009 18:05:05 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <200910161207.11024.faltet@pytables.org> References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: <4AD899B1.7010806@molden.no> Francesc Alted skrev: > The response is clear: avoid memcpy() if you can. It is true that memcpy() > performance has improved quite a lot in latest gcc (it has been quite good in > Win versions since many years ago), but working with data in-place (i.e. > avoiding a memory copy) is always faster (and most specially for large arrays > that don't fit in cache processors). > > My own experiments says that, with an Intel Core2 processor the typical speed- > ups for avoiding memcpy() are 2x. If the underlying array is strided, I have seen the opposite as well. "Copy-in copy-out" is a common optimization used by Fortran compilers when working with strided arrays. The catch is that the work array has to fit in cache for this to make any sence. Anyhow, you cannot use memcpy for this kind of optimization - it assumes both buffers are contiguous. But working with arrays directly instead of copies is not always the faster option. S.M. > And I've read somewhere that both AMD and > Intel are trying to make unaligned operations to go even faster in next > architectures (the goal is that there should be no speed difference in > accessing aligned or unaligned data). > > >> I believe the memcpy approach is used for other unaligned parts of void >> types. There is an inherent performance penalty there, but I don't see how >> it can be avoided when using what are essentially packed structures. As to >> memcpy, it's performance seems to depend on the compiler/compiler version, >> old versions of gcc had *horrible* implementations of memcpy. I believe the >> situation has since improved. However, I'm not sure we should be coding to >> compiler issues unless it is unavoidable or the gain is huge. >> > > IMO, NumPy can be improved for unaligned data handling. For example, Numexpr > is using this small snippet: > > from cpuinfo import cpu > if cpu.is_AMD() or cpu.is_Intel(): > is_cpu_amd_intel = True > else: > is_cpu_amd_intel = False > > for detecting AMD/Intel architectures and allowing the code to avoid memcpy() > calls for the unaligned arrays. 
> > The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, > which is distributed under NumPy, so it should not be too difficult to take > advantage of this for avoiding unnecessary copies in this scenario. > > From pgmdevlist at gmail.com Fri Oct 16 17:36:48 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 16 Oct 2009 17:36:48 -0400 Subject: [Numpy-discussion] genfromtxt documentation : review needed In-Reply-To: References: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> Message-ID: <744355DE-54EE-4AB3-91C4-6A10890A5637@gmail.com> On Oct 16, 2009, at 8:29 AM, Skipper Seabold wrote: > Great work! I am especially glad to see the better documentation on > missing values, as I didn't fully understand how to do this. A few > small comments and a small attached diff with a few nitpicking > grammatical changes and some of what's proposed below. Thanks. I took your modifications into account. > On the actual function, I am wondering if white space shouldn't be > stripped by default, or at least if we have fixed width columns. Well, I'd do the opposite: `autostrip=False` if we work with fixed- length delimiters, `autostrip=True` if we work with character delimiters. > I also can't think of a case where I'd ever care about > leading or trailing white space. having `autostrip=False` when dealing with spaces as delimiters is a feature that was explicitly requested a while ago, when I started working on the function. > I always get confused going back and forth from zero-indexed to non > zero-indexed, which might not be a good enough reason to worry about > this, but it might be helpful to explicitly say that skip_header is > not zero-indexed, though it doesn't raise an exception if you try. Took your comment into account, but I did state that `skip_header` expects a number of lines, not a line index. > Also, I don't know if this is even something that should be worried > about in the io, but recarray names also can't start with a number to > preserve attribute names look up, but I thought I would bring it up > anyway, since I ran across this recently. Good point. I'll patch NameValidator for that. > I didn't know about being able to specify the dtype as a dict. That > might be handy. Is there any way to cross-link to the dtype > documentation in rst? I can't remember. That might be helpful to > have. Hence my call to the doc specialists. > I never did figure out what the loose keyword did, but I guess it's > not that important to me if I've never needed it. Oh yes, this one. Well, a StringConverter can either returns the default if it can't convert the string (loose=True) or raise an exception if it can't convert the string and the string is not part of the missing_values list of this StringConverter (loose=False). I need to add a couple of examples here. From numpy at mspacek.mm.st Fri Oct 16 18:01:54 2009 From: numpy at mspacek.mm.st (Martin Spacek) Date: Fri, 16 Oct 2009 22:01:54 +0000 (UTC) Subject: [Numpy-discussion] intersect1d for N input arrays References: <4AD8272C.601@ntc.zcu.cz> Message-ID: Robert Cimrman ntc.zcu.cz> writes: > > Hi Martin, > > thanks for your ideas and contribution. > > A few notes: I would let intersect1d as it is, and created a new function with another name for that (any > proposals?). 
Considering that most of arraysetops functions are based on sort, and in particular here > that an intersection array is (usually) smaller than each of the input arrays, it might be better just to > call intersect1d repeatedly for each array and the result of the previous call, accumulating the intersection. > > r. Hi Robert, Yeah, I suppose sorting will get progressively slower the more input arrays there are, and the longer each one gets. There's probably some crossover point where the cost of doing a Python loop over the input arrays to accumulate the intersection is less than the cost of doing a big sort. That would take some benchmarking... I forgot to handle the cases where the number of arrays passed is 0 or 1. Here's an updated version: def intersect1d(arrays, assume_unique=False): """Find the intersection of any number of 1D arrays. Return the sorted, unique values that are in all of the input arrays. Adapted from numpy.lib.arraysetops.intersect1d""" N = len(arrays) if N == 0: return np.asarray(arrays) arrays = list(arrays) # allow assignment if not assume_unique: for i, arr in enumerate(arrays): arrays[i] = np.unique(arr) aux = np.concatenate(arrays) # one long 1D array aux.sort() # sorted if N == 1: return aux shift = N-1 return aux[aux[shift:] == aux[:-shift]] From oliphant at enthought.com Fri Oct 16 23:35:13 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Fri, 16 Oct 2009 22:35:13 -0500 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4AD75061.2020908@stsci.edu> References: <4AD75061.2020908@stsci.edu> Message-ID: On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > I recently committed a regression test and bugfix for object > pointers in > record arrays of unaligned size (meaning where each record is not a > multiple of sizeof(PyObject **)). > > For example: > > a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > a2 = np.zeros((10,), 'S10') > # This copying would segfault > a1['o'] = a2 > > http://projects.scipy.org/numpy/ticket/1198 > > Unfortunately, this unit test has opened up a whole hornet's nest of > alignment issues on Solaris. The various reference counting functions > (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object > pointers, > for instance. Interestingly, there are comments in there saying > "handles misaligned data" (eg. line 190), but in fact it doesn't, and > doesn't look to me like it would. But I won't rule out a mistake in > building it on my part. Thanks for this bug report. It would be very helpful if you could provide the line number where the code is giving a bus error and explain why you think the code in question does not handle misaligned data (it still seems like it should to me --- but perhaps I must be missing something --- I don't have a Solaris box to test on). Perhaps, the real problem is elsewhere (such as other places where the mistake of forgetting about striding needing to be aligned also before pursuing the fast alignment path that you pointed out in another place of code). This was the thinking for why the code (that I think is in question) should handle mis-aligned data: 1) pointers that are not aligned to the correct size need to be copied to an aligned memory area before being de-referenced. 2) static variables defined in a function will be aligned by the C compiler. So, what the code in refcnt.c does is to copy the value in the NumPy data-area (i.e. 
pointed to by it->dataptr) to another memory location (the stack variable temp), dereference it and then increment it's reference count. 196: temp = (PyObject **)it->dataptr; 197: Py_XINCREF(*temp); I'm puzzled why this should fail. The stack trace showing where this fails would be very useful in figuring out what to fix. This is all independent of defining a variable to decide whether or not to even care about worrying about un-aligned data (which we could avoid worrying about on Intel and AMD). I'm all in favor of such a flag if it would speed up code, but I don't see it as the central issue here. Any more details about the bug you have found would be greatly appreciated. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Oct 17 00:25:04 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 16 Oct 2009 22:25:04 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> Message-ID: On Fri, Oct 16, 2009 at 9:35 PM, Travis Oliphant wrote: > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > > I recently committed a regression test and bugfix for object pointers in > record arrays of unaligned size (meaning where each record is not a > multiple of sizeof(PyObject **)). > > For example: > > a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > a2 = np.zeros((10,), 'S10') > # This copying would segfault > a1['o'] = a2 > > http://projects.scipy.org/numpy/ticket/1198 > > Unfortunately, this unit test has opened up a whole hornet's nest of > alignment issues on Solaris. The various reference counting functions > (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, > for instance. Interestingly, there are comments in there saying > "handles misaligned data" (eg. line 190), but in fact it doesn't, and > doesn't look to me like it would. But I won't rule out a mistake in > building it on my part. > > > Thanks for this bug report. It would be very helpful if you could > provide the line number where the code is giving a bus error and explain why > you think the code in question does not handle misaligned data (it still > seems like it should to me --- but perhaps I must be missing something --- I > don't have a Solaris box to test on). Perhaps, the real problem is > elsewhere (such as other places where the mistake of forgetting about > striding needing to be aligned also before pursuing the fast alignment path > that you pointed out in another place of code). > > This was the thinking for why the code (that I think is in question) should > handle mis-aligned data: > > 1) pointers that are not aligned to the correct size need to be copied to > an aligned memory area before being de-referenced. > 2) static variables defined in a function will be aligned by the C > compiler. > > So, what the code in refcnt.c does is to copy the value in the NumPy > data-area (i.e. pointed to by it->dataptr) to another memory location (the > stack variable temp), dereference it and then increment it's reference > count. > > 196: temp = (PyObject **)it->dataptr; > 197: Py_XINCREF(*temp); > Doesn't it->dataptr need to be copied to temp, not just assigned? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From faltet at pytables.org Sat Oct 17 07:20:38 2009 From: faltet at pytables.org (Francesc Alted) Date: Sat, 17 Oct 2009 13:20:38 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4AD899B1.7010806@molden.no> References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> <4AD899B1.7010806@molden.no> Message-ID: <200910171320.38833.faltet@pytables.org> A Friday 16 October 2009 18:05:05 Sturla Molden escrigu?: > Francesc Alted skrev: > > The response is clear: avoid memcpy() if you can. It is true that > > memcpy() performance has improved quite a lot in latest gcc (it has been > > quite good in Win versions since many years ago), but working with data > > in-place (i.e. avoiding a memory copy) is always faster (and most > > specially for large arrays that don't fit in cache processors). > > > > My own experiments says that, with an Intel Core2 processor the typical > > speed- ups for avoiding memcpy() are 2x. > > If the underlying array is strided, I have seen the opposite as well. > "Copy-in copy-out" is a common optimization used by Fortran compilers > when working with strided arrays. The catch is that the work array has > to fit in cache for this to make any sence. Anyhow, you cannot use > memcpy for this kind of optimization - it assumes both buffers are > contiguous. But working with arrays directly instead of copies is not > always the faster option. Mmh, don't know about Fortran (too many years without programming it), but in C it seems evident that performing a memcpy() is always slower, at least with modern CPUs (like the Intel Core2 that I'm using now): In [43]: import numpy as np In [44]: import numexpr as ne In [45]: r = np.zeros(1e6, 'i1,i4,f8') In [46]: f1, f2 = r['f1'], r['f2'] In [47]: f1.flags.aligned, f2.flags.aligned Out[47]: (False, False) In [48]: timeit f1*f2 # NumPy do copies before carrying out operations 100 loops, best of 3: 14.6 ms per loop In [49]: timeit ne.evaluate('f1*f2') # numexpr uses plain unaligned access 100 loops, best of 3: 5.77 ms per loop # 2.5x faster than numpy Using strides, the result is similar: In [50]: f1, f2 = r['f1'][::2], r['f2'][::2] # check with strides In [51]: f1.flags.aligned, f2.flags.aligned Out[51]: (False, False) In [52]: timeit f1*f2 100 loops, best of 3: 7.52 ms per loop In [53]: timeit ne.evaluate('f1*f2') 100 loops, best of 3: 3.96 ms per loop # 1.9x faster than numpy And, when using large strides so that the resulting arrays fit in cache: In [54]: f1, f2 = r['f1'][::10], r['f2'][::10] # big stride (fits in cache) In [55]: timeit f1*f2 100 loops, best of 3: 3.51 ms per loop In [56]: timeit ne.evaluate('f1*f2') 100 loops, best of 3: 2.61 ms per loop # 34% faster than numpy Which, although not much, still gives an advantage to the direct approach. So, at least in C, operating with unaligned data on (modern) AMD/Intel processors seems to be fastest (at least in this quick-and-dirty benchmark). In fact, performance is very close to contiguous and aligned data: In [58]: f1, f2 = r['f1'].copy(), r['f2'].copy() # aligned and contiguous In [59]: timeit f1*f2 100 loops, best of 3: 5.2 ms per loop In [60]: timeit ne.evaluate('f1*f2') 100 loops, best of 3: 4.74 ms per loop so 5.77 ms (unaligned data, In [49]) is not very far from 4.74 ms (aligned data, In [60]) and close to 'optimal' numpy performance (5.2 ms, In [59]). And, as I said before, the plans of AMD/Intel are to reduce this gap still further. 
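A self-contained sketch of the same aligned-vs-unaligned comparison, using
plain timeit instead of the IPython magic (the exact figures will of course
vary with the machine, the compiler and the NumPy build):

import numpy as np
import timeit

r = np.zeros(1000000, dtype='i1,i4,f8')
f1, f2 = r['f1'], r['f2']            # unaligned views into the records
g1, g2 = f1.copy(), f2.copy()        # aligned, contiguous copies

t_unaligned = timeit.timeit(lambda: f1 * f2, number=100)
t_aligned = timeit.timeit(lambda: g1 * g2, number=100)
print("unaligned: %.4f s   aligned: %.4f s" % (t_unaligned, t_aligned))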
For unaligned arrays that fits in cache the results are even more dramatic: In [61]: r = np.zeros(1e5, 'i1,i4,f8') In [62]: f1, f2 = r['f1'], r['f2'] In [63]: timeit f1*f2 1000 loops, best of 3: 1.37 ms per loop In [64]: timeit ne.evaluate('f1*f2') 1000 loops, best of 3: 293 ?s per loop # 4.7x speedup but not sure why... Cheers, -- Francesc Alted From dsdale24 at gmail.com Sat Oct 17 08:49:11 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Sat, 17 Oct 2009 08:49:11 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic Message-ID: numpy's functions, especially ufuncs, have had some ability to support subclasses through the ndarray.__array_wrap__ method, which provides masked arrays or quantities (for example) with an opportunity to set the class and metadata of the output array at the end of an operation. An example is q1 = Quantity(1, 'meter') q2 = Quantity(2, 'meters') numpy.add(q1, q2) # yields Quantity(3, 'meters') At SciPy2009 we committed a change to the numpy trunk that provides a chance to determine the class and some metadata of the output *before* the ufunc performs its calculation, but after output array has been established (and its data is still uninitialized). Consider: q1 = Quantity(1, 'meter') q2 = Quantity(2, 'J') numpy.add(q1, q2, q1) # or equivalently: # q1 += q2 With only __array_wrap__, the attempt to propagate the units happens after q1's data was updated in place, too late to raise an error, the data is now corrupted. __array_prepare__ solves that problem, an exception can be raised in time. Now I'd like to suggest one more improvement to numpy to make its functions more generic. Consider one more example: q1 = Quantity(1, 'meter') q2 = Quantity(2, 'feet') numpy.add(q1, q2) In this case, I'd like an opportunity to operate on the input arrays on the way in to the ufunc, to rescale the second input to meters. I think it would be a hack to try to stuff this capability into __array_prepare__. One form of this particular example is already supported in quantities, "q1 + q2", by overriding the __add__ method to rescale the second input, but there are ufuncs that do not have an associated special method. So I'd like to look into adding another check for a special method, perhaps called __input_prepare__. My time is really tight for the next month, so I'd rather not start if there are strong objections, but otherwise, I'd like to try to try to get it in in time for numpy-1.4. (Has a timeline been established?) I think it will be not too difficult to document this overall scheme: When calling numpy functions: 1) __input_prepare__ provides an opportunity to operate on the inputs to yield versions that are compatible with the operation (they should obviously not be modified in place) 2) the output array is established 3) __array_prepare__ is used to determine the class of the output array, as well as any metadata that needs to be established before the operation proceeds 4) the ufunc performs its operations 5) __array_wrap__ provides an opportunity to update the output array based on the results of the computation Comments, criticisms? If PEP 3124^ were already a part of the standard library, that could serve as the basis for generalizing numpy's functions. But I think the PEP will not be approved in its current form, and it is unclear when and if the author will revisit the proposal. The scheme I'm imagining might be sufficient for our purposes. 
Darren ^ http://www.python.org/dev/peps/pep-3124/ From berthe.loic at gmail.com Sat Oct 17 11:13:57 2009 From: berthe.loic at gmail.com (=?ISO-8859-1?Q?Lo=EFc_BERTHE?=) Date: Sat, 17 Oct 2009 17:13:57 +0200 Subject: [Numpy-discussion] Subclassing record array Message-ID: Hi, I would like to create my own class of record array to deal with units. Here is the code I used, inspired from http://docs.scipy.org/doc/numpy-1.3.x/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array : [code] from numpy import * class BlocArray(rec.ndarray): """ Recarray with units and pretty print """ fmt_dict = {'S' : '%10s', 'f' : '%10.6G', 'i': '%10d'} def __new__(cls, data, titles=None, units=None): # guess format for each column data2 = [] for line in zip(*data) : try : data2.append(cast[int](line)) # integers except ValueError : try : data2.append(cast[float](line)) # reals except ValueError : data2.append(cast[str](line)) # characters # create the array dt = dtype(zip(titres, [line.dtype for line in data2])) obj = rec.array(data2, dtype=dt).view(cls) # add custom attributes obj.units = units or [] obj._fmt = " ".join(obj.fmt_dict[d[1][1]] for d in dt.descr) + '\n' obj._head = "%10s "*len(dt.names) % dt.names +'\n' obj._head += "%10s "*len(dt.names) % tuple('(%s)' % u for u in units) +'\n' # Finally, we must return the newly created object: return obj titles = ['Name', 'Nb', 'Price'] units = ['/', '/', 'Eur'] data = [['fish', '1', '12.25'], ['egg', '6', '0.85'], ['TV', 1, '125']] bloc = BlocArray(data, titles=titles, units=units) In [544]: bloc Out[544]: Name Nb Price (/) (/) (Eur) fish 1 12.25 egg 6 0.85 TV 1 125 [/code] It's almost working, but I have some isues : - I can't access data through indexing In [563]: bloc['Price'] /home/loic/Python/numpy/test.py in ((r,)) 50 51 def __repr__(self): ---> 52 return self._head + ''.join(self._fmt % tuple(r) for r in self) TypeError: 'numpy.float64' object is not iterable So I think that overloading the __repr__ method is not that easy - I can't access data through attributes now : In [564]: bloc.Nb AttributeError: 'BlocArray' object has no attribute 'Nb' - I can't use 'T' as field in theses array as the T method is already here as a shortcut for transpose Have you any hints to make this work ? -- LB From perfreem at gmail.com Sat Oct 17 11:36:26 2009 From: perfreem at gmail.com (per freem) Date: Sat, 17 Oct 2009 11:36:26 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) Message-ID: hi all, in my code, i use the function 'logsumexp' from scipy.maxentropy a lot. as far as i can tell, this function has no vectorized version that works on an m-x-n matrix. i might be doing something wrong here, but i found that this function can run extremely slowly if used as follows: i have an array of log probability vectors, such that each column sums to one. i want to simply iterate over each column and renormalize it, using exp(col - logsumexp(col)). here is the code that i used to profile this operation: from scipy import * from numpy import * from numpy.random.mtrand import dirichlet from scipy.maxentropy import logsumexp import time # build an array of probability vectors. each column represents a probability vector. 
num_vectors = 1000000 log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors))) # now renormalize each column, using logsumexp norm_prob_vectors = [] t1 = time.time() for n in range(num_vectors): norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n])) norm_prob_vectors.append(norm_p) t2 = time.time() norm_prob_vectors = array(norm_prob_vectors) print "logsumexp renormalization (%d many times) took %s seconds." %(num_vectors, str(t2-t1)) i found that even with only 100,000 elements, this code takes about 5 seconds: logsumexp renormalization (100000 many times) took 5.07085394859 seconds. with 1 million elements, it becomes prohibitively slow: logsumexp renormalization (1000000 many times) took 70.7815010548 seconds. is there a way to speed this up? most vectorized operations that work on matrices in numpy/scipy are incredibly fast and it seems like a vectorized version of logsumexp should be near instant on this scale. is there a way to rewrite the above snippet so that it's faster? thanks very much for your help. From kwgoodman at gmail.com Sat Oct 17 11:48:37 2009 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 17 Oct 2009 08:48:37 -0700 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 8:36 AM, per freem wrote: > hi all, > > in my code, i use the function 'logsumexp' from scipy.maxentropy a > lot. as far as i can tell, this function has no vectorized version > that works on an m-x-n matrix. i might be doing something wrong here, > but i found that this function can run extremely slowly if used as > follows: i have an array of log probability vectors, such that each > column sums to one. i want to simply iterate over each column and > renormalize it, using exp(col - logsumexp(col)). here is the code that > i used to profile this operation: > > from scipy import * > from numpy import * > from numpy.random.mtrand import dirichlet > from scipy.maxentropy import logsumexp > import time > > # build an array of probability vectors. ?each column represents a > probability vector. > num_vectors = 1000000 > log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors))) > # now renormalize each column, using logsumexp > norm_prob_vectors = [] > t1 = time.time() > for n in range(num_vectors): > ? ?norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n])) > ? ?norm_prob_vectors.append(norm_p) > t2 = time.time() > norm_prob_vectors = array(norm_prob_vectors) > print "logsumexp renormalization (%d many times) took %s seconds." > %(num_vectors, str(t2-t1)) > > i found that even with only 100,000 elements, this code takes about 5 seconds: > > logsumexp renormalization (100000 many times) took 5.07085394859 seconds. > > with 1 million elements, it becomes prohibitively slow: > > logsumexp renormalization (1000000 many times) took 70.7815010548 seconds. > > is there a way to speed this up? most vectorized operations that work > on matrices in numpy/scipy are incredibly fast and it seems like a > vectorized version of logsumexp should be near instant on this scale. > is there a way to rewrite the above snippet so that it's faster? > > thanks very much for your help. Here's logsumexp from scipy: def logsumexp(a): a = asarray(a) a_max = a.max() return a_max + log((exp(a-a_max)).sum()) Would this work: def logsumexp2(a): a = asarray(a) a_max = a.max(axis=0) return a_max + log((exp(a-a_max)).sum(axis=0)) ? 
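A quick sketch suggests it does work for the 2-D, reduce-over-axis-0 case in
the original post, and that it agrees with np.logaddexp.reduce along the same
axis (the names below are purely illustrative):

import numpy as np

def logsumexp_cols(a):
    # log(sum(exp(a), axis=0)) per column, keeping the max out of the exp
    # so that nothing overflows
    a = np.asarray(a)
    a_max = a.max(axis=0)
    return a_max + np.log(np.exp(a - a_max).sum(axis=0))

x = np.log(np.random.dirichlet([1, 1, 1], 1000).T)   # shape (3, 1000)
looped = np.array([logsumexp_cols(x[:, n:n+1])[0] for n in range(x.shape[1])])
assert np.allclose(logsumexp_cols(x), looped)
assert np.allclose(logsumexp_cols(x), np.logaddexp.reduce(x, axis=0))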
From charlesr.harris at gmail.com Sat Oct 17 13:20:50 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 11:20:50 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: > hi all, > > in my code, i use the function 'logsumexp' from scipy.maxentropy a > lot. as far as i can tell, this function has no vectorized version > that works on an m-x-n matrix. i might be doing something wrong here, > but i found that this function can run extremely slowly if used as > follows: i have an array of log probability vectors, such that each > column sums to one. i want to simply iterate over each column and > renormalize it, using exp(col - logsumexp(col)). here is the code that > i used to profile this operation: > > from scipy import * > from numpy import * > from numpy.random.mtrand import dirichlet > from scipy.maxentropy import logsumexp > import time > > Why aren't you using logaddexp ufunc from numpy? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 17 13:54:52 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 13:54:52 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: Message-ID: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: >> >> hi all, >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >> lot. as far as i can tell, this function has no vectorized version >> that works on an m-x-n matrix. i might be doing something wrong here, >> but i found that this function can run extremely slowly if used as >> follows: i have an array of log probability vectors, such that each >> column sums to one. i want to simply iterate over each column and >> renormalize it, using exp(col - logsumexp(col)). here is the code that >> i used to profile this operation: >> >> from scipy import * >> from numpy import * >> from numpy.random.mtrand import dirichlet >> from scipy.maxentropy import logsumexp >> import time >> > > Why aren't you using logaddexp ufunc from numpy? Maybe because it is difficult to find, it doesn't have its own docs entry. e.g. no link to logaddexp in http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations I have no idea, why it is different from the other ufuncs in the docs (and help file). It shows up correctly in the docs editor, but not in the numpy 1.3 and online docs. Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Sat Oct 17 14:02:37 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 12:02:37 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 11:54 AM, wrote: > On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris > wrote: > > > > > > On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: > >> > >> hi all, > >> > >> in my code, i use the function 'logsumexp' from scipy.maxentropy a > >> lot. 
as far as i can tell, this function has no vectorized version > >> that works on an m-x-n matrix. i might be doing something wrong here, > >> but i found that this function can run extremely slowly if used as > >> follows: i have an array of log probability vectors, such that each > >> column sums to one. i want to simply iterate over each column and > >> renormalize it, using exp(col - logsumexp(col)). here is the code that > >> i used to profile this operation: > >> > >> from scipy import * > >> from numpy import * > >> from numpy.random.mtrand import dirichlet > >> from scipy.maxentropy import logsumexp > >> import time > >> > > > > Why aren't you using logaddexp ufunc from numpy? > > Maybe because it is difficult to find, it doesn't have its own docs entry. > > e.g. no link to logaddexp in > > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations > > I have no idea, why it is different from the other ufuncs in the docs > (and help file). > It shows up correctly in the docs editor, but not in the numpy 1.3 and > online docs. > > That's curious, none of the five ufuncs added in 1.3 have links even though they all have documentation. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.ginsburg at colorado.edu Sat Oct 17 14:08:18 2009 From: adam.ginsburg at colorado.edu (Adam Ginsburg) Date: Sat, 17 Oct 2009 12:08:18 -0600 Subject: [Numpy-discussion] double-precision sqrt? Message-ID: Hi folks, I'm trying to write a ray-tracing code for which high precision is required. I also "need" to use square roots. However, math.sqrt and numpy.sqrt seem to only use single-precision floats. Is there a simple way to make sqrt use higher precision? Alternately, am I simply being obtuse? Thanks, Adam Example code: from scipy.optimize.minpack import fsolve from numpy import sqrt sqrt(float64(1.034324523462345)) # 1.0170174646791199 f=lambda x: x**2-float64(1.034324523462345)**2 f(sqrt(float64(1.034324523462345))) # -0.03550269637326231 fsolve(f,1.01) # 1.0343245234623459 f(fsolve(f,1.01)) # 1.7763568394002505e-15 fsolve(f,1.01) - sqrt(float64(1.034324523462345)) # 0.017307058783226026 From adam.ginsburg at colorado.edu Sat Oct 17 14:17:29 2009 From: adam.ginsburg at colorado.edu (Adam Ginsburg) Date: Sat, 17 Oct 2009 12:17:29 -0600 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: My code is actually wrong.... but I still have the problem I've identified that sqrt is leading to precision errors. Sorry about the earlier mistake. Adam On Sat, Oct 17, 2009 at 12:08 PM, Adam Ginsburg wrote: > > sqrt(float64(1.034324523462345)) > # 1.0170174646791199 > f=lambda x: x**2-float64(1.034324523462345)**2 should be f=lambda x: x**2-float64(1.034324523462345) so the code I sent was not a legitimate test. From dagss at student.matnat.uio.no Sat Oct 17 14:25:01 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 17 Oct 2009 20:25:01 +0200 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: <4ADA0BFD.2090603@student.matnat.uio.no> Adam Ginsburg wrote: > Hi folks, > I'm trying to write a ray-tracing code for which high precision is > required. I also "need" to use square roots. However, math.sqrt and > numpy.sqrt seem to only use single-precision floats. Is there a > simple way to make sqrt use higher precision? Alternately, am I > simply being obtuse? How are you actually using the results of sqrt? 
When printing the results you may not get the full precision...try e.g. print "%.50f" % np.sqrt(np.float64( 1.034324523462345)) -- Dag Sverre From nadavh at visionsense.com Sat Oct 17 14:27:02 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Sat, 17 Oct 2009 20:27:02 +0200 Subject: [Numpy-discussion] double-precision sqrt? References: Message-ID: <710F2847B0018641891D9A21602763605AD1C8@ex3.envision.co.il> The default precision is double unless yue specify otherwise (float32 or long double (float128 or float96)) You can see this from: f(fsolve(f,1.01)) # 1.7763568394002505e-15 The last line should be: >>> fsolve(f,1.01) - float64(1.034324523462345) 8.8817841970012523e-16 Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Adam Ginsburg ????: ? 17-???????-09 20:08 ??: numpy-discussion at scipy.org ????: [Numpy-discussion] double-precision sqrt? Hi folks, I'm trying to write a ray-tracing code for which high precision is required. I also "need" to use square roots. However, math.sqrt and numpy.sqrt seem to only use single-precision floats. Is there a simple way to make sqrt use higher precision? Alternately, am I simply being obtuse? Thanks, Adam Example code: from scipy.optimize.minpack import fsolve from numpy import sqrt sqrt(float64(1.034324523462345)) # 1.0170174646791199 f=lambda x: x**2-float64(1.034324523462345)**2 f(sqrt(float64(1.034324523462345))) # -0.03550269637326231 fsolve(f,1.01) # 1.0343245234623459 f(fsolve(f,1.01)) # 1.7763568394002505e-15 fsolve(f,1.01) - sqrt(float64(1.034324523462345)) # 0.017307058783226026 _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3358 bytes Desc: not available URL: From charlesr.harris at gmail.com Sat Oct 17 14:31:14 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 12:31:14 -0600 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 12:08 PM, Adam Ginsburg wrote: > Hi folks, > I'm trying to write a ray-tracing code for which high precision is > required. I also "need" to use square roots. However, math.sqrt and > numpy.sqrt seem to only use single-precision floats. Is there a > simple way to make sqrt use higher precision? Alternately, am I > simply being obtuse? > > Thanks, > Adam > > Example code: > from scipy.optimize.minpack import fsolve > from numpy import sqrt > > sqrt(float64(1.034324523462345)) > # 1.0170174646791199 > f=lambda x: x**2-float64(1.034324523462345)**2 > > f(sqrt(float64(1.034324523462345))) > # -0.03550269637326231 > > fsolve(f,1.01) > # 1.0343245234623459 > > f(fsolve(f,1.01)) > # 1.7763568394002505e-15 > > fsolve(f,1.01) - sqrt(float64(1.034324523462345)) > # 0.017307058783226026 > ____ The routines *are* in double precision, but why are you using fsolve? optimize.zeros.brentq would probably be a better choice. Also, you are using differences with squared terms that will lose you precision. The last time I wrote a ray tracing package was 30 years ago, but I think much will depend on how you represent the curved surfaces. IIRC, there were also a lot of quadratic equations to solve, and using the correct formula for the root you want (the usual formula is poor for at least one of the roots) will also make a difference. 
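For reference, a generic sketch of the cancellation-free way to get both roots
of a*x**2 + b*x + c = 0 (real roots assumed; this is only an illustration, not
code from any particular ray tracer):

import numpy as np

def quadratic_roots(a, b, c):
    # compute the root where b and the discriminant add constructively,
    # then recover the other root from the product of the roots, c/a
    d = np.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + np.copysign(d, b))
    return q / a, c / q

# e.g. x**2 - 1e8*x + 1 = 0: the textbook (-b - sqrt(b*b - 4*a*c))/(2*a)
# loses the small root to cancellation, while c/q keeps it
print(quadratic_roots(1.0, -1e8, 1.0))   # roughly (1e8, 1e-8)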
In other words, you probably need to take a lot of care with how you set up the problem in order to minimize roundoff error. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Sat Oct 17 14:37:18 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 17 Oct 2009 14:37:18 -0400 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: 2009/10/17 Adam Ginsburg : > My code is actually wrong.... but I still have the problem I've > identified that sqrt is leading to precision errors. ?Sorry about the > earlier mistake. I think you'll find that numpy's sqrt is as good as it gets for double precision. You can try using numpy's float96 type, which at least on my machine, does give sa few more significant figures. If you really, really need accuracy, there are arbitrary-precision packages for python, which you could try. But I think you may find that your problem is not solved by higher precision. Something about ray-tracing just leads it to ferret out every little limitation of floating-point computation. For example, you can easily get "surface acne" when shooting shadow rays, where a ray shot from a surface to a light source accidentally intersects that same surface for some pixels but not for others. You can try to fix it by requiring some minimum intersection distance, but then you'll find lots of weird little quirks where your minimum distance causes problems. A better solution is one which takes into account the awkwardness of floating-point; for this particular case one trick is to mark the object you're shooting rays from as not a candidate for intersection. (This doesn't work, of course, if the object can cast shadows on itself...) I have even seen people advocate for using interval analysis inside ray tracers, to avoid this kind of problem. Anne From ndbecker2 at gmail.com Sat Oct 17 14:40:16 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 17 Oct 2009 14:40:16 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: Somewhat offtopic, but is there a generalization of the logsumexp shortcut to more than 2 variables? IIRC, it's this for 2 variables: log (exp (a) + exp (b)) = max (a,b) + log (1 + exp (-abs (a-b))) From charlesr.harris at gmail.com Sat Oct 17 14:59:41 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 12:59:41 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 12:40 PM, Neal Becker wrote: > Somewhat offtopic, but is there a generalization of the logsumexp shortcut > to more than 2 variables? > > IIRC, it's this for 2 variables: > log (exp (a) + exp (b)) = max (a,b) + log (1 + exp (-abs (a-b))) > > logaddexp.reduce will apply it along array rows. The reduce loop could probably be optimized a bit using the methods that Dale used to optimize the reduce case for add. Hmm, the reduce loop would need to be implemented for the generic loops. The logaddexp case could possibly be optimized further by writing a specialized loop for the reduce case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sat Oct 17 15:02:09 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 13:02:09 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 12:59 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sat, Oct 17, 2009 at 12:40 PM, Neal Becker wrote: > >> Somewhat offtopic, but is there a generalization of the logsumexp shortcut >> to more than 2 variables? >> >> IIRC, it's this for 2 variables: >> log (exp (a) + exp (b)) = max (a,b) + log (1 + exp (-abs (a-b))) >> >> > logaddexp.reduce will apply it along array rows. The reduce loop could > probably be optimized a bit using the methods that Dale used to optimize the > reduce case for add. Hmm, the reduce loop would need to be implemented for > the generic loops. The logaddexp case could possibly be optimized further by > writing a specialized loop for the reduce case. > > Example: In [1]: x = arange(9).reshape(3,3) In [2]: logaddexp.reduce(x, axis=1) Out[2]: array([ 2.40760596, 5.40760596, 8.40760596]) In [3]: logaddexp.reduce(x, axis=0) Out[3]: array([ 6.05094576, 7.05094576, 8.05094576]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.ginsburg at colorado.edu Sat Oct 17 18:03:08 2009 From: adam.ginsburg at colorado.edu (Adam Ginsburg) Date: Sat, 17 Oct 2009 16:03:08 -0600 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: Hi again, I apologize, the mistake was entirely my own. Sqrt's do the right thing.... Adam On Sat, Oct 17, 2009 at 12:17 PM, Adam Ginsburg wrote: > My code is actually wrong.... but I still have the problem I've > identified that sqrt is leading to precision errors. ?Sorry about the > earlier mistake. > > Adam > > On Sat, Oct 17, 2009 at 12:08 PM, Adam Ginsburg > wrote: >> >> sqrt(float64(1.034324523462345)) >> # 1.0170174646791199 >> f=lambda x: x**2-float64(1.034324523462345)**2 > > should be > f=lambda x: x**2-float64(1.034324523462345) > > so the code I sent was not a legitimate test. > From charlesr.harris at gmail.com Sat Oct 17 18:45:40 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 16:45:40 -0600 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 6:49 AM, Darren Dale wrote: > numpy's functions, especially ufuncs, have had some ability to support > subclasses through the ndarray.__array_wrap__ method, which provides > masked arrays or quantities (for example) with an opportunity to set > the class and metadata of the output array at the end of an operation. > An example is > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'meters') > numpy.add(q1, q2) # yields Quantity(3, 'meters') > > At SciPy2009 we committed a change to the numpy trunk that provides a > chance to determine the class and some metadata of the output *before* > the ufunc performs its calculation, but after output array has been > established (and its data is still uninitialized). Consider: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > numpy.add(q1, q2, q1) > # or equivalently: > # q1 += q2 > > With only __array_wrap__, the attempt to propagate the units happens > after q1's data was updated in place, too late to raise an error, the > data is now corrupted. 
__array_prepare__ solves that problem, an > exception can be raised in time. > > Now I'd like to suggest one more improvement to numpy to make its > functions more generic. Consider one more example: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'feet') > numpy.add(q1, q2) > > In this case, I'd like an opportunity to operate on the input arrays > on the way in to the ufunc, to rescale the second input to meters. I > think it would be a hack to try to stuff this capability into > __array_prepare__. One form of this particular example is already > supported in quantities, "q1 + q2", by overriding the __add__ method > to rescale the second input, but there are ufuncs that do not have an > associated special method. So I'd like to look into adding another > check for a special method, perhaps called __input_prepare__. My time > is really tight for the next month, so I'd rather not start if there > are strong objections, but otherwise, I'd like to try to try to get it > in in time for numpy-1.4. (Has a timeline been established?) > > I think it will be not too difficult to document this overall scheme: > > When calling numpy functions: > > 1) __input_prepare__ provides an opportunity to operate on the inputs > to yield versions that are compatible with the operation (they should > obviously not be modified in place) > > 2) the output array is established > > 3) __array_prepare__ is used to determine the class of the output > array, as well as any metadata that needs to be established before the > operation proceeds > > 4) the ufunc performs its operations > > 5) __array_wrap__ provides an opportunity to update the output array > based on the results of the computation > > Comments, criticisms? If PEP 3124^ were already a part of the standard > library, that could serve as the basis for generalizing numpy's > functions. But I think the PEP will not be approved in its current > form, and it is unclear when and if the author will revisit the > proposal. The scheme I'm imagining might be sufficient for our > purposes. > > This sounds interesting to me, as it would push the use of array wrap down into a common function and make it easier to use. I wonder what the impact would be on the current subclasses of ndarray? On a side note, I wonder if you could look into adding your reduce loop optimizations into the generic loops? It would be interesting to see if that speeded up some common operations. In any case, it can't hurt. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 17 19:27:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 19:27:55 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 11:54 AM, wrote: >> >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: >> >> >> >> hi all, >> >> >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >> >> lot. as far as i can tell, this function has no vectorized version >> >> that works on an m-x-n matrix. 
i might be doing something wrong here, >> >> but i found that this function can run extremely slowly if used as >> >> follows: i have an array of log probability vectors, such that each >> >> column sums to one. i want to simply iterate over each column and >> >> renormalize it, using exp(col - logsumexp(col)). here is the code that >> >> i used to profile this operation: >> >> >> >> from scipy import * >> >> from numpy import * >> >> from numpy.random.mtrand import dirichlet >> >> from scipy.maxentropy import logsumexp >> >> import time >> >> >> > >> > Why aren't you using logaddexp ufunc from numpy? >> >> Maybe because it is difficult to find, it doesn't have its own docs entry. >> >> e.g. no link to logaddexp in >> >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations >> >> I have no idea, why it is different from the other ufuncs in the docs >> (and help file). >> It shows up correctly in the docs editor, but not in the numpy 1.3 and >> online docs. >> > > That's curious, none of the five ufuncs added in 1.3 have links even though > they all have documentation. I found that they are missing from routines.math http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ I added logaddexp, logaddexp2 and exp2 What else? Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Sat Oct 17 19:46:05 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 17:46:05 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 5:27 PM, wrote: > On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris > wrote: > > > > > > On Sat, Oct 17, 2009 at 11:54 AM, wrote: > >> > >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem > wrote: > >> >> > >> >> hi all, > >> >> > >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a > >> >> lot. as far as i can tell, this function has no vectorized version > >> >> that works on an m-x-n matrix. i might be doing something wrong here, > >> >> but i found that this function can run extremely slowly if used as > >> >> follows: i have an array of log probability vectors, such that each > >> >> column sums to one. i want to simply iterate over each column and > >> >> renormalize it, using exp(col - logsumexp(col)). here is the code > that > >> >> i used to profile this operation: > >> >> > >> >> from scipy import * > >> >> from numpy import * > >> >> from numpy.random.mtrand import dirichlet > >> >> from scipy.maxentropy import logsumexp > >> >> import time > >> >> > >> > > >> > Why aren't you using logaddexp ufunc from numpy? > >> > >> Maybe because it is difficult to find, it doesn't have its own docs > entry. > >> > >> e.g. no link to logaddexp in > >> > >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations > >> > >> I have no idea, why it is different from the other ufuncs in the docs > >> (and help file). > >> It shows up correctly in the docs editor, but not in the numpy 1.3 and > >> online docs. 
> >> > > > > That's curious, none of the five ufuncs added in 1.3 have links even > though > > they all have documentation. > > I found that they are missing from routines.math > http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ > > I added logaddexp, logaddexp2 and exp2 > > What else? > > Thanks. Also deg2rad, rad2deg, trunc, and copysign need to be added. Is that something that can be done in svn, or automatically, or does it need to be done on docs site? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 17 20:00:19 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 20:00:19 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> Message-ID: <1cd32cbb0910171700xe319af6o767951191b84b14b@mail.gmail.com> On Sat, Oct 17, 2009 at 7:46 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 5:27 PM, wrote: >> >> On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Oct 17, 2009 at 11:54 AM, wrote: >> >> >> >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem >> >> > wrote: >> >> >> >> >> >> hi all, >> >> >> >> >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >> >> >> lot. as far as i can tell, this function has no vectorized version >> >> >> that works on an m-x-n matrix. i might be doing something wrong >> >> >> here, >> >> >> but i found that this function can run extremely slowly if used as >> >> >> follows: i have an array of log probability vectors, such that each >> >> >> column sums to one. i want to simply iterate over each column and >> >> >> renormalize it, using exp(col - logsumexp(col)). here is the code >> >> >> that >> >> >> i used to profile this operation: >> >> >> >> >> >> from scipy import * >> >> >> from numpy import * >> >> >> from numpy.random.mtrand import dirichlet >> >> >> from scipy.maxentropy import logsumexp >> >> >> import time >> >> >> >> >> > >> >> > Why aren't you using logaddexp ufunc from numpy? >> >> >> >> Maybe because it is difficult to find, it doesn't have its own docs >> >> entry. >> >> >> >> e.g. no link to logaddexp in >> >> >> >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations >> >> >> >> I have no idea, why it is different from the other ufuncs in the docs >> >> (and help file). >> >> It shows up correctly in the docs editor, but not in the numpy 1.3 and >> >> online docs. >> >> >> > >> > That's curious, none of the five ufuncs added in 1.3 have links even >> > though >> > they all have documentation. >> >> I found that they are missing from routines.math >> http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ >> >> I added logaddexp, logaddexp2 and exp2 >> >> What else? >> > > Thanks. Also deg2rad, rad2deg, trunc, and copysign need to be added. Is that > something that can be done in svn, or automatically, or does it need to be > done on docs site? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > I can do it in the doc editor. I can see from the ufuncs docs where they belong. 
Josef From josef.pktd at gmail.com Sat Oct 17 20:07:15 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 20:07:15 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: <1cd32cbb0910171700xe319af6o767951191b84b14b@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <1cd32cbb0910171700xe319af6o767951191b84b14b@mail.gmail.com> Message-ID: <1cd32cbb0910171707v2d526104n93b563da75f353ff@mail.gmail.com> On Sat, Oct 17, 2009 at 8:00 PM, wrote: > On Sat, Oct 17, 2009 at 7:46 PM, Charles R Harris > wrote: >> >> >> On Sat, Oct 17, 2009 at 5:27 PM, wrote: >>> >>> On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Sat, Oct 17, 2009 at 11:54 AM, wrote: >>> >> >>> >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris >>> >> wrote: >>> >> > >>> >> > >>> >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem >>> >> > wrote: >>> >> >> >>> >> >> hi all, >>> >> >> >>> >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >>> >> >> lot. as far as i can tell, this function has no vectorized version >>> >> >> that works on an m-x-n matrix. i might be doing something wrong >>> >> >> here, >>> >> >> but i found that this function can run extremely slowly if used as >>> >> >> follows: i have an array of log probability vectors, such that each >>> >> >> column sums to one. i want to simply iterate over each column and >>> >> >> renormalize it, using exp(col - logsumexp(col)). here is the code >>> >> >> that >>> >> >> i used to profile this operation: >>> >> >> >>> >> >> from scipy import * >>> >> >> from numpy import * >>> >> >> from numpy.random.mtrand import dirichlet >>> >> >> from scipy.maxentropy import logsumexp >>> >> >> import time >>> >> >> >>> >> > >>> >> > Why aren't you using logaddexp ufunc from numpy? >>> >> >>> >> Maybe because it is difficult to find, it doesn't have its own docs >>> >> entry. >>> >> >>> >> e.g. no link to logaddexp in >>> >> >>> >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations >>> >> >>> >> I have no idea, why it is different from the other ufuncs in the docs >>> >> (and help file). >>> >> It shows up correctly in the docs editor, but not in the numpy 1.3 and >>> >> online docs. >>> >> >>> > >>> > That's curious, none of the five ufuncs added in 1.3 have links even >>> > though >>> > they all have documentation. >>> >>> I found that they are missing from routines.math >>> http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ >>> >>> I added logaddexp, logaddexp2 and exp2 >>> >>> What else? >>> >> >> Thanks. Also deg2rad, rad2deg, trunc, and copysign need to be added. Is that >> something that can be done in svn, or automatically, or does it need to be >> done on docs site? >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > I can do it in the doc editor. I can see from the ufuncs docs where they belong. 
> > Josef > here are the changes, if you wnat to check the location Josef http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/diff/svn/cur/ From charlesr.harris at gmail.com Sun Oct 18 00:22:31 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 22:22:31 -0600 Subject: [Numpy-discussion] Subclassing record array In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 9:13 AM, Lo?c BERTHE wrote: > Hi, > > I would like to create my own class of record array to deal with units. > > Here is the code I used, inspired from > > http://docs.scipy.org/doc/numpy-1.3.x/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array > : > > > [code] > from numpy import * > > class BlocArray(rec.ndarray): > """ Recarray with units and pretty print """ > > fmt_dict = {'S' : '%10s', 'f' : '%10.6G', 'i': '%10d'} > > def __new__(cls, data, titles=None, units=None): > > # guess format for each column > data2 = [] > for line in zip(*data) : > try : data2.append(cast[int](line)) # integers > except ValueError : > try : data2.append(cast[float](line)) # reals > except ValueError : > data2.append(cast[str](line)) # characters > > # create the array > dt = dtype(zip(titres, [line.dtype for line in data2])) > obj = rec.array(data2, dtype=dt).view(cls) > > # add custom attributes > obj.units = units or [] > obj._fmt = " ".join(obj.fmt_dict[d[1][1]] for d in dt.descr) + '\n' > obj._head = "%10s "*len(dt.names) % dt.names +'\n' > obj._head += "%10s "*len(dt.names) % tuple('(%s)' % u for u in > units) +'\n' > > # Finally, we must return the newly created object: > return obj > > titles = ['Name', 'Nb', 'Price'] > units = ['/', '/', 'Eur'] > data = [['fish', '1', '12.25'], ['egg', '6', '0.85'], ['TV', 1, '125']] > bloc = BlocArray(data, titles=titles, units=units) > > In [544]: bloc > Out[544]: > Name Nb Price > (/) (/) (Eur) > fish 1 12.25 > egg 6 0.85 > TV 1 125 > [/code] > > It's almost working, but I have some isues : > > - I can't access data through indexing > In [563]: bloc['Price'] > /home/loic/Python/numpy/test.py in ((r,)) > 50 > 51 def __repr__(self): > ---> 52 return self._head + ''.join(self._fmt % tuple(r) for r in > self) > > TypeError: 'numpy.float64' object is not iterable > > So I think that overloading the __repr__ method is not that easy > > - I can't access data through attributes now : > In [564]: bloc.Nb > AttributeError: 'BlocArray' object has no attribute 'Nb' > > - I can't use 'T' as field in theses array as the T method is > already here as a shortcut for transpose > > > Have you any hints to make this work ? > > > On adding units in general, you might want to contact Darren Dale who has been working in that direction also and has added some infrastructure in svn to make it easier. He also gave a short presentation at scipy2009 on that problem, which has been worked on before. No sense in reinventing the wheel here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun Oct 18 03:57:32 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 18 Oct 2009 09:57:32 +0200 Subject: [Numpy-discussion] Optimized sum of squares (was: vectorized version of logsumexp? 
(from scipy.maxentropy)) In-Reply-To: <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> Message-ID: <20091018075732.GA31449@phare.normalesup.org> On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.pktd at gmail.com wrote: > >> > Why aren't you using logaddexp ufunc from numpy? > >> Maybe because it is difficult to find, it doesn't have its own docs entry. Speaking of which... I thought that there was a readily-written, optimized function (or ufunc) in numpy or scipy that calculated the sum of squares for an array (possibly along an axis). However, I cannot find it. Is there something similar? If not, it is not the end of the world, the operation is trivial to write. Cheers, Ga?l From gruben at bigpond.net.au Sun Oct 18 06:06:15 2009 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun, 18 Oct 2009 21:06:15 +1100 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <20091018075732.GA31449@phare.normalesup.org> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> Message-ID: <4ADAE897.1070000@bigpond.net.au> Hi Ga?l, If you've got a 1D array/vector called "a", I think the normal idiom is np.dot(a,a) For the more general case, I think np.tensordot(a, a, axes=something_else) should do it, where you should be able to figure out something_else for your particular case. Gary R. Gael Varoquaux wrote: > On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.pktd at gmail.com wrote: >>>>> Why aren't you using logaddexp ufunc from numpy? > >>>> Maybe because it is difficult to find, it doesn't have its own docs entry. > > Speaking of which... > > I thought that there was a readily-written, optimized function (or ufunc) > in numpy or scipy that calculated the sum of squares for an array > (possibly along an axis). However, I cannot find it. > > Is there something similar? If not, it is not the end of the world, the > operation is trivial to write. > > Cheers, > > Ga?l From dsdale24 at gmail.com Sun Oct 18 07:48:43 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Sun, 18 Oct 2009 07:48:43 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 6:45 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 6:49 AM, Darren Dale wrote: [...] >> I think it will be not too difficult to document this overall scheme: >> >> When calling numpy functions: >> >> 1) __input_prepare__ provides an opportunity to operate on the inputs >> to yield versions that are compatible with the operation (they should >> obviously not be modified in place) >> >> 2) the output array is established >> >> 3) __array_prepare__ is used to determine the class of the output >> array, as well as any metadata that needs to be established before the >> operation proceeds >> >> 4) the ufunc performs its operations >> >> 5) __array_wrap__ provides an opportunity to update the output array >> based on the results of the computation >> >> Comments, criticisms? If PEP 3124^ were already a part of the standard >> library, that could serve as the basis for generalizing numpy's >> functions. But I think the PEP will not be approved in its current >> form, and it is unclear when and if the author will revisit the >> proposal. 
The scheme I'm imagining might be sufficient for our >> purposes. >> > > This sounds interesting to me, as it would push the use of array wrap down > into a common function and make it easier to use. Sorry, I don't understand what you mean. > I wonder what the impact > would be on the current subclasses of ndarray? I don't think it will have any impact. The only change would be the addition of __input_prepare__, which by default would simply return the unmodified inputs. > On a side note, I wonder if you could look into adding your reduce loop > optimizations into the generic loops? It would be interesting to see if that > speeded up some common operations. In any case, it can't hurt. I think you are confusing me with someone else. Darren From mdroe at stsci.edu Sun Oct 18 08:04:15 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Sun, 18 Oct 2009 08:04:15 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> Message-ID: <4ADB043F.7060608@stsci.edu> On 10/16/2009 11:35 PM, Travis Oliphant wrote: > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > >> I recently committed a regression test and bugfix for object pointers in >> record arrays of unaligned size (meaning where each record is not a >> multiple of sizeof(PyObject **)). >> >> For example: >> >> a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) >> a2 = np.zeros((10,), 'S10') >> # This copying would segfault >> a1['o'] = a2 >> >> http://projects.scipy.org/numpy/ticket/1198 >> >> Unfortunately, this unit test has opened up a whole hornet's nest of >> alignment issues on Solaris. The various reference counting functions >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, >> for instance. Interestingly, there are comments in there saying >> "handles misaligned data" (eg. line 190), but in fact it doesn't, and >> doesn't look to me like it would. But I won't rule out a mistake in >> building it on my part. > > Thanks for this bug report. It would be very helpful if you could > provide the line number where the code is giving a bus error and > explain why you think the code in question does not handle misaligned > data (it still seems like it should to me --- but perhaps I must be > missing something --- I don't have a Solaris box to test on). > Perhaps, the real problem is elsewhere (such as other places where the > mistake of forgetting about striding needing to be aligned also before > pursuing the fast alignment path that you pointed out in another place > of code). > > This was the thinking for why the code (that I think is in question) > should handle mis-aligned data: > > 1) pointers that are not aligned to the correct size need to be copied > to an aligned memory area before being de-referenced. > 2) static variables defined in a function will be aligned by the C > compiler. > > So, what the code in refcnt.c does is to copy the value in the NumPy > data-area (i.e. pointed to by it->dataptr) to another memory location > (the stack variable temp), dereference it and then increment it's > reference count. > > 196: temp = (PyObject **)it->dataptr; > 197: Py_XINCREF(*temp); This is exactly an instance that fails. Let's say we have a PyObject at an aligned location 0x4000 (PyObjects themselves always seem to be aligned -- I strongly suspect CPython is enforcing that). Then, we can create a recarray such that some of the PyObject*'s in it are at unaligned locations. 
For example, if the dtype is 'O,c', you have a record stride of 5 which creates unaligned PyObject*'s: OOOOcOOOOcOOOOc 0123456789abcde ^ ^ Now in the code above, let's assume that it->dataptr points to an unaligned location, 0x8005. Assigning it to temp puts the same unaligned value in temp, 0x8005. That is: &temp == 0x1000 /* The location of temp *is* on the stack and aligned */ temp == 0x8005 /* But its value as a pointer points to an unaligned memory location */ *temp == 0x4000 /* Dereferencing it should get us back to the original PyObject * pointer, but dereferencing an unaligned memory location fails with a bus error on Solaris */ So the bus error occurs on line 197. Note that something like: PyObject* temp; temp = *(PyObject **)it->dataptr; would also fail. The solution (this is what works for me, though there may be a better way): PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject **) */ /* memcpy works byte-by-byte, so can handle an unaligned assignment */ memcpy(&temp, it->dataptr, sizeof(PyObject *)); Py_XINCREF(temp); I'm proposing adding a macro which on Intel/AMD would be defined as: #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src)) and on alignment-required platforms as: #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), sizeof(PyObject *)) and it would be used something like: COPY_PYOBJECT_PTR(&temp, it->dataptr); If you agree with this assessment, I'm working on a patch for all of the locations that require this change. All that I've found so far are related to object arrays. It seems that many places where this would be an issue for numeric types are already using this memcpy technique (e.g. *_copyswap in arraytype.c.src:1716). I think this issue shows up in object arrays much more because there are many more places where the unaligned memory is dereferenced (in order to do reference counting). So here's the traceback from: a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c'), ('i', 'i'), ('c2', 'c')]) Unfortunately, I'm having trouble getting line numbers out of the debugger, but "print statement debugging" tells me the inner most frame here is in refcount.c: 275 PyObject **temp; 276 Py_XINCREF(obj); 277 temp = (PyObject **)optr; 278 *temp = obj; /* <-- here */ 279 return; My fix was: Py_XINCREF(obj); memcpy(optr, &obj, sizeof(PyObject*)); return; 0xfeefaf60 in _fillobject () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so (gdb) bt #0 0xfeefaf60 in _fillobject () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #1 0xfeefaf20 in _fillobject () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #2 0xfeefad40 in PyArray_FillObjectArray () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #3 0xfee90e04 in _zerofill () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #4 0xfeed48c4 in PyArray_Zeros () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #5 0xfef05638 in array_zeros () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #6 0x37e8c in PyObject_Call () #7 0x9a7e8 in do_call () #8 0x9a264 in call_function () #9 0x9754c in PyEval_EvalFrameEx () #10 0x988d4 in PyEval_EvalCodeEx () #11 0x93d44 in PyEval_EvalCode () #12 0xb9150 in run_mod () #13 0xb9108 in PyRun_FileExFlags () #14 0xb80c4 in PyRun_SimpleFileExFlags () #15 0x3171c in Py_Main () Hope that illustrates the point better. 
Sorry for my vagueness in my initial report. Mike From dsdale24 at gmail.com Sun Oct 18 08:06:52 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Sun, 18 Oct 2009 08:06:52 -0400 Subject: [Numpy-discussion] Subclassing record array In-Reply-To: References: Message-ID: On Sun, Oct 18, 2009 at 12:22 AM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 9:13 AM, Lo?c BERTHE wrote: >> >> ? Hi, >> >> I would like to create my own class of record array to deal with units. >> >> Here is the code I used, inspired from >> >> http://docs.scipy.org/doc/numpy-1.3.x/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array >> : >> >> >> [code] >> from numpy import * >> >> class BlocArray(rec.ndarray): >> ? ?""" Recarray with units and pretty print """ >> >> ? ?fmt_dict = {'S' : '%10s', 'f' : '%10.6G', 'i': '%10d'} >> >> ? ?def __new__(cls, data, titles=None, units=None): >> >> ? ? ? ?# guess format for each column >> ? ? ? ?data2 = [] >> ? ? ? ?for line in zip(*data) : >> ? ? ? ? ? ?try : data2.append(cast[int](line)) ? ? ? ? # integers >> ? ? ? ? ? ?except ValueError : >> ? ? ? ? ? ? ? ?try : data2.append(cast[float](line)) ? # reals >> ? ? ? ? ? ? ? ?except ValueError : >> ? ? ? ? ? ? ? ? ? ?data2.append(cast[str](line)) ? ? ? # characters >> >> ? ? ? ?# create the array >> ? ? ? ?dt = dtype(zip(titres, [line.dtype for line in data2])) >> ? ? ? ?obj = rec.array(data2, dtype=dt).view(cls) >> >> ? ? ? ?# add custom attributes >> ? ? ? ?obj.units = units or [] >> ? ? ? ?obj._fmt = " ".join(obj.fmt_dict[d[1][1]] for d in dt.descr) + '\n' >> ? ? ? ?obj._head = "%10s "*len(dt.names) % dt.names +'\n' >> ? ? ? ?obj._head += "%10s "*len(dt.names) % tuple('(%s)' % u for u in >> units) +'\n' >> >> ? ? ? ?# Finally, we must return the newly created object: >> ? ? ? ?return obj >> >> titles = ?['Name', 'Nb', 'Price'] >> units = ['/', '/', 'Eur'] >> data = [['fish', '1', '12.25'], ['egg', '6', '0.85'], ['TV', 1, '125']] >> bloc = BlocArray(data, titles=titles, units=units) >> >> In [544]: bloc >> Out[544]: >> ? ? ?Name ? ? ? ? Nb ? ? ?Price >> ? ? ? (/) ? ? ? ?(/) ? ? ?(Eur) >> ? ? ?fish ? ? ? ? ?1 ? ? ?12.25 >> ? ? ? egg ? ? ? ? ?6 ? ? ? 0.85 >> ? ? ? ?TV ? ? ? ? ?1 ? ? ? ?125 >> [/code] >> >> It's almost working, but I have some isues : >> >> ? - I can't access data through indexing >> In [563]: bloc['Price'] >> /home/loic/Python/numpy/test.py in ((r,)) >> ? ? 50 >> ? ? 51 ? ? def __repr__(self): >> ---> 52 ? ? ? ? return self._head + ''.join(self._fmt % tuple(r) for r in >> self) >> >> TypeError: 'numpy.float64' object is not iterable >> >> So I think that overloading the __repr__ method is not that easy >> >> ? - I can't access data through attributes now : >> In [564]: bloc.Nb >> AttributeError: 'BlocArray' object has no attribute 'Nb' >> >> ? - I can't use 'T' as field in theses array as the T method is >> already here as a shortcut for transpose >> >> >> Have you any hints to make this work ? >> >> > > On adding units in general, you might want to contact Darren Dale who has > been working in that direction also and has added some infrastructure in svn > to make it easier. He also gave a short presentation at scipy2009 on that > problem, which has been worked on before. No sense in reinventing the wheel > here. The units package I have been working on is called quantities. It is available at the python package index, and the project is hosted at launchpad as python-quantities. If quantities isn't a good fit, please let me know why. 
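Basic usage looks something like the sketch below -- this is from memory, so treat the import path and unit spellings as approximate and check the package documentation:

import numpy
from quantities import Quantity   # top-level import path, from memory

q1 = Quantity(1.0, 'meter')
q2 = Quantity(2.0, 'feet')
print(q1 + q2)             # __add__ rescales the second operand to meters
print(numpy.add(q1, q1))   # ufuncs work via the __array_wrap__/__array_prepare__ hooks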
At least the code can provide some example of how to subclass ndarray. Darren From gael.varoquaux at normalesup.org Sun Oct 18 08:09:27 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 18 Oct 2009 14:09:27 +0200 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <4ADAE897.1070000@bigpond.net.au> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> Message-ID: <20091018120927.GA1113@phare.normalesup.org> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: > Hi Ga?l, > If you've got a 1D array/vector called "a", I think the normal idiom is > np.dot(a,a) > For the more general case, I think > np.tensordot(a, a, axes=something_else) > should do it, where you should be able to figure out something_else for > your particular case. Ha, yes. Good point about the tensordot trick. Thank you Ga?l From charlesr.harris at gmail.com Sun Oct 18 10:27:38 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 18 Oct 2009 08:27:38 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4ADB043F.7060608@stsci.edu> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> Message-ID: On Sun, Oct 18, 2009 at 6:04 AM, Michael Droettboom wrote: > On 10/16/2009 11:35 PM, Travis Oliphant wrote: > > > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > > > >> I recently committed a regression test and bugfix for object pointers in > >> record arrays of unaligned size (meaning where each record is not a > >> multiple of sizeof(PyObject **)). > >> > >> For example: > >> > >> a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > >> a2 = np.zeros((10,), 'S10') > >> # This copying would segfault > >> a1['o'] = a2 > >> > >> http://projects.scipy.org/numpy/ticket/1198 > >> > >> Unfortunately, this unit test has opened up a whole hornet's nest of > >> alignment issues on Solaris. The various reference counting functions > >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, > >> for instance. Interestingly, there are comments in there saying > >> "handles misaligned data" (eg. line 190), but in fact it doesn't, and > >> doesn't look to me like it would. But I won't rule out a mistake in > >> building it on my part. > > > > Thanks for this bug report. It would be very helpful if you could > > provide the line number where the code is giving a bus error and > > explain why you think the code in question does not handle misaligned > > data (it still seems like it should to me --- but perhaps I must be > > missing something --- I don't have a Solaris box to test on). > > Perhaps, the real problem is elsewhere (such as other places where the > > mistake of forgetting about striding needing to be aligned also before > > pursuing the fast alignment path that you pointed out in another place > > of code). > > > > This was the thinking for why the code (that I think is in question) > > should handle mis-aligned data: > > > > 1) pointers that are not aligned to the correct size need to be copied > > to an aligned memory area before being de-referenced. > > 2) static variables defined in a function will be aligned by the C > > compiler. > > > > So, what the code in refcnt.c does is to copy the value in the NumPy > > data-area (i.e. 
pointed to by it->dataptr) to another memory location > > (the stack variable temp), dereference it and then increment it's > > reference count. > > > > 196: temp = (PyObject **)it->dataptr; > > 197: Py_XINCREF(*temp); > This is exactly an instance that fails. Let's say we have a PyObject at > an aligned location 0x4000 (PyObjects themselves always seem to be > aligned -- I strongly suspect CPython is enforcing that). Then, we can > create a recarray such that some of the PyObject*'s in it are at > unaligned locations. For example, if the dtype is 'O,c', you have a > record stride of 5 which creates unaligned PyObject*'s: > > OOOOcOOOOcOOOOc > 0123456789abcde > ^ ^ > > Now in the code above, let's assume that it->dataptr points to an > unaligned location, 0x8005. Assigning it to temp puts the same > unaligned value in temp, 0x8005. That is: > > &temp == 0x1000 /* The location of temp *is* on the stack and aligned */ > temp == 0x8005 /* But its value as a pointer points to an unaligned > memory location */ > *temp == 0x4000 /* Dereferencing it should get us back to the original > PyObject * pointer, but dereferencing an > unaligned memory location > fails with a bus error on Solaris */ > > So the bus error occurs on line 197. > > Note that something like: > > PyObject* temp; > temp = *(PyObject **)it->dataptr; > > would also fail. > > The solution (this is what works for me, though there may be a better way): > > PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject > **) */ > /* memcpy works byte-by-byte, so can handle an unaligned assignment */ > memcpy(&temp, it->dataptr, sizeof(PyObject *)); > Py_XINCREF(temp); > > I'm proposing adding a macro which on Intel/AMD would be defined as: > > #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src)) > > and on alignment-required platforms as: > > #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), > sizeof(PyObject *)) > > and it would be used something like: > > COPY_PYOBJECT_PTR(&temp, it->dataptr); > > This looks right to me, but I'll let Travis sign off on it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Sun Oct 18 12:06:49 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 18 Oct 2009 12:06:49 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <20091018120927.GA1113@phare.normalesup.org> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> Message-ID: On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux wrote: > On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: >> Hi Ga?l, > >> If you've got a 1D array/vector called "a", I think the normal idiom is > >> np.dot(a,a) > >> For the more general case, I think >> np.tensordot(a, a, axes=something_else) >> should do it, where you should be able to figure out something_else for >> your particular case. > > Ha, yes. Good point about the tensordot trick. > > Thank you > > Ga?l I'm curious about this as I use ss, which is just np.sum(a*a, axis), in statsmodels and didn't much think about it. 
There is import numpy as np from scipy.stats import ss a = np.ones(5000) but timeit ss(a) 10000 loops, best of 3: 21.5 ?s per loop timeit np.add.reduce(a*a) 100000 loops, best of 3: 15 ?s per loop timeit np.dot(a,a) 100000 loops, best of 3: 5.38 ?s per loop Do the number of loops matter in the timings and is dot always faster even without the blas dot? Skipper From gokhansever at gmail.com Sun Oct 18 13:03:14 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sun, 18 Oct 2009 12:03:14 -0500 Subject: [Numpy-discussion] Multiple string formatting while writing an array into a file Message-ID: <49d6b3500910181003w56dae6a5h76269d110e71f22e@mail.gmail.com> Hello, I have a relatively simple question which I couldn't figure out myself yet. I have an array that I am writing into a file using the following savetxt method. np.savetxt(fid, output_array, fmt='%12.4f', delimiter='') However, I have made some changes on the code and I require to write after 7th element of the array as integer instead of 12.4 formatted float. The change below doesn't help me to solve the problem since I get a "ValueError: setting an array element with a sequence." np.savetxt(fid, (output_array[:7], output_array[7:]), fmt=('%12.4f', '%12d'), delimiter='') What would be the right approach to fix this issue? Thanks. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Oct 18 13:37:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 18 Oct 2009 13:37:55 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> Message-ID: <1cd32cbb0910181037h444b3491i5b6092e12a75d75d@mail.gmail.com> On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold wrote: > On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux > wrote: >> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: >>> Hi Ga?l, >> >>> If you've got a 1D array/vector called "a", I think the normal idiom is >> >>> np.dot(a,a) >> >>> For the more general case, I think >>> np.tensordot(a, a, axes=something_else) >>> should do it, where you should be able to figure out something_else for >>> your particular case. >> >> Ha, yes. Good point about the tensordot trick. >> >> Thank you >> >> Ga?l > > I'm curious about this as I use ss, which is just np.sum(a*a, axis), > in statsmodels and didn't much think about it. > > There is > > import numpy as np > from scipy.stats import ss > > a = np.ones(5000) > > but > > timeit ss(a) > 10000 loops, best of 3: 21.5 ?s per loop > > timeit np.add.reduce(a*a) > 100000 loops, best of 3: 15 ?s per loop > > timeit np.dot(a,a) > 100000 loops, best of 3: 5.38 ?s per loop > > Do the number of loops matter in the timings and is dot always faster > even without the blas dot? David's reply once was that it depends on ATLAS and the version of lapack/blas. I usually switched to using dot for 1d. Using tensordot looks to complicated for me, to figure out the axes when I quickly want a sum of squares. I never tried the timing of tensordot for 2d arrays, especially for axis=0 for a c ordered array. If it's faster, this could be useful to rewrite stats.ss. I don't remember that np.add.reduce is much faster than np.sum. 
This might be the additional call overhead from using another function in between. Josef > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Sun Oct 18 14:16:34 2009 From: sturla at molden.no (Sturla Molden) Date: Sun, 18 Oct 2009 20:16:34 +0200 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> Message-ID: <4ADB5B82.2040105@molden.no> Skipper Seabold skrev: > I'm curious about this as I use ss, which is just np.sum(a*a, axis), > in statsmodels and didn't much think about it. > > Do the number of loops matter in the timings and is dot always faster > even without the blas dot? > The thing is that a*a returns a temporary array with the same shape as a, and then that is passed to np.sum. The BLAS dot product don't need to allocate and deallocate temporary arrays. S.M. From charlesr.harris at gmail.com Sun Oct 18 14:19:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 18 Oct 2009 12:19:32 -0600 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910181037h444b3491i5b6092e12a75d75d@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> <1cd32cbb0910181037h444b3491i5b6092e12a75d75d@mail.gmail.com> Message-ID: On Sun, Oct 18, 2009 at 11:37 AM, wrote: > On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold > wrote: > > On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux > > wrote: > >> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: > >>> Hi Ga?l, > >> > >>> If you've got a 1D array/vector called "a", I think the normal idiom is > >> > >>> np.dot(a,a) > >> > >>> For the more general case, I think > >>> np.tensordot(a, a, axes=something_else) > >>> should do it, where you should be able to figure out something_else for > >>> your particular case. > >> > >> Ha, yes. Good point about the tensordot trick. > >> > >> Thank you > >> > >> Ga?l > > > > I'm curious about this as I use ss, which is just np.sum(a*a, axis), > > in statsmodels and didn't much think about it. > > > > There is > > > > import numpy as np > > from scipy.stats import ss > > > > a = np.ones(5000) > > > > but > > > > timeit ss(a) > > 10000 loops, best of 3: 21.5 ?s per loop > > > > timeit np.add.reduce(a*a) > > 100000 loops, best of 3: 15 ?s per loop > > > > timeit np.dot(a,a) > > 100000 loops, best of 3: 5.38 ?s per loop > > > > Do the number of loops matter in the timings and is dot always faster > > even without the blas dot? > > David's reply once was that it depends on ATLAS and the version of > lapack/blas. > > I usually switched to using dot for 1d. Using tensordot looks to > complicated for me, to figure out the axes when I quickly want a sum of > squares. > > I never tried the timing of tensordot for 2d arrays, especially for > axis=0 for a > c ordered array. If it's faster, this could be useful to rewrite stats.ss. > > I don't remember that np.add.reduce is much faster than np.sum. 
This might > be > the additional call overhead from using another function in between. > > If you are using numpy from svn, it might be due to te recent optimizations that Luca Citi did for some of the ufuncs. Now we just need a multiply and add function. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffamcgee at gmail.com Sun Oct 18 15:11:49 2009 From: jeffamcgee at gmail.com (Jeffrey McGee) Date: Sun, 18 Oct 2009 14:11:49 -0500 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() Message-ID: Howdy, I'm having trouble getting the kaiser window to work. Anytime I try to call numpy.kaiser(), it throws an exception. Here's the output when I run the example code from http://docs.scipy.org/doc/numpy/reference/generated/numpy.kaiser.html : Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy import kaiser >>> kaiser(12, 14) Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", line 2630, in kaiser return i0(beta * sqrt(1-((n-alpha)/alpha)**2.0))/i0(beta) File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", line 2507, in i0 y[ind] = _i0_1(x[ind]) TypeError: array cannot be safely cast to required type >>> Is this a bug? Am I doing something wrong? (I'm using the Ubuntu 9.4 packages for python and numpy.) Thanks, Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Mon Oct 19 03:10:06 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 19 Oct 2009 09:10:06 +0200 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale wrote: > numpy's functions, especially ufuncs, have had some ability to support > subclasses through the ndarray.__array_wrap__ method, which provides > masked arrays or quantities (for example) with an opportunity to set > the class and metadata of the output array at the end of an operation. > An example is > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'meters') > numpy.add(q1, q2) # yields Quantity(3, 'meters') > > At SciPy2009 we committed a change to the numpy trunk that provides a > chance to determine the class and some metadata of the output *before* > the ufunc performs its calculation, but after output array has been > established (and its data is still uninitialized). Consider: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > numpy.add(q1, q2, q1) > # or equivalently: > # q1 += q2 > > With only __array_wrap__, the attempt to propagate the units happens > after q1's data was updated in place, too late to raise an error, the > data is now corrupted. __array_prepare__ solves that problem, an > exception can be raised in time. > > Now I'd like to suggest one more improvement to numpy to make its > functions more generic. Consider one more example: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'feet') > numpy.add(q1, q2) > > In this case, I'd like an opportunity to operate on the input arrays > on the way in to the ufunc, to rescale the second input to meters. I > think it would be a hack to try to stuff this capability into > __array_prepare__. 
One form of this particular example is already > supported in quantities, "q1 + q2", by overriding the __add__ method > to rescale the second input, but there are ufuncs that do not have an > associated special method. So I'd like to look into adding another > check for a special method, perhaps called __input_prepare__. My time > is really tight for the next month, so I'd rather not start if there > are strong objections, but otherwise, I'd like to try to try to get it > in in time for numpy-1.4. (Has a timeline been established?) > > I think it will be not too difficult to document this overall scheme: > > When calling numpy functions: > > 1) __input_prepare__ provides an opportunity to operate on the inputs > to yield versions that are compatible with the operation (they should > obviously not be modified in place) > > 2) the output array is established > > 3) __array_prepare__ is used to determine the class of the output > array, as well as any metadata that needs to be established before the > operation proceeds > > 4) the ufunc performs its operations > > 5) __array_wrap__ provides an opportunity to update the output array > based on the results of the computation > > Comments, criticisms? If PEP 3124^ were already a part of the standard > library, that could serve as the basis for generalizing numpy's > functions. But I think the PEP will not be approved in its current > form, and it is unclear when and if the author will revisit the > proposal. The scheme I'm imagining might be sufficient for our > purposes. I'm all for generic (u)funcs since they might come handy for me since I'm doing lots of operation on arrays of polynomials. I don't quite get the reasoning though. Could you correct me where I get it wrong? * the class Quantity derives from numpy.ndarray * Quantity overrides __add__, __mul__ etc. and you get the correct behaviour for q1 = Quantity(1, 'meter') q2 = Quantity(2, 'J') by raising an exception when performing q1+=q2 * The problem is that numpy.add(q1,q1,q2) would corrupt q1 before raising an exception Sebastian > > Darren > > ^ http://www.python.org/dev/peps/pep-3124/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robince at gmail.com Mon Oct 19 07:22:50 2009 From: robince at gmail.com (Robin) Date: Mon, 19 Oct 2009 12:22:50 +0100 Subject: [Numpy-discussion] fortran vs numpy on mac/linux - gcc performance? Message-ID: <2d5132a50910190422g74f246c6t3b9d18cc1742a50a@mail.gmail.com> Hi, I have been looking at moving some of my bottleneck functions to fortran with f2py. To get started I tried some simple things, and was surprised they performend so much better than the number builtins - which I assumed would be c and would be quite fast. On my Macbook pro laptop (Intel core 2 duo) I got the following results. Numpy is built with xcode gcc 4.0.1 and gfortran is 4.2.3 - fortran code for shuffle and bincount below: In [1]: x = np.random.random_integers(0,1023,1000000).astype(int) In [2]: import ftest In [3]: timeit np.bincount(x) 100 loops, best of 3: 3.97 ms per loop In [4]: timeit ftest.bincount(x,1024) 1000 loops, best of 3: 1.15 ms per loop In [5]: timeit np.random.shuffle(x) 1 loops, best of 3: 605 ms per loop In [6]: timeit ftest.shuffle(x) 10 loops, best of 3: 139 ms per loop So fortran was about 4 times faster for these loops - similarly faster than cython as well. 
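For anyone who wants to reproduce this, the ftest extension is built with f2py. A rough sketch of the build step is below, assuming numpy.f2py.compile is available and the fortran source (posted in my follow-up) is saved as test.f95; the shell equivalent is roughly "f2py -c -m ftest test.f95":

import numpy.f2py

src = open('test.f95').read()
# source_fn keeps the .f95 extension so the file is compiled as free-form
numpy.f2py.compile(src, modulename='ftest', source_fn='test.f95')

import ftest
print(ftest.bincount.__doc__)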
So I was really happy as these are two of my biggest bottlenecks, but when I moved a linux workstation I got different results. Here with gcc/gfortran 4.3.3 : In [3]: x = np.random.random_integers(0,1023,1000000).astype(int) In [4]: timeit np.bincount(x) 100 loops, best of 3: 8.18 ms per loop In [5]: timeit ftest.bincount(x,1024) 100 loops, best of 3: 8.25 ms per loop In [6]: In [7]: timeit np.random.shuffle(x) 1 loops, best of 3: 379 ms per loop In [8]: timeit ftest.shuffle(x) 10 loops, best of 3: 172 ms per loop So shuffle is a bit faster, but bincount is now the same as fortran. The only thing I can think is that it is due to much better performance of the more recent c compiler. I think this would also explain why f2py extension was performing so much better than cython on the mac. So my question is - is there a way to build numpy with a more recent compiler on leopard? (I guess I could upgrade to snow leopard now) - Could I make the numpy install use gcc-4.2 from xcode or would it break stuff? Could I use gcc 4.3.3 from macports? It would be great to get a 4x speed up on all numpy c loops! (already just these two functions I use a lot would make a big difference). Cheers Robin From robince at gmail.com Mon Oct 19 07:24:18 2009 From: robince at gmail.com (Robin) Date: Mon, 19 Oct 2009 12:24:18 +0100 Subject: [Numpy-discussion] fortran vs numpy on mac/linux - gcc performance? In-Reply-To: <2d5132a50910190422g74f246c6t3b9d18cc1742a50a@mail.gmail.com> References: <2d5132a50910190422g74f246c6t3b9d18cc1742a50a@mail.gmail.com> Message-ID: <2d5132a50910190424m9589ba2w700f75acea34bc4a@mail.gmail.com> Forgot to include the fortran code used: jm-g26b101:fortran robince$ cat test.f95 subroutine bincount (x,c,n,m) implicit none integer, intent(in) :: n,m integer, dimension(0:n-1), intent(in) :: x integer, dimension(0:m-1), intent(out) :: c integer :: i c = 0 do i = 0, n-1 c(x(i)) = c(x(i)) + 1 end do end subroutine shuffle (x,s,n) implicit none integer, intent(in) :: n integer, dimension(n), intent(in) :: x integer, dimension(n), intent(out) :: s integer :: i,randpos,temp real :: r ! copy input s = x call init_random_seed() ! knuth shuffle from http://rosettacode.org/wiki/Knuth_shuffle#Fortran do i = n, 2, -1 call random_number(r) randpos = int(r * i) + 1 temp = s(randpos) s(randpos) = s(i) s(i) = temp end do end subroutine init_random_seed() ! init_random_seed from gfortran documentation integer :: i, n, clock integer, dimension(:), allocatable :: seed call random_seed(size = n) allocate(seed(n)) call system_clock(count=clock) seed = clock + 37 * (/ (i - 1, i = 1, n) /) call random_seed(put = seed) deallocate(seed) end subroutine From dsdale24 at gmail.com Mon Oct 19 07:55:36 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Mon, 19 Oct 2009 07:55:36 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Mon, Oct 19, 2009 at 3:10 AM, Sebastian Walter wrote: > On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale wrote: >> numpy's functions, especially ufuncs, have had some ability to support >> subclasses through the ndarray.__array_wrap__ method, which provides >> masked arrays or quantities (for example) with an opportunity to set >> the class and metadata of the output array at the end of an operation. 
>> An example is >> >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'meters') >> numpy.add(q1, q2) # yields Quantity(3, 'meters') >> >> At SciPy2009 we committed a change to the numpy trunk that provides a >> chance to determine the class and some metadata of the output *before* >> the ufunc performs its calculation, but after output array has been >> established (and its data is still uninitialized). Consider: >> >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'J') >> numpy.add(q1, q2, q1) >> # or equivalently: >> # q1 += q2 >> >> With only __array_wrap__, the attempt to propagate the units happens >> after q1's data was updated in place, too late to raise an error, the >> data is now corrupted. __array_prepare__ solves that problem, an >> exception can be raised in time. >> >> Now I'd like to suggest one more improvement to numpy to make its >> functions more generic. Consider one more example: >> >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'feet') >> numpy.add(q1, q2) >> >> In this case, I'd like an opportunity to operate on the input arrays >> on the way in to the ufunc, to rescale the second input to meters. I >> think it would be a hack to try to stuff this capability into >> __array_prepare__. One form of this particular example is already >> supported in quantities, "q1 + q2", by overriding the __add__ method >> to rescale the second input, but there are ufuncs that do not have an >> associated special method. So I'd like to look into adding another >> check for a special method, perhaps called __input_prepare__. My time >> is really tight for the next month, so I'd rather not start if there >> are strong objections, but otherwise, I'd like to try to try to get it >> in in time for numpy-1.4. (Has a timeline been established?) >> >> I think it will be not too difficult to document this overall scheme: >> >> When calling numpy functions: >> >> 1) __input_prepare__ provides an opportunity to operate on the inputs >> to yield versions that are compatible with the operation (they should >> obviously not be modified in place) >> >> 2) the output array is established >> >> 3) __array_prepare__ is used to determine the class of the output >> array, as well as any metadata that needs to be established before the >> operation proceeds >> >> 4) the ufunc performs its operations >> >> 5) __array_wrap__ provides an opportunity to update the output array >> based on the results of the computation >> >> Comments, criticisms? If PEP 3124^ were already a part of the standard >> library, that could serve as the basis for generalizing numpy's >> functions. But I think the PEP will not be approved in its current >> form, and it is unclear when and if the author will revisit the >> proposal. The scheme I'm imagining might be sufficient for our >> purposes. > > I'm all for generic (u)funcs since they might come handy for me since > I'm doing lots of operation on arrays of polynomials. > ?I don't quite get the reasoning though. > Could you correct me where I get it wrong? > * the class Quantity derives from numpy.ndarray > * Quantity overrides __add__, __mul__ etc. and you get the correct behaviour for > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > by raising an exception when performing q1+=q2 No, Quantity does not override __iadd__ to catch this. Quantity implements __array_prepare__ to perform the dimensional analysis based on the identity of the ufunc and the inputs, and set the class and dimensionality of the output array, or raise an error when dimensional analysis fails. 
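Schematically it looks something like this (a bare-bones sketch with toy unit handling, not the actual quantities code):

import numpy as np

class Quantity(np.ndarray):

    def __new__(cls, data, units=''):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # keep the units attribute on views and new instances
        self.units = getattr(obj, 'units', '')

    def __array_prepare__(self, out_arr, context=None):
        # called before the ufunc writes any data; context is
        # (ufunc, inputs, output index), so the dimensional analysis can
        # veto the operation before anything gets corrupted
        if context is not None:
            ufunc, inputs, _ = context
            units = set(getattr(x, 'units', '') for x in inputs
                        if isinstance(x, Quantity))
            if ufunc in (np.add, np.subtract) and len(units) > 1:
                raise ValueError("incompatible units: %s" % sorted(units))
        result = out_arr.view(type(self))
        result.units = self.units
        return result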
This approach lets quantities support all ufuncs (in principle), not just built in numerical operations. It should also make it easier to subclass from MaskedArray, so we could have a MaskedQuantity without having to establish yet another suite of ufuncs specific to quantities or masked quantities. > * The problem is that numpy.add(q1,q1,q2) would corrupt q1 before > raising an exception That was solved by the addition of __array_prepare__ to numpy back in August. What I am proposing now is supporting operations on arrays that would be compatible if we had a chance to transform them on the way into the ufunc, like "meter + foot". Darren From josef.pktd at gmail.com Mon Oct 19 10:40:23 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 19 Oct 2009 10:40:23 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? Message-ID: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> I wanted to finally upgrade my numpy, so I can build scipy trunk again, but I get test failures with numpy. And running the tests of the previously compiled version of scipy crashes in signaltools. Is this a problem with my build (the usual official MingW on WindowsXP), or are there still ABI problems in numpy trunk? I did the build twice with (I think) clean directories and get the same result. Thanks, Josef Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.__file__ 'C:\\Josef\\_progs\\Subversion\\numpy-trunk\\dist\\numpy-1.4.0.dev7539.win32\\Pr ograms\\Python25\\Lib\\site-packages\\numpy\\__init__.py' >>> numpy.test() Running unit tests for numpy NumPy version 1.4.0.dev7539 NumPy is installed in C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.de v7539.win32\Programs\Python25\Lib\site-packages\numpy Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Int el)] nose version 0.11.1 ................................. \u03a3 .[[' abc' ''] ['12345' 'MixedCase'] ['123 345' 'UPPER']] ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ............................K................................................... ..FF.......................FFFF................................................. ................................................................................ ................................................................................ ................................................................................ ................................................................................ ............................................................................C:\J osef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Programs\Pytho n25\Lib\site-packages\numpy\lib\io.py:1332: ConversionWarning: Some errors were detected ! 
Line #2 (got 4 columns instead of 5) Line #12 (got 4 columns instead of 5) Line #22 (got 4 columns instead of 5) Line #32 (got 4 columns instead of 5) Line #42 (got 4 columns instead of 5) warnings.warn(errmsg, ConversionWarning) .C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Programs\ Python25\Lib\site-packages\numpy\lib\io.py:1332: ConversionWarning: Some errors were detected ! Line #2 (got 4 columns instead of 2) Line #12 (got 4 columns instead of 2) Line #22 (got 4 columns instead of 2) Line #32 (got 4 columns instead of 2) Line #42 (got 4 columns instead of 2) warnings.warn(errmsg, ConversionWarning) ................E.E.EE....................K........K............F............... ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ..S............................................................................. ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ............................. ====================================================================== ERROR: Test giving usecols with a comma-separated string ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 747, in test _usecols_as_css names="a, b, c", usecols="a, c") File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1099, in gen fromtxt the docstring of the `genfromtxt` function. AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: Test usecols with named columns ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 773, in test _usecols_with_named_columns usecols=('a', 'c'), **kwargs) File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1099, in gen fromtxt the docstring of the `genfromtxt` function. 
AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: Test with missing and filling values ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 861, in test _user_filling_values test = np.genfromtxt(StringIO.StringIO(data), **kwargs) File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1127, in gen fromtxt See Also AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: test_user_missing_values (test_io.TestFromTxt) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 845, in test _user_missing_values **basekwargs) File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1481, in maf romtxt File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1127, in gen fromtxt See Also AttributeError: 'tuple' object has no attribute 'index' ====================================================================== FAIL: test_umath.test_hypot_special_values(1.#QNAN, 1.#INF) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\core\tests\test_umath.py", line 211, in assert_hypot_isinf assert np.isinf(ncu.hypot(x, y)) AssertionError ====================================================================== FAIL: test_umath.test_hypot_special_values(1.#INF, 1.#QNAN) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\core\tests\test_umath.py", line 211, in assert_hypot_isinf assert np.isinf(ncu.hypot(x, y)) AssertionError ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, 2.3561944901923448) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: 2.3561944901923448 ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, -2.3561944901923448) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File 
"C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: -2.3561944901923448 ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, 0.78539816339744828) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: 0.78539816339744828 ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, -0.78539816339744828) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: -0.78539816339744828 ====================================================================== FAIL: test_doctests (test_polynomial.TestDocs) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_polynomial.py", line 90, in test_doctests return rundocs() File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 951, in rundocs raise AssertionError("Some doctests failed:\n%s" % "\n".join(msg)) AssertionError: Some doctests failed: ********************************************************************** File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Prog rams\Python25\Lib\site-packages\numpy\lib\tests\test_polynomial.py", line 20, in test_polynomial Failed example: print poly1d([100e-90, 1.234567e-9j+3, -1234.999e8]) Expected: 2 1e-88 x + (3 + 1.235e-09j) x - 1.235e+11 Got: 2 1e-088 x + (3 + 1.235e-009j) x - 1.235e+011 ---------------------------------------------------------------------- Ran 2140 tests in 13.281s FAILED (KNOWNFAIL=3, SKIP=1, errors=4, failures=7) >>> import scipy >>> scipy.__file__ 'c:\\josef\\eclipsegworkspace\\scipy-trunk-work\\scipytrunkcopy\\scipy\\__init__ .pyc' >>> scipy.test() Running unit tests for scipy NumPy version 1.4.0.dev7539 NumPy is installed in C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.de v7539.win32\Programs\Python25\Lib\site-packages\numpy SciPy version 0.8.0.dev SciPy is installed in c:\josef\eclipsegworkspace\scipy-trunk-work\scipytrunkcopy \scipy Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Int el)] nose version 0.11.1 E..... ... crash with verbose the last tests are test_rank1 (test_signaltools.TestCorrelateComplexSingle) ... 
ok test_rank3 (test_signaltools.TestCorrelateComplexSingle) ... ok test_rank1 (test_signaltools.TestCorrelateDouble) ... ok test_rank3 (test_signaltools.TestCorrelateDouble) ... ok test_rank1 (test_signaltools.TestCorrelateExtended) ... ok test_rank3 (test_signaltools.TestCorrelateExtended) ... ok test_rank1 (test_signaltools.TestCorrelateObject) ... ok test_rank3 (test_signaltools.TestCorrelateObject) ... ok test_rank1 (test_signaltools.TestCorrelateSingle) ... ok test_rank3 (test_signaltools.TestCorrelateSingle) ... ok test_signaltools.TestDecimate.test_basic ... crash From mdroe at stsci.edu Mon Oct 19 10:55:43 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Mon, 19 Oct 2009 10:55:43 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> Message-ID: <4ADC7DEF.704@stsci.edu> I've filed a bug and attached a patch: http://projects.scipy.org/numpy/ticket/1267 No guarantees that I've found all of the alignment issues. I did a grep for "PyObject **" to find possible locations where PyObject * in arrays were being dereferenced. If I could write a unit test to make it fall over on Solaris, then I fixed it, otherwise I left it alone. For example, there are places where misaligned dereferencing is theoretically possible (OBJECT_dot, OBJECT_compare), but a higher level function already did a "BEHAVED" array cast. In those cases I added a unit test so hopefully we'll be able to catch it in the future if the caller no longer ensures well-behavedness. The unit tests are passing with this patch on Sparc (SunOS 5.8), x86 (RHEL 4) and x86_64 (RHEL 4). Those of you who care about less common architectures may want to try the patch out. Since I don't know the alignment requirements of all of the supported platforms, I erred on the side of caution: only x86 and x86_64 will perform unaligned pointer dereferencing -- Everything else will use the slower-but-sure-to-work memcpy approach. That can easily be changed in npy_cpu.h if necessary. Mike Charles R Harris wrote: > > > On Sun, Oct 18, 2009 at 6:04 AM, Michael Droettboom > wrote: > > On 10/16/2009 11:35 PM, Travis Oliphant wrote: > > > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > > > >> I recently committed a regression test and bugfix for object > pointers in > >> record arrays of unaligned size (meaning where each record is not a > >> multiple of sizeof(PyObject **)). > >> > >> For example: > >> > >> a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > >> a2 = np.zeros((10,), 'S10') > >> # This copying would segfault > >> a1['o'] = a2 > >> > >> http://projects.scipy.org/numpy/ticket/1198 > >> > >> Unfortunately, this unit test has opened up a whole hornet's > nest of > >> alignment issues on Solaris. The various reference counting > functions > >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object > pointers, > >> for instance. Interestingly, there are comments in there saying > >> "handles misaligned data" (eg. line 190), but in fact it > doesn't, and > >> doesn't look to me like it would. But I won't rule out a > mistake in > >> building it on my part. > > > > Thanks for this bug report. It would be very helpful if you > could > > provide the line number where the code is giving a bus error and > > explain why you think the code in question does not handle > misaligned > > data (it still seems like it should to me --- but perhaps I must be > > missing something --- I don't have a Solaris box to test on). 
> > Perhaps, the real problem is elsewhere (such as other places > where the > > mistake of forgetting about striding needing to be aligned also > before > > pursuing the fast alignment path that you pointed out in another > place > > of code). > > > > This was the thinking for why the code (that I think is in question) > > should handle mis-aligned data: > > > > 1) pointers that are not aligned to the correct size need to be > copied > > to an aligned memory area before being de-referenced. > > 2) static variables defined in a function will be aligned by the C > > compiler. > > > > So, what the code in refcnt.c does is to copy the value in the NumPy > > data-area (i.e. pointed to by it->dataptr) to another memory > location > > (the stack variable temp), dereference it and then increment it's > > reference count. > > > > 196: temp = (PyObject **)it->dataptr; > > 197: Py_XINCREF(*temp); > This is exactly an instance that fails. Let's say we have a > PyObject at > an aligned location 0x4000 (PyObjects themselves always seem to be > aligned -- I strongly suspect CPython is enforcing that). Then, > we can > create a recarray such that some of the PyObject*'s in it are at > unaligned locations. For example, if the dtype is 'O,c', you have a > record stride of 5 which creates unaligned PyObject*'s: > > OOOOcOOOOcOOOOc > 0123456789abcde > ^ ^ > > Now in the code above, let's assume that it->dataptr points to an > unaligned location, 0x8005. Assigning it to temp puts the same > unaligned value in temp, 0x8005. That is: > > &temp == 0x1000 /* The location of temp *is* on the stack and > aligned */ > temp == 0x8005 /* But its value as a pointer points to an unaligned > memory location */ > *temp == 0x4000 /* Dereferencing it should get us back to the > original > PyObject * pointer, but dereferencing an > unaligned memory location > fails with a bus error on Solaris */ > > So the bus error occurs on line 197. > > Note that something like: > > PyObject* temp; > temp = *(PyObject **)it->dataptr; > > would also fail. > > The solution (this is what works for me, though there may be a > better way): > > PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject > **) */ > /* memcpy works byte-by-byte, so can handle an unaligned > assignment */ > memcpy(&temp, it->dataptr, sizeof(PyObject *)); > Py_XINCREF(temp); > > I'm proposing adding a macro which on Intel/AMD would be defined as: > > #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src)) > > and on alignment-required platforms as: > > #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), > sizeof(PyObject *)) > > and it would be used something like: > > COPY_PYOBJECT_PTR(&temp, it->dataptr); > > > This looks right to me, but I'll let Travis sign off on it. > > > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From josef.pktd at gmail.com Mon Oct 19 11:26:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 19 Oct 2009 11:26:04 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? 
In-Reply-To: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> Message-ID: <1cd32cbb0910190826s5dd5aa3ftf1f8c5039d145256@mail.gmail.com> On Mon, Oct 19, 2009 at 10:40 AM, wrote: > I wanted to finally upgrade my numpy, so I can build scipy trunk > again, but I get test failures with numpy. And running the tests of > the previously compiled version of scipy crashes in signaltools. > > Is this a problem with my build (the usual official MingW on > WindowsXP), or are there still ABI problems in numpy trunk? > I did the build twice with (I think) clean directories and get the same result. > > Thanks, > > Josef Forgot to mention my previous version of scipy was build against numpy release 1.3.0 I recompiled scipy, and have no problems building and running scipy trunk against numpy trunk. One problem I had, was that during the build of scipy, gcc failed with unknown npymath. I had to copy the file libnpymath.a to my Python libs directory, then the build finished without problems. Josef From pgmdevlist at gmail.com Mon Oct 19 11:43:59 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 19 Oct 2009 11:43:59 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? In-Reply-To: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> Message-ID: On Oct 19, 2009, at 10:40 AM, josef.pktd at gmail.com wrote: > I wanted to finally upgrade my numpy, so I can build scipy trunk > again, but I get test failures with numpy. And running the tests of > the previously compiled version of scipy crashes in signaltools. The ConversionWarnings are expected. I'm probably to be blamed for the AttributeErrors (I'm testing on 2.6 where tuples do have an index Attribute), I gonna check that. From gnurser at googlemail.com Mon Oct 19 12:01:52 2009 From: gnurser at googlemail.com (George Nurser) Date: Mon, 19 Oct 2009 17:01:52 +0100 Subject: [Numpy-discussion] numpy build/installation problems ? In-Reply-To: References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> Message-ID: <1d1e6ea70910190901g39fe5fehd30a9f49806a0fd6@mail.gmail.com> I had the same 4 errors in genfromtext yesterday when I upgraded numpy r 7539. mac os x python 2.5.2. --George. 2009/10/19 Pierre GM : > > On Oct 19, 2009, at 10:40 AM, josef.pktd at gmail.com wrote: > >> I wanted to finally upgrade my numpy, so I can build scipy trunk >> again, but I get test failures with numpy. And running the tests of >> the previously compiled version of scipy crashes in signaltools. > > The ConversionWarnings are expected. I'm probably to be blamed for the > AttributeErrors (I'm testing on 2.6 where tuples do have an index > Attribute), I gonna check that. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Mon Oct 19 12:26:48 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 19 Oct 2009 12:26:48 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? 
In-Reply-To: <1d1e6ea70910190901g39fe5fehd30a9f49806a0fd6@mail.gmail.com> References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> <1d1e6ea70910190901g39fe5fehd30a9f49806a0fd6@mail.gmail.com> Message-ID: <23EF0D4B-433C-408F-B69D-7D8105853B37@gmail.com> On Oct 19, 2009, at 12:01 PM, George Nurser wrote: > I had the same 4 errors in genfromtext yesterday when I upgraded > numpy r 7539. > mac os x python 2.5.2. I'm on it, should be fixed in a few hours. Please, don't hesitate to open a ticket next time (so that I remember to test on 2.5 as well...). Thx From gokhansever at gmail.com Mon Oct 19 12:32:18 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 19 Oct 2009 11:32:18 -0500 Subject: [Numpy-discussion] Multiple string formatting while writing an array into a file In-Reply-To: <49d6b3500910181003w56dae6a5h76269d110e71f22e@mail.gmail.com> References: <49d6b3500910181003w56dae6a5h76269d110e71f22e@mail.gmail.com> Message-ID: <49d6b3500910190932w4dced851s36313a294dce787b@mail.gmail.com> On Sun, Oct 18, 2009 at 12:03 PM, G?khan Sever wrote: > Hello, > > I have a relatively simple question which I couldn't figure out myself yet. > I have an array that I am writing into a file using the following savetxt > method. > > np.savetxt(fid, output_array, fmt='%12.4f', delimiter='') > > > However, I have made some changes on the code and I require to write after > 7th element of the array as integer instead of 12.4 formatted float. The > change below doesn't help me to solve the problem since I get a "ValueError: > setting an array element with a sequence." > > np.savetxt(fid, (output_array[:7], output_array[7:]), fmt=('%12.4f', > '%12d'), delimiter='') > > What would be the right approach to fix this issue? > > Thanks. > > -- > G?khan > Pre-defining a format like shown below, seemingly help me to fix: I[48]: format="" I[49]: for i in range(len(output_array)): if i<7: format += "%12.4f " else: format += "%12d " np.savetxt(fid, output_array, fmt=format) However couldn't figure out to make it work in-place. From the savetxtdocumentation: *fmt* : str or sequence of strs A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. ?Iteration %d ? %10.5f?, in which case *delimiter* is ignored Any ideas how to make this work via in-place iteration? I could add an example to the function doc once I learn how to do this. Thanks. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagmarwi at gmail.com Mon Oct 19 13:00:15 2009 From: dagmarwi at gmail.com (dagmar wismeijer) Date: Mon, 19 Oct 2009 19:00:15 +0200 Subject: [Numpy-discussion] opening pickled numarray data with numpy Message-ID: <87701dd70910191000n7110c024i2809bb7f97a498fa@mail.gmail.com> Hi, I've been trying to open (using numpy) old pickled data files that I once created using numarray, but I keep getting the message that there is no module numarray.generic. Is there any way I could open these datafiles without installing numarray again? Thanks in advance, Dagmar -------------- next part -------------- An HTML attachment was scrubbed... URL: From v.for.vandal at gmail.com Mon Oct 19 15:55:22 2009 From: v.for.vandal at gmail.com (Artem Serebriyskiy) Date: Mon, 19 Oct 2009 23:55:22 +0400 Subject: [Numpy-discussion] user defined types Message-ID: <75d4f97a0910191255s1cb55645laadb26ccde850c7a@mail.gmail.com> Hello! Would you please give me some examples of open source projects which use the implementation of user defined types for numpy library? 
(implementation on the C-API level) -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Oct 19 16:29:32 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 19 Oct 2009 15:29:32 -0500 Subject: [Numpy-discussion] user defined types In-Reply-To: <75d4f97a0910191255s1cb55645laadb26ccde850c7a@mail.gmail.com> References: <75d4f97a0910191255s1cb55645laadb26ccde850c7a@mail.gmail.com> Message-ID: <3d375d730910191329p32cba260sd512fbcd9ce8a6b@mail.gmail.com> On Mon, Oct 19, 2009 at 14:55, Artem Serebriyskiy wrote: > Hello! Would you please give me some examples of open source projects which > use the implementation of user defined types for numpy library? > (implementation on the C-API level) I'm not sure that anyone currently does. We do have an example in doc/newdtype_example/. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at enthought.com Mon Oct 19 17:55:17 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 19 Oct 2009 16:55:17 -0500 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4ADC7DEF.704@stsci.edu> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> Message-ID: <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> On Oct 19, 2009, at 9:55 AM, Michael Droettboom wrote: > I've filed a bug and attached a patch: > > http://projects.scipy.org/numpy/ticket/1267 > > No guarantees that I've found all of the alignment issues. I did a > grep > for "PyObject **" to find possible locations where PyObject * in > arrays > were being dereferenced. If I could write a unit test to make it fall > over on Solaris, then I fixed it, otherwise I left it alone. For > example, there are places where misaligned dereferencing is > theoretically possible (OBJECT_dot, OBJECT_compare), but a higher > level > function already did a "BEHAVED" array cast. In those cases I added a > unit test so hopefully we'll be able to catch it in the future if the > caller no longer ensures well-behavedness. This patch looks great technically. Thank you for tracking this down and correcting my error. Right now, though, the patch has too many white-space only changes in it. Could you submit a new patch that removes those changes? Thanks, -Travis -- Travis Oliphant Enthought Inc. 1-512-536-1057 http://www.enthought.com oliphant at enthought.com From charlesr.harris at gmail.com Mon Oct 19 18:28:16 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 19 Oct 2009 16:28:16 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> Message-ID: On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant wrote: > > On Oct 19, 2009, at 9:55 AM, Michael Droettboom wrote: > > > I've filed a bug and attached a patch: > > > > http://projects.scipy.org/numpy/ticket/1267 > > > > No guarantees that I've found all of the alignment issues. I did a > > grep > > for "PyObject **" to find possible locations where PyObject * in > > arrays > > were being dereferenced. If I could write a unit test to make it fall > > over on Solaris, then I fixed it, otherwise I left it alone. 
For > > example, there are places where misaligned dereferencing is > > theoretically possible (OBJECT_dot, OBJECT_compare), but a higher > > level > > function already did a "BEHAVED" array cast. In those cases I added a > > unit test so hopefully we'll be able to catch it in the future if the > > caller no longer ensures well-behavedness. > > > This patch looks great technically. Thank you for tracking this down > and correcting my error. > > Right now, though, the patch has too many white-space only changes in > it. Could you submit a new patch that removes those changes? > > The old whitespace is hard tabs and needs to be replaced anyway. The new whitespace doesn't always get the indentation right, however. That file needs a style/whitespace cleanup. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Mon Oct 19 18:29:50 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 19 Oct 2009 17:29:50 -0500 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Oct 17, 2009, at 7:49 AM, Darren Dale wrote: > numpy's functions, especially ufuncs, have had some ability to support > subclasses through the ndarray.__array_wrap__ method, which provides > masked arrays or quantities (for example) with an opportunity to set > the class and metadata of the output array at the end of an operation. > An example is > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'meters') > numpy.add(q1, q2) # yields Quantity(3, 'meters') > > At SciPy2009 we committed a change to the numpy trunk that provides a > chance to determine the class and some metadata of the output *before* > the ufunc performs its calculation, but after output array has been > established (and its data is still uninitialized). Consider: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > numpy.add(q1, q2, q1) > # or equivalently: > # q1 += q2 > > With only __array_wrap__, the attempt to propagate the units happens > after q1's data was updated in place, too late to raise an error, the > data is now corrupted. __array_prepare__ solves that problem, an > exception can be raised in time. > > Now I'd like to suggest one more improvement to numpy to make its > functions more generic. Consider one more example: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'feet') > numpy.add(q1, q2) > > In this case, I'd like an opportunity to operate on the input arrays > on the way in to the ufunc, to rescale the second input to meters. I > think it would be a hack to try to stuff this capability into > __array_prepare__. One form of this particular example is already > supported in quantities, "q1 + q2", by overriding the __add__ method > to rescale the second input, but there are ufuncs that do not have an > associated special method. So I'd like to look into adding another > check for a special method, perhaps called __input_prepare__. My time > is really tight for the next month, so I'd rather not start if there > are strong objections, but otherwise, I'd like to try to try to get it > in in time for numpy-1.4. (Has a timeline been established?) 
> > I think it will be not too difficult to document this overall scheme: > > When calling numpy functions: > > 1) __input_prepare__ provides an opportunity to operate on the inputs > to yield versions that are compatible with the operation (they should > obviously not be modified in place) > > 2) the output array is established > > 3) __array_prepare__ is used to determine the class of the output > array, as well as any metadata that needs to be established before the > operation proceeds > > 4) the ufunc performs its operations > > 5) __array_wrap__ provides an opportunity to update the output array > based on the results of the computation > > Comments, criticisms? If PEP 3124^ were already a part of the standard > library, that could serve as the basis for generalizing numpy's > functions. But I think the PEP will not be approved in its current > form, and it is unclear when and if the author will revisit the > proposal. The scheme I'm imagining might be sufficient for our > purposes. This seems like it could work. So, basically ufuncs will take any object as input and call it's __input__prepare__ method? This should return a sub-class of an ndarray? -Travis From robert.kern at gmail.com Mon Oct 19 18:36:39 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 19 Oct 2009 17:36:39 -0500 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> Message-ID: <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> On Mon, Oct 19, 2009 at 17:28, Charles R Harris wrote: > > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant > wrote: >> Right now, though, the patch has too many white-space only changes in >> it. ?Could you submit a new patch that removes those changes? > > The old whitespace is hard tabs and needs to be replaced anyway. The new > whitespace doesn't always get the indentation right, however. That file > needs a style/whitespace cleanup. That's fine, but whitespace cleanup needs to be done in commits that are separate from the functional changes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Mon Oct 19 18:54:16 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 19 Oct 2009 16:54:16 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> Message-ID: On Mon, Oct 19, 2009 at 4:36 PM, Robert Kern wrote: > On Mon, Oct 19, 2009 at 17:28, Charles R Harris > wrote: > > > > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant > > > wrote: > > >> Right now, though, the patch has too many white-space only changes in > >> it. Could you submit a new patch that removes those changes? > > > > The old whitespace is hard tabs and needs to be replaced anyway. The new > > whitespace doesn't always get the indentation right, however. That file > > needs a style/whitespace cleanup. > > That's fine, but whitespace cleanup needs to be done in commits that > are separate from the functional changes. 
> > I agree, but it can be tricky to preserve hard tabs when your editor uses spaces and has hard tabs set to 8 spaces. That file is on my cleanup list anyway, I'll try to get to it this weekend. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrennie at gmail.com Mon Oct 19 19:51:53 2009 From: jrennie at gmail.com (Jason Rennie) Date: Mon, 19 Oct 2009 19:51:53 -0400 Subject: [Numpy-discussion] opening pickled numarray data with numpy In-Reply-To: <87701dd70910191000n7110c024i2809bb7f97a498fa@mail.gmail.com> References: <87701dd70910191000n7110c024i2809bb7f97a498fa@mail.gmail.com> Message-ID: <75c31b2a0910191651q19932d53kef173130a02cf3c1@mail.gmail.com> Try creating an empty module/class with the given name. I.e. create a 'numarray' dir off your PYTHONPATH, create an empty __init__.py file, create a 'generic.py' file in that dir and populate it with whatever class python complains about like so: #!/usr/bin/env python class MissingClass(object): pass Cheers, Jason On Mon, Oct 19, 2009 at 1:00 PM, dagmar wismeijer wrote: > Hi, > > I've been trying to open (using numpy) old pickled data files that I once > created using numarray, but I keep getting the message that there is no > module numarray.generic. > Is there any way I could open these datafiles without installing numarray > again? > > Thanks in advance, > > Dagmar > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Jason Rennie Research Scientist, ITA Software 617-714-2645 http://www.itasoftware.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Mon Oct 19 23:45:35 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 19 Oct 2009 23:45:35 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: 2009/10/19 Sebastian Walter : > > I'm all for generic (u)funcs since they might come handy for me since > I'm doing lots of operation on arrays of polynomials. Just as a side note, if you don't mind my asking, what sorts of operations do you do on arrays of polynomials? In a thread on scipy-dev we're discussing improving scipy's polynomial support, and we'd be happy to get some more feedback on what they need to be able to do. Thanks! Anne From sebastian.walter at gmail.com Tue Oct 20 03:21:42 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 20 Oct 2009 09:21:42 +0200 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Tue, Oct 20, 2009 at 5:45 AM, Anne Archibald wrote: > 2009/10/19 Sebastian Walter : >> >> I'm all for generic (u)funcs since they might come handy for me since >> I'm doing lots of operation on arrays of polynomials. > > Just as a side note, if you don't mind my asking, what sorts of > operations do you do on arrays of polynomials? In a thread on > scipy-dev we're discussing improving scipy's polynomial support, and > we'd be happy to get some more feedback on what they need to be able > to do. I've been reading (and commenting) that thread ;) I'm doing algorithmic differentiation by computing on truncated Taylor polynomials in the Powerbasis, i.e. 
always truncating all operations at degree D:

z(t) = \sum_{d=0}^{D-1} z_d t^d = x(t) * y(t) = \sum_{d=0}^{D-1} \left( \sum_{k=0}^{d} x_k y_{d-k} \right) t^d + O(t^D)

Using other bases does not make sense in my case since the truncation of all terms of degree higher than t^D has, AFAIK, no good counterpart for bases like Chebyshev. On the other hand, I need to be generic in the coefficients, e.g. z_d from above could be a tensor of any shape, e.g. a matrix. A typical case where I need to perform operations on arrays of polynomials is best explained in a talk I gave earlier this year: http://github.com/b45ch1/pyadolc/raw/master/doc/walter_talk_algorithmic_differentiation_in_python_with_pyadolc_pycppad_algopy.pdf on slides 7 and 8 (the class adouble "is" a Taylor polynomial). > > Thanks! > Anne > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From markus.proeller at ifm.com Tue Oct 20 05:17:47 2009 From: markus.proeller at ifm.com (markus.proeller at ifm.com) Date: Tue, 20 Oct 2009 11:17:47 +0200 Subject: [Numpy-discussion] why does binary_repr don't support arrays Message-ID: Hello, I'm always wondering why binary_repr doesn't allow arrays as input values. I always have to use a workaround like:

import numpy as np

def binary_repr(arr, width=None):
    binary_list = map((lambda foo: np.binary_repr(foo, width)), arr.flatten())
    str_len_max = len(np.binary_repr(arr.max(), width=width))
    str_len_min = len(np.binary_repr(arr.min(), width=width))
    if str_len_max > str_len_min:
        str_len = str_len_max
    else:
        str_len = str_len_min
    binary_array = np.fromiter(binary_list, dtype='|S'+str(str_len))
    return binary_array.reshape(arr.shape)

Is there a reason why arrays are not supported, or is there another function that does support arrays? Thanks, Markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Tue Oct 20 05:24:51 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 20 Oct 2009 11:24:51 +0200 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: I'm not very familiar with the underlying C-API of numpy, so this has to be taken with a grain of salt. The reason why I'm curious about the genericity is that it would be awesome to have: 1) ufuncs like sin, cos, exp... to work on arrays of any object (this works already) 2) funcs like dot, eig, etc. to work on arrays of objects (works for dot already, but not for eig) 3) ufuncs and funcs to work on any objects. Examples that would be nice to have working are, among others: * arrays of polynomials, i.e. arrays of objects * polynomials with tensor coefficients, i.e. objects with an underlying array structure. I thought that the most elegant way to implement that would be to have all numpy functions try to call either 1) the class function with the same name as the numpy function, 2) or, if the class function is not implemented, the member function with the same name as the numpy function, 3) if none exists, raise an exception. E.g.
1) if isinstance(x) = Foo then numpy.sin(x) would call Foo.sin(x) if it doesn't know how to handle Foo 2) similarly, for arrays of objects of type Foo: x = np.array([Foo(1), Foo(2)]) Then numpy.sin(x) should try to return npy.array([Foo.sin(xi) for xi in x]) or in case Foo.sin is not implemented as class function, return : np.array([xi.sin() for xi in x]) Therefore, I somehow expected something like that: Quantity would derive from numpy.ndarray. When calling Quantity.__new__(cls) creates the member functions __add__, __imul__, sin, exp, ... where each function has a preprocessing part and a post processing part. After the preprocessing call the original ufuncs on the base class object, e.g. __add__ Sebastian On Mon, Oct 19, 2009 at 1:55 PM, Darren Dale wrote: > On Mon, Oct 19, 2009 at 3:10 AM, Sebastian Walter > wrote: >> On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale wrote: >>> numpy's functions, especially ufuncs, have had some ability to support >>> subclasses through the ndarray.__array_wrap__ method, which provides >>> masked arrays or quantities (for example) with an opportunity to set >>> the class and metadata of the output array at the end of an operation. >>> An example is >>> >>> q1 = Quantity(1, 'meter') >>> q2 = Quantity(2, 'meters') >>> numpy.add(q1, q2) # yields Quantity(3, 'meters') >>> >>> At SciPy2009 we committed a change to the numpy trunk that provides a >>> chance to determine the class and some metadata of the output *before* >>> the ufunc performs its calculation, but after output array has been >>> established (and its data is still uninitialized). Consider: >>> >>> q1 = Quantity(1, 'meter') >>> q2 = Quantity(2, 'J') >>> numpy.add(q1, q2, q1) >>> # or equivalently: >>> # q1 += q2 >>> >>> With only __array_wrap__, the attempt to propagate the units happens >>> after q1's data was updated in place, too late to raise an error, the >>> data is now corrupted. __array_prepare__ solves that problem, an >>> exception can be raised in time. >>> >>> Now I'd like to suggest one more improvement to numpy to make its >>> functions more generic. Consider one more example: >>> >>> q1 = Quantity(1, 'meter') >>> q2 = Quantity(2, 'feet') >>> numpy.add(q1, q2) >>> >>> In this case, I'd like an opportunity to operate on the input arrays >>> on the way in to the ufunc, to rescale the second input to meters. I >>> think it would be a hack to try to stuff this capability into >>> __array_prepare__. One form of this particular example is already >>> supported in quantities, "q1 + q2", by overriding the __add__ method >>> to rescale the second input, but there are ufuncs that do not have an >>> associated special method. So I'd like to look into adding another >>> check for a special method, perhaps called __input_prepare__. My time >>> is really tight for the next month, so I'd rather not start if there >>> are strong objections, but otherwise, I'd like to try to try to get it >>> in in time for numpy-1.4. (Has a timeline been established?) 
>>> >>> I think it will be not too difficult to document this overall scheme: >>> >>> When calling numpy functions: >>> >>> 1) __input_prepare__ provides an opportunity to operate on the inputs >>> to yield versions that are compatible with the operation (they should >>> obviously not be modified in place) >>> >>> 2) the output array is established >>> >>> 3) __array_prepare__ is used to determine the class of the output >>> array, as well as any metadata that needs to be established before the >>> operation proceeds >>> >>> 4) the ufunc performs its operations >>> >>> 5) __array_wrap__ provides an opportunity to update the output array >>> based on the results of the computation >>> >>> Comments, criticisms? If PEP 3124^ were already a part of the standard >>> library, that could serve as the basis for generalizing numpy's >>> functions. But I think the PEP will not be approved in its current >>> form, and it is unclear when and if the author will revisit the >>> proposal. The scheme I'm imagining might be sufficient for our >>> purposes. >> >> I'm all for generic (u)funcs since they might come handy for me since >> I'm doing lots of operation on arrays of polynomials. >> ?I don't quite get the reasoning though. >> Could you correct me where I get it wrong? >> * the class Quantity derives from numpy.ndarray >> * Quantity overrides __add__, __mul__ etc. and you get the correct behaviour for >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'J') >> by raising an exception when performing q1+=q2 > > No, Quantity does not override __iadd__ to catch this. Quantity > implements __array_prepare__ to perform the dimensional analysis based > on the identity of the ufunc and the inputs, and set the class and > dimensionality of the output array, or raise an error when dimensional > analysis fails. This approach lets quantities support all ufuncs (in > principle), not just built in numerical operations. It should also > make it easier to subclass from MaskedArray, so we could have a > MaskedQuantity without having to establish yet another suite of ufuncs > specific to quantities or masked quantities. > >> * The problem is that numpy.add(q1,q1,q2) would corrupt q1 before >> raising an exception > > That was solved by the addition of __array_prepare__ to numpy back in > August. What I am proposing now is supporting operations on arrays > that would be compatible if we had a chance to transform them on the > way into the ufunc, like "meter + foot". > > Darren > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From dsdale24 at gmail.com Tue Oct 20 07:46:42 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 20 Oct 2009 07:46:42 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Tue, Oct 20, 2009 at 5:24 AM, Sebastian Walter wrote: > I'm not very familiar with the underlying C-API of numpy, so this has > to be taken with a grain of salt. > > The reason why I'm curious about the genericity is that it would be > awesome to have: > 1) ufuncs like sin, cos, exp... to work on arrays of any object (this > works already) > 2) funcs like dot, eig, etc, to work on arrays of objects( works for > dot already, but not for eig) > 3) ufuncs and funcs to work on any objects I think if you want to work on any object, you need something like the PEP I mentioned earlier. 
What I am proposing is to use the existing mechanism in numpy, check __array_priority__ to determine which input's __input_prepare__ to call. > examples that would be nice to work are among others: > * arrays of polynomials, i.e. arrays of objects > * polynomials with tensor coefficients, object with underlying array structure > > I thought that the most elegant way to implement that would be to have > all numpy functions try ?to call either > 1) ?the class function with the same name as the numpy function > 2) or if the class function is not implemented, the member function > with the same name as the numpy function > 3) if none exists, raise an exception > > E.g. > > 1) > if isinstance(x) = Foo > then numpy.sin(x) > would call Foo.sin(x) if it doesn't know how to handle Foo How does it numpy.sin know if it knows how to handle Foo? numpy.sin will happily process the data of subclasses of ndarray, but if you give it a quantity with units of degrees it is going to return garbage and not care. > 2) > similarly, for arrays of objects of type Foo: > ?x = np.array([Foo(1), Foo(2)]) > > Then numpy.sin(x) > should try to return npy.array([Foo.sin(xi) for xi in x]) > or in case Foo.sin is not implemented as class function, > return : np.array([xi.sin() for xi in x]) I'm not going to comment on this, except to say that it is outside the scope of my proposal. > Therefore, I somehow expected something like that: > Quantity would derive from numpy.ndarray. > When calling ?Quantity.__new__(cls) creates the member functions > __add__, __imul__, sin, exp, ... > where each function has a preprocessing part and a post processing part. > After the preprocessing call the original ufuncs on the base class > object, e.g. __add__ It is more complicated than that. Ufuncs don't call array methods, its the other way around. ndarray.__add__ calls numpy.add. If you have a custom operation to perform on numpy arrays, you write a ufunc, not a subclass. What you are proposing is a very significant change to numpy. Darren From dsdale24 at gmail.com Tue Oct 20 08:04:19 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 20 Oct 2009 08:04:19 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: Hi Travis, On Mon, Oct 19, 2009 at 6:29 PM, Travis Oliphant wrote: > > On Oct 17, 2009, at 7:49 AM, Darren Dale wrote: [...] >> When calling numpy functions: >> >> 1) __input_prepare__ provides an opportunity to operate on the inputs >> to yield versions that are compatible with the operation (they should >> obviously not be modified in place) >> >> 2) the output array is established >> >> 3) __array_prepare__ is used to determine the class of the output >> array, as well as any metadata that needs to be established before the >> operation proceeds >> >> 4) the ufunc performs its operations >> >> 5) __array_wrap__ provides an opportunity to update the output array >> based on the results of the computation >> >> Comments, criticisms? If PEP 3124^ were already a part of the standard >> library, that could serve as the basis for generalizing numpy's >> functions. But I think the PEP will not be approved in its current >> form, and it is unclear when and if the author will revisit the >> proposal. The scheme I'm imagining might be sufficient for our >> purposes. > > This seems like it could work. ? ?So, basically ufuncs will take any > object as input and call it's __input__prepare__ method? ? This should > return a sub-class of an ndarray? 
ufuncs would call __input_prepare__ on the input declaring the highest __array_priority__, just like ufuncs do with __array_wrap__, passing a tuple of inputs and the ufunc itself (provided for context). __input_prepare__ would return a tuple of inputs that the ufunc would use for computation, I'm not sure if these need to be arrays or not, I think I can give a better answer once I start the implementation (next few days I think). Darren From mdroe at stsci.edu Tue Oct 20 09:02:46 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 20 Oct 2009 09:02:46 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> Message-ID: <4ADDB4F6.3020700@stsci.edu> I've resubmitted the patch without whitespace-only changes. For what it's worth, I had followed the directions here: http://projects.scipy.org/numpy/wiki/EmacsSetup which say to perform "untabify" and "whitespace-cleanup". Are those not current? I had added these to my pre-save hooks under my numpy tree. Cheers, Mike Charles R Harris wrote: > > > On Mon, Oct 19, 2009 at 4:36 PM, Robert Kern > wrote: > > On Mon, Oct 19, 2009 at 17:28, Charles R Harris > > wrote: > > > > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant > > > > wrote: > > >> Right now, though, the patch has too many white-space only > changes in > >> it. Could you submit a new patch that removes those changes? > > > > The old whitespace is hard tabs and needs to be replaced anyway. > The new > > whitespace doesn't always get the indentation right, however. > That file > > needs a style/whitespace cleanup. > > That's fine, but whitespace cleanup needs to be done in commits that > are separate from the functional changes. > > > I agree, but it can be tricky to preserve hard tabs when your editor > uses spaces and has hard tabs set to 8 spaces. That file is on my > cleanup list anyway, I'll try to get to it this weekend. > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From peridot.faceted at gmail.com Tue Oct 20 11:04:23 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 20 Oct 2009 11:04:23 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: 2009/10/20 Sebastian Walter : > On Tue, Oct 20, 2009 at 5:45 AM, Anne Archibald > wrote: >> 2009/10/19 Sebastian Walter : >>> >>> I'm all for generic (u)funcs since they might come handy for me since >>> I'm doing lots of operation on arrays of polynomials. >> >> Just as a side note, if you don't mind my asking, what sorts of >> operations do you do on arrays of polynomials? In a thread on >> scipy-dev we're discussing improving scipy's polynomial support, and >> we'd be happy to get some more feedback on what they need to be able >> to do. > > I've been reading (and commenting) that thread ;) > ?I'm doing algorithmic differentiation by computing on truncated > Taylor polynomials in the Powerbasis, > ?i.e. 
always truncating all operation at degree D > z(t) = \sum_d=0^{D-1} z_d t^d = ?x(t) * y(t) = \sum_{d=0}^{D-1} > \sum_{k=0}^d x_k * y_{d-k} + O(t^D) > > Using other bases does not make sense in my case since the truncation > of all terms of higher degree than t^D > has afaik no good counterpart for bases like chebycheff. > On the other hand, I need to be generic in the coefficients, e.g. > z_d from above could be a tensor of any shape, ?e.g. ?a matrix. In fact, truncating at degree D for Chebyshev polynomials works exactly the same way as it does for power polynomials, and if what you care about is function approximation, it has much nicer behaviour. But if what you care about is really truncated Taylor polynomials, there's no beating the power basis. I realize that arrays of polynomial objects are handy from a bookkeeping point of view, but how does an array of polynomials, say of shape (N,), differ from a single polynomial with coefficients of shape (N,)? I think we need to provide the latter in our polynomial classes, but as you point out, getting ufunc support for the former is nontrivial. Anne From charlesr.harris at gmail.com Tue Oct 20 12:13:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 20 Oct 2009 10:13:53 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4ADDB4F6.3020700@stsci.edu> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> <4ADDB4F6.3020700@stsci.edu> Message-ID: On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom wrote: > I've resubmitted the patch without whitespace-only changes. > > For what it's worth, I had followed the directions here: > > http://projects.scipy.org/numpy/wiki/EmacsSetup > > which say to perform "untabify" and "whitespace-cleanup". Are those not > current? I had added these to my pre-save hooks under my numpy tree. > > The problem is that hard tabs have crept into the file. The strict approach in this case is to make two patches: the first cleans up the hard tabs, the second fixes the problems. How about I fix up the hard tabs and then you can make another patch? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 20 13:16:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 20 Oct 2009 13:16:18 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <4ADAE897.1070000@bigpond.net.au> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> Message-ID: <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: > Hi Ga?l, > > If you've got a 1D array/vector called "a", I think the normal idiom is > > np.dot(a,a) > > For the more general case, I think > np.tensordot(a, a, axes=something_else) > should do it, where you should be able to figure out something_else for > your particular case. Is it really possible to get the same as np.sum(a*a, axis) with tensordot if a.ndim=2 ? Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) Josef > > Gary R. > > Gael Varoquaux wrote: >> On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.pktd at gmail.com wrote: >>>>>> Why aren't you using logaddexp ufunc from numpy? 
>> >>>>> Maybe because it is difficult to find, it doesn't have its own docs entry. >> >> Speaking of which... >> >> I thought that there was a readily-written, optimized function (or ufunc) >> in numpy or scipy that calculated the sum of squares for an array >> (possibly along an axis). However, I cannot find it. >> >> Is there something similar? If not, it is not the end of the world, the >> operation is trivial to write. >> >> Cheers, >> >> Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From peridot.faceted at gmail.com Tue Oct 20 15:09:36 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 20 Oct 2009 15:09:36 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> Message-ID: 2009/10/20 : > On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: >> Hi Ga?l, >> >> If you've got a 1D array/vector called "a", I think the normal idiom is >> >> np.dot(a,a) >> >> For the more general case, I think >> np.tensordot(a, a, axes=something_else) >> should do it, where you should be able to figure out something_else for >> your particular case. > > Is it really possible to get the same as np.sum(a*a, axis) with > tensordot if a.ndim=2 ? > Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) It seems like this would be a good place to apply numpy's higher-dimensional ufuncs: what you want seems to just be the vector inner product, broadcast over all other dimensions. In fact I believe this is implemented in numpy as a demo: numpy.umath_tests.inner1d should do the job. Anne From josef.pktd at gmail.com Tue Oct 20 15:28:45 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 20 Oct 2009 15:28:45 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> Message-ID: <1cd32cbb0910201228t32d4be5agff1a845474481b68@mail.gmail.com> On Tue, Oct 20, 2009 at 3:09 PM, Anne Archibald wrote: > 2009/10/20 ?: >> On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: >>> Hi Ga?l, >>> >>> If you've got a 1D array/vector called "a", I think the normal idiom is >>> >>> np.dot(a,a) >>> >>> For the more general case, I think >>> np.tensordot(a, a, axes=something_else) >>> should do it, where you should be able to figure out something_else for >>> your particular case. >> >> Is it really possible to get the same as np.sum(a*a, axis) ?with >> tensordot ?if a.ndim=2 ? >> Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) > > It seems like this would be a good place to apply numpy's > higher-dimensional ufuncs: what you want seems to just be the vector > inner product, broadcast over all other dimensions. In fact I believe > this is implemented in numpy as a demo: numpy.umath_tests.inner1d > should do the job. 
Thanks, this works well, needs core in name (I might have to learn how to swap or roll axis to use this for more than 2d.) >>> np.core.umath_tests.inner1d(a.T, b.T) array([12, 8, 16]) >>> (a*b).sum(0) array([12, 8, 16]) >>> np.core.umath_tests.inner1d(a.T, b.T) array([12, 8, 16]) >>> (a*a).sum(0) array([126, 166, 214]) >>> np.core.umath_tests.inner1d(a.T, a.T) array([126, 166, 214]) What's the status on these functions? They don't show up in the docs or help, except for a brief mention in the c-api: http://docs.scipy.org/numpy/docs/numpy-docs/reference/c-api.generalized-ufuncs.rst/ Are they for public consumption and should go into the docs? Or do they remain a hidden secret, to force users to read the mailing lists? Josef > > Anne > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Tue Oct 20 16:05:22 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 20 Oct 2009 14:05:22 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> <4ADDB4F6.3020700@stsci.edu> Message-ID: On Tue, Oct 20, 2009 at 10:13 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom wrote: > >> I've resubmitted the patch without whitespace-only changes. >> >> For what it's worth, I had followed the directions here: >> >> http://projects.scipy.org/numpy/wiki/EmacsSetup >> >> which say to perform "untabify" and "whitespace-cleanup". Are those not >> current? I had added these to my pre-save hooks under my numpy tree. >> >> > The problem is that hard tabs have crept into the file. The strict approach > in this case is to make two patches: the first cleans up the hard tabs, the > second fixes the problems. > > How about I fix up the hard tabs and then you can make another patch? > > I applied the patch. Can you test it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Tue Oct 20 16:22:02 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 20 Oct 2009 16:22:02 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> <4ADDB4F6.3020700@stsci.edu> Message-ID: <4ADE1BEA.8060207@stsci.edu> Thanks. It's passing the related unit tests on Sparc SunOS 5, and Linux x86. Cheers, Mike Charles R Harris wrote: > > > On Tue, Oct 20, 2009 at 10:13 AM, Charles R Harris > > wrote: > > > > On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom > > wrote: > > I've resubmitted the patch without whitespace-only changes. > > For what it's worth, I had followed the directions here: > > http://projects.scipy.org/numpy/wiki/EmacsSetup > > which say to perform "untabify" and "whitespace-cleanup". Are > those not > current? I had added these to my pre-save hooks under my > numpy tree. > > > The problem is that hard tabs have crept into the file. The strict > approach in this case is to make two patches: the first cleans up > the hard tabs, the second fixes the problems. 
> > How about I fix up the hard tabs and then you can make another patch? > > > I applied the patch. Can you test it? > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From peridot.faceted at gmail.com Tue Oct 20 18:59:42 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 20 Oct 2009 18:59:42 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910201228t32d4be5agff1a845474481b68@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> <1cd32cbb0910201228t32d4be5agff1a845474481b68@mail.gmail.com> Message-ID: 2009/10/20 : > On Tue, Oct 20, 2009 at 3:09 PM, Anne Archibald > wrote: >> 2009/10/20 : >>> On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: >>>> Hi Ga?l, >>>> >>>> If you've got a 1D array/vector called "a", I think the normal idiom is >>>> >>>> np.dot(a,a) >>>> >>>> For the more general case, I think >>>> np.tensordot(a, a, axes=something_else) >>>> should do it, where you should be able to figure out something_else for >>>> your particular case. >>> >>> Is it really possible to get the same as np.sum(a*a, axis) with >>> tensordot if a.ndim=2 ? >>> Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) >> >> It seems like this would be a good place to apply numpy's >> higher-dimensional ufuncs: what you want seems to just be the vector >> inner product, broadcast over all other dimensions. In fact I believe >> this is implemented in numpy as a demo: numpy.umath_tests.inner1d >> should do the job. > > Thanks, this works well, needs core in name > (I might have to learn how to swap or roll axis to use this for more than 2d.) > >>>> np.core.umath_tests.inner1d(a.T, b.T) > array([12, 8, 16]) >>>> (a*b).sum(0) > array([12, 8, 16]) >>>> np.core.umath_tests.inner1d(a.T, b.T) > array([12, 8, 16]) >>>> (a*a).sum(0) > array([126, 166, 214]) >>>> np.core.umath_tests.inner1d(a.T, a.T) > array([126, 166, 214]) > > > What's the status on these functions? They don't show up in the docs > or help, except for > a brief mention in the c-api: > > http://docs.scipy.org/numpy/docs/numpy-docs/reference/c-api.generalized-ufuncs.rst/ > > Are they for public consumption and should go into the docs? > Or do they remain a hidden secret, to force users to read the mailing lists? I think the long-term goal is to have a completely ufuncized linear algebra library, and I think these functions are just tests of the gufunc features. In principle, at least, it wouldn't actually be too hard to fill out a full linear algebra library, since the per "element" linear algebra operations already exist. Unfortunately the code should exist for many data types, and the code generator scheme currently used to do this for ordinary ufuncs is a barrier to contributions. It might be worth banging out a doubles-only generic ufunc linear algebra library (in addition to numpy.linalg/scipy.linalg), just as a proof of concept. 
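To make that concrete, here is a small sketch of what the gufunc machinery already buys you, using the inner1d demo function shown above. This assumes a current trunk build where numpy.core.umath_tests gets compiled in; inner1d is only a test fixture, so treat it as illustrative rather than supported API:

import numpy as np
from numpy.core.umath_tests import inner1d   # demo gufunc, signature (i),(i)->()

a = np.random.rand(5, 4, 3)
b = np.random.rand(5, 4, 3)

# inner product over the last axis, broadcast over all leading axes,
# i.e. the "sum of squares along an axis" that started this thread
ss = inner1d(a, a)                           # shape (5, 4)
assert np.allclose(ss, (a * a).sum(axis=-1))

# the same broadcasting a full gufunc linalg library would give for free
print inner1d(a, b).shape                    # -> (5, 4)

A doubles-only library of such functions would mostly be a matter of writing the inner loops; the broadcasting and axis handling shown here come from the gufunc core itself.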
Anne From mathieu at mblondel.org Wed Oct 21 01:44:39 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 14:44:39 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy Message-ID: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Hello, About one year ago, a high-level, objected-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations. You can have a look at the following pages for further details: http://tirania.org/blog/archive/2008/Nov-03.html (blog post) http://go-mono.com/docs/index.aspx?tlink=0 at N%3aMono.Simd (API reference) It seems to me that such an API would possibly be a great fit in Numpy too. It would also be possible to add classes that don't directly map to SIMD types. For example, Vector8f can easily be implemented in terms of 2 Vector4f. In addition to vectors, additional API may be added to support operations on matrices of fixed width or height. I search the archives for similar discussions but I only found a discussion about memory-alignment so I hope I am not restarting an existing discussion here. Memory-alignment is an import related issue since non-aligned movs can tank the performance. Any thoughts? I don't know the Numpy code base yet but I'm willing to help if such an effort is started. Thanks, Mathieu From charlesr.harris at gmail.com Wed Oct 21 03:13:11 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 01:13:11 -0600 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: On Tue, Oct 20, 2009 at 11:44 PM, Mathieu Blondel wrote: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. You can have a look at the following > pages for further details: > > http://tirania.org/blog/archive/2008/Nov-03.html (blog post) > http://go-mono.com/docs/index.aspx?tlink=0 at N%3aMono.Simd (API reference) > > It seems to me that such an API would possibly be a great fit in Numpy > too. It would also be possible to add classes that don't directly map > to SIMD types. For example, Vector8f can easily be implemented in > terms of 2 Vector4f. In addition to vectors, additional API may be > added to support operations on matrices of fixed width or height. > > I search the archives for similar discussions but I only found a > discussion about memory-alignment so I hope I am not restarting an > existing discussion here. Memory-alignment is an import related issue > since non-aligned movs can tank the performance. > > Any thoughts? I don't know the Numpy code base yet but I'm willing to > help if such an effort is started. > > The licenses look all hodge-podge: - The C# compiler is dual-licensed under the MIT/X11 license and the GNU General Public License (*http://www.opensource.org/licenses/gpl-license.html*) (GPL). - The tools are released under the terms of the GNU General Public License (* http://www.opensource.org/licenses/gpl-license.html*) (GPL). 
- The runtime libraries are under the GNU Library GPL 2.0 (*http://www.gnu.org/copyleft/library.html#TOC1*) (LGPL 2.0). - The class libraries are released under the terms of the MIT X11 (*http://www.opensource.org/licenses/mit-license.html*) license. - ASP.NET MVC and ASP.NET AJAX client software are released by Microsoft under the open source Microsoft Permissive License (*http://www.opensource.org/licenses/ms-pl.html*). However, if the good stuff is in the class libraries, that looks OK. But that still leaves it in C#, no? You could have a looksie to see how it would fit into, say, Cython. I don't know where it would go in numpy, maybe some of the vector bits would be suitable for some generalized ufuncs. Apart from that, I believe ATLAS can already make use of SIMD, but I have no idea how far it goes in using the full feature set. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Wed Oct 21 02:56:52 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 15:56:52 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> Hi Mathieu, Mathieu Blondel wrote: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. You can have a look at the following > pages for further details: > > http://tirania.org/blog/archive/2008/Nov-03.html (blog post) > I am not sure how this could be applied to numpy case ? From what I can understand, this cannot be directly applied to python: the described changes are vm changes, and we cannot do anything at python vm level (I would guess the python vm to be too primitive to implement this kind of things anyway). I don't see how the high level API at the assembly level (Mono.Simd) would work either: the overhead of python and numpy to deal with 4 or 8 items in python would make this API useless from a speed POV. Implementing some numpy internal code in SIMD, and having a 'object oriented' C API for SIMD would indeed be nice - gcc provides SSE intrinsics, as well as visual studio (although the later seems quite buggy if I believe this link: http://www.virtualdub.org/blog/pivot/entry.php?id=162), which would make this in principle relatively easy. This is only my opinion (read other numpy dev may disagree), but I think that the numpy C code should be cleaned up before adding this kind of features: there is still too much coupling between the pure C core and the python machinery. Also, any use of SIMD code should be done at runtime IMHO (so that one binary can be used on multiple architectures), which has some issues on its own from a cross platform POV. David From mathieu at mblondel.org Wed Oct 21 03:29:44 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 16:29:44 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <7e1472660910210029g6488a64eo102c9ccf7945acfc@mail.gmail.com> > The licenses look all hodge-podge: [...] 
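To make the binding idea a bit more concrete, the Python side could look roughly like the sketch below. Everything here is made up for illustration: 'libvec4f' and 'vec4f_add' are hypothetical names for a small C library one would compile separately with SSE (or AltiVec/NEON) intrinsics; only the ctypes glue is shown, not the C file itself.

import numpy as np
from numpy import ctypeslib
import ctypes

# hypothetical shared library built from C code using _mm_add_ps and friends
_lib = ctypeslib.load_library('libvec4f', '.')
_lib.vec4f_add.restype = None
_lib.vec4f_add.argtypes = [
    ctypeslib.ndpointer(np.float32, flags='C_CONTIGUOUS,ALIGNED'),
    ctypeslib.ndpointer(np.float32, flags='C_CONTIGUOUS,ALIGNED'),
    ctypeslib.ndpointer(np.float32, flags='C_CONTIGUOUS,ALIGNED'),
    ctypes.c_size_t,
]

def simd_add(a, b, out):
    # the compiled routine does the actual 4-floats-at-a-time work
    _lib.vec4f_add(a, b, out, a.size)
    return out

Whether the Python call overhead swallows the SIMD gain is exactly the open question.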
> However, if the good stuff is in the class libraries, that looks OK. But > that still leaves it in C#, no? I was mentioning Mono just to show that "this has been done" and also their API reference can serve as inspiration to design Numpy's own API. Mathieu From mathieu at mblondel.org Wed Oct 21 03:48:22 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 16:48:22 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> Message-ID: <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> Hi David, On Wed, Oct 21, 2009 at 3:56 PM, David Cournapeau wrote: > I am not sure how this could be applied to numpy case ? From what I can > understand, this cannot be directly applied to python: the described > changes are vm changes, and we cannot do anything at python vm level (I > would guess the python vm to be too primitive to implement this kind of > things anyway). Yes in Mono this is realized with Just-In-Time compilation, so at the VM level. The reason I thought of Numpy rather than Cython is that Python's support for vectors/matrices is limited and Numpy has kind of become the standard for that in the Python world. I saw the video of Peter Norvig at the last Scipy conference who was suggesting to merge Numpy into Cython. The SIMD API would be an argument in favor of this too because of the possible interactions between such a SIMD API and an array API. > I don't see how the high level API at the assembly level (Mono.Simd) > would work either: the overhead of python and numpy to deal with 4 or 8 > items in python would make this API useless from a speed POV. My original idea was to write the code in C with Intel/Alvitec/Neon intrinsics and have this code binded to be able to call it from Python. So the SIMD code would be compiled already, ready to be called from Python. Like you said, there's a risk that the overhead of calling Python is bigger than the benefit of using SIMD instructions. If it's worth trying out, an experiment can be made with Vector4f to see if it's even worth continuing with other types. > This is only my opinion (read other numpy dev may disagree), but I think > that the numpy C code should be cleaned up before adding this kind of > features: there is still too much coupling between the pure C core and > the python machinery. Also, any use of SIMD code should be done at > runtime IMHO (so that one binary can be used on multiple architectures), > which has some issues on its own from a cross platform POV. I recently used SIMD instructions for a project and I realized that they cannot be activated in a standard Debian package, because the package has to remain general-purpose. So people who want to benefit the speed up have to compile my project from source... I also see that sometimes packages are available in different flavors (-msse, -msse2...). 
Mathieu From pav+sp at iki.fi Wed Oct 21 04:24:09 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Wed, 21 Oct 2009 08:24:09 +0000 (UTC) Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> Message-ID: Wed, 21 Oct 2009 16:48:22 +0900, Mathieu Blondel wrote: [clip] > My original idea was to write the code in C with Intel/Alvitec/Neon > intrinsics and have this code binded to be able to call it from Python. > So the SIMD code would be compiled already, ready to be called from > Python. Like you said, there's a risk that the overhead of calling > Python is bigger than the benefit of using SIMD instructions. If it's > worth trying out, an experiment can be made with Vector4f to see if it's > even worth continuing with other types. The overhead is quickly checked for multiplication with numpy arrays of varying size, without SSE: Overhead per iteration (ms): 1.6264549101 Time per array element (ms): 0.000936947636565 Cross-over point: 1735.90801303 #---------------------------------------------- import numpy as np from scipy import optimize import time import matplotlib.pyplot as plt def main(): data = [] for n in np.unique(np.logspace(0, 5, 20).astype(int)): print n m = 100 reps = 5 times = [] for rep in xrange(reps): x = np.zeros((n,), dtype=np.float_) start = time.time() #------------------ for k in xrange(m): x *= 1.1 #------------------ end = time.time() times.append(end - start) t = min(times) data.append((n, t)) data = np.array(data) def model(z): n, t = data.T overhead, per_elem = z return np.log10(t) - np.log10(overhead + per_elem * n) z, ier = optimize.leastsq(model, [1., 1.]) overhead, per_elem = z print "" print "Overhead per iteration (ms):", overhead*1e3 print "Time per array element (ms):", per_elem*1e3 print "Cross-over point: ", overhead/per_elem n = np.logspace(0, 5, 500) plt.loglog(data[:,0], data[:,0]/data[:,1], 'x', label=r'measured') plt.loglog(n, n/(overhead + per_elem*n), 'k-', label=r'fit to $t = a + b n$') plt.xlabel(r'$n$') plt.ylabel(r'ops/second') plt.grid(1) plt.legend() plt.show() if __name__ == "__main__": main() From david at ar.media.kyoto-u.ac.jp Wed Oct 21 04:05:24 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 17:05:24 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> Message-ID: <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> Mathieu Blondel wrote: > I saw the video of Peter Norvig at the last Scipy conference who was > suggesting to merge Numpy into Cython. The SIMD API would be an > argument in favor of this too because of the possible interactions > between such a SIMD API and an array API. > Hm, I don't remember this - I guess I would have to look at the video. Do you know at which point of the presentation he discussed about SIMD ? > My original idea was to write the code in C with Intel/Alvitec/Neon > intrinsics and have this code binded to be able to call it from > Python. So the SIMD code would be compiled already, ready to be called > from Python. 
Like you said, there's a risk that the overhead of > calling Python is bigger than the benefit of using SIMD instructions. > If it's worth trying out, an experiment can be made with Vector4f to > see if it's even worth continuing with other types. > I am quite confident that the overhead will be way too significant for this approach to be useful. If you have two python objects, using + on it will induce at least one function call, and most likely several function calls at the python level. Python function calls are painfully slow (several thousand cycles per call in the most optimistic case). Python overhead is several order of magnitude bigger than what you can earn between SIMD and straightforward C. The only way I can see to make this work is to generate SIMD code from python (which would be a poor man's replacement for a JIT in a way), there was a presentation following this direction at scipy 09 conference. > I recently used SIMD instructions for a project and I realized that > they cannot be activated in a standard Debian package, because the > package has to remain general-purpose. So people who want to benefit > the speed up have to compile my project from source... > Yes - that's unacceptable IMHO. The real solution is to include all the code at build time, detect at *runtime* which ISA is supported, and select the functions accordingly. The problem is that loading shared code at runtime in a cross platform way is complicated - python already does it, but unfortunately does not provide a C API for it AFAIK, so we would have to re-implement it in python. cheers, David From mathieu at mblondel.org Wed Oct 21 04:38:20 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 17:38:20 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> Message-ID: <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> On Wed, Oct 21, 2009 at 5:05 PM, David Cournapeau wrote: > Mathieu Blondel wrote: >> I saw the video of Peter Norvig at the last Scipy conference who was >> suggesting to merge Numpy into Cython. The SIMD API would be an >> argument in favor of this too because of the possible interactions >> between such a SIMD API and an array API. >> > > Hm, I don't remember this - I guess I would have to look at the video. > Do you know at which point of the presentation he discussed about SIMD ? Peter Norvig suggested to merge Numpy into Cython but he didn't mention SIMD as the reason (this one is from me). Sorry if I wasn't clear. IIRC, the reason was to help democratize Numpy and make it easier for users to install it. He went on to say that he talked about it with Guido and apparently the main barrier was the release cycle. Please check the video as I'm telling you that from memory. 
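A crude Python-side sketch of the runtime-detection idea raised above (build every variant, then pick one based on what the CPU reports) is below. It only knows how to read Linux's /proc/cpuinfo and deliberately reports nothing elsewhere, which is exactly the cross-platform gap being discussed; it is an illustration, not a proposed implementation:

#----------------------------------------------
import os

def cpu_flags():
    """Return the CPU feature flags, or an empty set if unknown."""
    if not os.path.exists("/proc/cpuinfo"):
        return set()           # non-Linux: a real check would need CPUID
    flags = set()
    for line in open("/proc/cpuinfo"):
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags

flags = cpu_flags()
# Note: SSE3 is reported as "pni" in /proc/cpuinfo.
for isa in ("sse", "sse2", "pni", "ssse3", "sse4_1"):
    print("%-7s %s" % (isa, "yes" if isa in flags else "no/unknown"))
#----------------------------------------------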
Mathieu From david at ar.media.kyoto-u.ac.jp Wed Oct 21 04:23:58 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 17:23:58 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> Message-ID: <4ADEC51E.7080309@ar.media.kyoto-u.ac.jp> Mathieu Blondel wrote: > He went on to say that he talked about > it with Guido and apparently the main barrier was the release cycle. > Please check the video as I'm telling you that from memory. > Ah, I think you are mistaken, then - he referred to merging numpy and scipy into python during his talk, not cython. For the reason you gave, including numpy into python is not really on the radar. It was tried unsuccessfully some time ago, and the PEP buffer (3118 IIRC) is a much more low-level API to share "typed" buffer at the C level. Hopefully, numpy will be built on top of this at some point. Scipy is very unlikely IMHO - I doubt depending on fortran code would be acceptable for python. David From mathieu at mblondel.org Wed Oct 21 05:10:09 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 18:10:09 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADEC51E.7080309@ar.media.kyoto-u.ac.jp> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> <4ADEC51E.7080309@ar.media.kyoto-u.ac.jp> Message-ID: <7e1472660910210210l57fe086eqae02c7d9aad056e4@mail.gmail.com> On Wed, Oct 21, 2009 at 5:23 PM, David Cournapeau wrote: > Ah, I think you are mistaken, then - he referred to merging numpy and > scipy into python during his talk, not cython. Oh, I meant to say CPython (the default implementation of Python), not Cython. I didn't realize that they were different projects. So the method dispatch seems to be a great obstacle to an object-oriented SIMD API. That would seem more feasible in C++ with non-virtual methods. Java has final methods, which can be useful information to the JIT. C# seems to have "sealed" methods. Interestingly, the Mono.SIMD API uses static methods, which I guess is to avoid the dispatch problem. But it makes the code look uglier. For example, instead of a + b, you have to do Vector4f.Addition(a, b). Mathieu From faltet at pytables.org Wed Oct 21 05:12:21 2009 From: faltet at pytables.org (Francesc Alted) Date: Wed, 21 Oct 2009 11:12:21 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <200910211112.22037.faltet@pytables.org> A Wednesday 21 October 2009 07:44:39 Mathieu Blondel escrigu?: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. 
For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. [clip] It is important to stress out that all the above operations, except probably sqrt, are all memory-bound operations, and that implementing them for numpy would not represent a significant improvement at all. This is because numpy is a package that works mainly with arrays in an element-wise way, and in this scenario, the time to transmit data to CPU dominates, by and large, over the time to perform operations. Among other places, you can find a detailed explication of this fact in my presentation at latest EuroSciPy: http://www.pytables.org/docs/StarvingCPUs.pdf Cheers, -- Francesc Alted From robince at gmail.com Wed Oct 21 05:46:54 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 10:46:54 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard Message-ID: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> Hi, I was wondering what the recommended way to run numpy/scipy on mac os x 10.6 is. I understood previously it was recommended to use python.org python and keep everything seperate from the system python, which worked well. But now I would like to have a 64 bit python and numpy, and there isn't one available from python.org. Also I think the python.org ones are built against the 10.4 SDK which I understand requires using gcc 4.0 - I was keen to try 4.2 to see if some of the differential performance I've seen between c and fortran goes away. So I guess the choices are system python with a virtualenv or a macports built python (or a hand built python). I'm thinking of using macports at the moment but I'm not sure how to handle preventing macports numpy from installing so I can use svn numpy. I'm not sure how the virtualenv will work with packaged installers - (ie how could I tell the wx installer to install into the virtualenv). I was wondering what others do? Cheers Robin From david at ar.media.kyoto-u.ac.jp Wed Oct 21 05:28:09 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 18:28:09 +0900 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> Message-ID: <4ADED429.4030809@ar.media.kyoto-u.ac.jp> Robin wrote: > Hi, > > I was wondering what the recommended way to run numpy/scipy on mac os > x 10.6 is. I understood previously it was recommended to use > python.org python and keep everything seperate from the system python, > which worked well. You can simply use the --user option to the install command: instead of installing in /System, it will install numpy (or any other package) in $HOME/.local, and you don't need to update PYTHONPATH, as python knows about this location. This is a new feature in 2.6, and can be a simple alternative to virtualenv if you don't need the other features (sandboxing, etc...). 
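For reference, a quick way to see where the per-user scheme puts packages and which numpy actually gets imported afterwards (assumes Python 2.6+, where site.USER_BASE and site.USER_SITE exist):

#----------------------------------------------
# After installing with, e.g.:  python setup.py install --user
import site
import numpy

print("user base:       " + site.USER_BASE)   # typically ~/.local
print("user site dir:   " + site.USER_SITE)
print("numpy loaded as: " + numpy.__file__)
print("numpy version:   " + numpy.__version__)
#----------------------------------------------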
I think you need a very recent version of virtualenv on slow leopard, if you decide to go this route, David From robince at gmail.com Wed Oct 21 06:58:30 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 11:58:30 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <4ADED429.4030809@ar.media.kyoto-u.ac.jp> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> Message-ID: <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> On Wed, Oct 21, 2009 at 10:28 AM, David Cournapeau wrote: > Robin wrote: >> Hi, >> >> I was wondering what the recommended way to run numpy/scipy on mac os >> x 10.6 is. I understood previously it was recommended to use >> python.org python and keep everything seperate from the system python, >> which worked well. > > You can simply use the --user option to the install command: instead of > installing in /System, it will install numpy (or any other package) in > $HOME/.local, and you don't need to update PYTHONPATH, as python knows > about this location. Thanks - that looks ideal. I take it $HOME/.local is searched first so numpy will be used fromt here in preference to the system numpy. My only worry is with installer packages - I'm thinking mainly of wxpython. Is there a way I can get that package to install in $HOME/.local. (The installer only seems to let you choose a drive). Also - if I build for example vim against the system python, will I be able to see packages in $HOME/.local from the python interpreter inside vim? Cheers Robin From david at ar.media.kyoto-u.ac.jp Wed Oct 21 06:41:18 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 19:41:18 +0900 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> Message-ID: <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> Robin wrote: > > Thanks - that looks ideal. I take it $HOME/.local is searched first so > numpy will be used fromt here in preference to the system numpy. > Yes, unless framework-enabled python does something 'fishy' (I think framework vs convention python have different rules w.r.t. sys.path). As always, in doubt, you should check with numpy.__file__ which one is loaded. > My only worry is with installer packages - I'm thinking mainly of > wxpython. Is there a way I can get that package to install in > $HOME/.local. (The installer only seems to let you choose a drive). > is wxpython supported on python 64 bits ? I don't know if you can install a .mpkg in $HOME/.local. It is not supported by python AFAIK, and I think you would have to hack something to make it work. May just be easier to build by yourself. > Also - if I build for example vim against the system python, will I be > able to see packages in $HOME/.local from the python interpreter > inside vim? I don't know about vim-python interaction: doesn't vim uses its own python embedded within vim process ? You would have to check sys.path and similar variables, as well as vim doc. 
David From robince at gmail.com Wed Oct 21 07:24:12 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 12:24:12 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> Message-ID: <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Thanks... On Wed, Oct 21, 2009 at 11:41 AM, David Cournapeau wrote: > Robin wrote: >> >> Thanks - that looks ideal. I take it $HOME/.local is searched first so >> numpy will be used fromt here in preference to the system numpy. >> > > Yes, unless framework-enabled python does something 'fishy' (I think > framework vs convention python have different rules w.r.t. sys.path). As > always, in doubt, you should check with numpy.__file__ which one is loaded. It seems it does... the built in numpy which is in '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', comes before $HOME/.local in sys.path so I think system numpy will always be picked up over my own installed version. robin-mbp:~ robince$ /usr/bin/python2.6 -c "import sys; print sys.path" ['', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python26.zip', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages', '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-old', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload', '/Users/robince/.local/lib/python2.6/site-packages', '/Library/Python/2.6/site-packages', '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/PyObjC', '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/wx-2.8-mac-unicode'] So I guess virtualenv or macports? Cheers Robin From cournape at gmail.com Wed Oct 21 08:27:46 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Oct 2009 21:27:46 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <200910211112.22037.faltet@pytables.org> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> Message-ID: <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> On Wed, Oct 21, 2009 at 6:12 PM, Francesc Alted wrote: > A Wednesday 21 October 2009 07:44:39 Mathieu Blondel escrigu?: >> Hello, >> >> About one year ago, a high-level, objected-oriented SIMD API was added >> to Mono. For example, there is a class Vector4f for vectors of 4 >> floats and this class implements methods such as basic operators, >> bitwise operators, comparison operators, min, max, sqrt, shuffle >> directly using SIMD operations. > [clip] > > It is important to stress out that all the above operations, except probably > sqrt, are all memory-bound operations, and that implementing them for numpy > would not represent a significant improvement at all. 
> This is because numpy is a package that works mainly with arrays in an > element-wise way, and in this scenario, the time to transmit data to CPU > dominates, by and large, over the time to perform operations. Is it general, or just for simple operations in numpy and ufunc ? I remember that for music softwares, SIMD used to matter a lot, even for simple bus mixing (which is basically a ax+by with a, b scalars and x y the input arrays). Do you have any interest in adding SIMD to some core numpy (transcendental functions). If so, I would try to go back to the problem of runtime SSE detection and loading of optimized shared library in a cross-platform way - that's something which should be done at some point in numpy, and people requiring it would be a good incentive. David From matthieu.brucher at gmail.com Wed Oct 21 08:37:02 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 21 Oct 2009 14:37:02 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> Message-ID: > Is it general, or just for simple operations in numpy and ufunc ? I > remember that for music softwares, SIMD used to matter a lot, even for > simple bus mixing (which is basically a ax+by with a, b scalars and x > y the input arrays). Indeed, it shouldn't :| I think the main reason might not be SIMD, but the additional hypothesis you put on the arrays (aliasing). This way, todays compilers may not even need the actual SIMD instructions. I have the same opinion as Francesc, it would only be useful for operations that need more computations that load/store. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From faltet at pytables.org Wed Oct 21 08:47:02 2009 From: faltet at pytables.org (Francesc Alted) Date: Wed, 21 Oct 2009 14:47:02 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> Message-ID: <200910211447.02842.faltet@pytables.org> A Wednesday 21 October 2009 14:27:46 David Cournapeau escrigu?: > > This is because numpy is a package that works mainly with arrays in an > > element-wise way, and in this scenario, the time to transmit data to CPU > > dominates, by and large, over the time to perform operations. > > Is it general, or just for simple operations in numpy and ufunc ? I > remember that for music softwares, SIMD used to matter a lot, even for > simple bus mixing (which is basically a ax+by with a, b scalars and x > y the input arrays). This is general, as long as the dataset has to be brought from memory to CPU, and operations to be done are element-wise and simple (i.e. not transcendental). SIMD does matter in general when the dataset: 1) is already in cache 2) you have to perform costly operations (mainly transcendental) 3) a combination of the above I don't know the case for music software, but if you say that ax+by are accelerated by SIMD, I'd say that case 1) is happening. 
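The memory-bandwidth point is easy to see with a small timing sketch (illustrative only; absolute numbers are machine-dependent): on an array much larger than any cache, a trivial multiply runs at memory speed while a transcendental function is limited by the CPU:

#----------------------------------------------
import time
import numpy as np

x = np.random.rand(10 * 1000 * 1000)        # ~80 MB, far larger than cache

def best_time(func, reps=3):
    times = []
    for rep in range(reps):
        start = time.time()
        func()
        times.append(time.time() - start)
    return min(times)

t_mul = best_time(lambda: x * 1.1)          # cheap op: memory-bound
t_tanh = best_time(lambda: np.tanh(x))      # costly op: CPU-bound

print("x * 1.1 : %.3f s" % t_mul)
print("tanh(x) : %.3f s" % t_tanh)
print("tanh is ~%.0fx slower on the same data" % (t_tanh / t_mul))
#----------------------------------------------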
> Do you have any interest in adding SIMD to some core numpy > (transcendental functions). If so, I would try to go back to the > problem of runtime SSE detection and loading of optimized shared > library in a cross-platform way - that's something which should be > done at some point in numpy, and people requiring it would be a good > incentive. I don't personally have a lot of interest implementing this for numpy. But in case anyone does, I find the next library: http://gruntthepeon.free.fr/ssemath/ very interesting. Perhaps there could be other (free) implementations... -- Francesc Alted From pav+sp at iki.fi Wed Oct 21 09:14:54 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Wed, 21 Oct 2009 13:14:54 +0000 (UTC) Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote: [clip] >> Do you have any interest in adding SIMD to some core numpy >> (transcendental functions). If so, I would try to go back to the >> problem of runtime SSE detection and loading of optimized shared >> library in a cross-platform way - that's something which should be done >> at some point in numpy, and people requiring it would be a good >> incentive. > > I don't personally have a lot of interest implementing this for numpy. > But in case anyone does, I find the next library: > > http://gruntthepeon.free.fr/ssemath/ > > very interesting. Perhaps there could be other (free) > implementations... Optimized transcendental functions could be interesting. For example for tanh, call overhead is overcome already for ~30-element arrays. Since these are ufuncs, I suppose the SSE implementations could just be put in a separate module, which is always compiled. Before importing the module, we could simply check from Python side that the CPU supports the necessary instructions. If everything is OK, the accelerated implementations would then just replace the Numpy routines. This type of project could probably also be started outside Numpy, and just monkey-patch the Numpy routines on import. -- Pauli Virtanen From rmay31 at gmail.com Wed Oct 21 09:54:04 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 08:54:04 -0500 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Message-ID: > It seems it does... ?the built in numpy which is in > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', > comes before $HOME/.local in sys.path so I think system numpy will > always be picked up over my own installed version. 
> > robin-mbp:~ robince$ /usr/bin/python2.6 -c "import sys; print sys.path" > ['', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python26.zip', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages', > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-old', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload', > '/Users/robince/.local/lib/python2.6/site-packages', > '/Library/Python/2.6/site-packages', > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/PyObjC', > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/wx-2.8-mac-unicode'] > > So I guess virtualenv or macports? Wow. Once again, Apple makes using python unnecessarily difficult. Someone needs a whack with a clue bat. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From robince at gmail.com Wed Oct 21 10:00:12 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 15:00:12 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Message-ID: <2d5132a50910210700o7558e061r8e596aeb7beb612@mail.gmail.com> On Wed, Oct 21, 2009 at 2:54 PM, Ryan May wrote: > Wow. ?Once again, Apple makes using python unnecessarily difficult. > Someone needs a whack with a clue bat. Right - I think in the end I decided I will try and use macports python with virtualenv for svn numpy/scipy and leave system python well alone as before. Cheers Robin From zachary.pincus at yale.edu Wed Oct 21 10:09:35 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 21 Oct 2009 10:09:35 -0400 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Message-ID: <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> > Wow. Once again, Apple makes using python unnecessarily difficult. > Someone needs a whack with a clue bat. Well, some tools from the operating system use numpy and other python modules. And upgrading one of these modules might conceivably break that dependency, leading to breakage in the OS. So Apple's design goal is to keep their tools working right, and absent any clear standard for package management in python that allows for side-by-side installation of different versions of the same module, this is probably the best way to go from their perspective. Personally, I just install a hand-built python in /Frameworks. 
This is very easy, and it is also where the python.org python goes, so 3rd- party installers with hard-coded paths (boo) still work. Zach From mdroe at stsci.edu Wed Oct 21 10:29:55 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 21 Oct 2009 10:29:55 -0400 Subject: [Numpy-discussion] Using numpydoc outside of numpy Message-ID: <4ADF1AE3.9030002@stsci.edu> I'm in the process of converting a project to use Sphinx for documentation, and would like to use the Numpy docstring standard with its sections etc. It appears, however, that the numpydoc sphinxext is not installed but merely sits in doc/sphinxext. I see that scipy uses an SVN external to get at this stuff, but I'd prefer not to do that if possible (my project may not live in SVN forever). I see that numpydoc is separately installable (it has a setup.py), but then I assume users of my project who wish to build the docs must download the numpy source and know to explicitly install the numpydoc package. Are there plans to install the numpydoc extension under the numpy install tree somewhere so that other projects can use it, simply by having numpy installed? This is what we do with the plot_directive in matplotlib. Is there a reason that's not a good idea for numpy? Or is there a way for my project to use it that I'm missing? Cheers, Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From rmay31 at gmail.com Wed Oct 21 11:01:52 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 10:01:52 -0500 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: On Wed, Oct 21, 2009 at 9:09 AM, Zachary Pincus wrote: >> Wow. ?Once again, Apple makes using python unnecessarily difficult. >> Someone needs a whack with a clue bat. > > Well, some tools from the operating system use numpy and other python > modules. And upgrading one of these modules might conceivably break > that dependency, leading to breakage in the OS. So Apple's design goal > is to keep their tools working right, and absent any clear standard > for package management in python that allows for side-by-side > installation of different versions of the same module, this is > probably the best way to go from their perspective. > > Personally, I just install a hand-built python in /Frameworks. This is > very easy, and it is also where the python.org python goes, so 3rd- > party installers with hard-coded paths (boo) still work. ~/.local was added to *be the standard* for easily installing python packages in your user account. And it works perfectly on the other major OSes, no twiddling of paths anymore. I understand the desire to not conflict with the system's python intall (and, in fact, applaud them for using python). Indeed, on linux, I do end up conflicting between my system numpy and my SVN install in ~/.local, and I don't have a problem with it. This comes with the territory when I start doing power-user/developer tasks. It just to me seems odd that the OS that works so hard to make so many things easier makes this *more* difficult. 
Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from Norman, Oklahoma, United States From mdroe at stsci.edu Wed Oct 21 11:13:35 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 21 Oct 2009 11:13:35 -0400 Subject: [Numpy-discussion] Using numpydoc outside of numpy In-Reply-To: <4ADF1AE3.9030002@stsci.edu> References: <4ADF1AE3.9030002@stsci.edu> Message-ID: <4ADF251F.3070505@stsci.edu> Sorry for the noise. Found the instructions in HOWTO_BUILD_DOCS.txt . Mike Michael Droettboom wrote: > I'm in the process of converting a project to use Sphinx for > documentation, and would like to use the Numpy docstring standard with > its sections etc. It appears, however, that the numpydoc sphinxext is > not installed but merely sits in doc/sphinxext. I see that scipy uses > an SVN external to get at this stuff, but I'd prefer not to do that if > possible (my project may not live in SVN forever). I see that numpydoc > is separately installable (it has a setup.py), but then I assume users > of my project who wish to build the docs must download the numpy source > and know to explicitly install the numpydoc package. > > Are there plans to install the numpydoc extension under the numpy > install tree somewhere so that other projects can use it, simply by > having numpy installed? This is what we do with the plot_directive in > matplotlib. Is there a reason that's not a good idea for numpy? Or is > there a way for my project to use it that I'm missing? > > Cheers, > Mike > > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From josef.pktd at gmail.com Wed Oct 21 11:18:05 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Oct 2009 11:18:05 -0400 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() In-Reply-To: References: Message-ID: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> On Sun, Oct 18, 2009 at 3:11 PM, Jeffrey McGee wrote: > Howdy, > I'm having trouble getting the kaiser window to work. Anytime I try > to call numpy.kaiser(), it throws an exception. Here's the output when > I run the example code from > http://docs.scipy.org/doc/numpy/reference/generated/numpy.kaiser.html : > > > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) > [GCC 4.3.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>>> from numpy import kaiser >>>> kaiser(12, 14) > > Traceback (most recent call last): > File "", line 1, in > > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > line 2630, in kaiser > return i0(beta * sqrt(1-((n-alpha)/alpha)**2.0))/i0(beta) > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > > line 2507, in i0 > y[ind] = _i0_1(x[ind]) > TypeError: array cannot be safely cast to required type >>>> > > > Is this a bug? Am I doing something wrong? (I'm using the Ubuntu 9.4 > > packages for python and numpy.) > Thanks, > Jeff > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > It works with my numpy 1.3.0, but np.i0 doesn't like integers. Can you try with a float 14. instead of the integer? np.kaiser(12, 14.) You could file a ticket, and we see if the experts consider this a feature or a bug. 
(I don't know anything about kaiser or i0) Josef >>> np.kaiser(12,14) array([ 7.72686684e-06, 3.46009194e-03, 4.65200189e-02, 2.29737120e-01, 5.99885316e-01, 9.45674898e-01, 9.45674898e-01, 5.99885316e-01, 2.29737120e-01, 4.65200189e-02, 3.46009194e-03, 7.72686684e-06]) >>> np.i0(1) Traceback (most recent call last): File "", line 1, in np.i0(1) File "C:\Programs\Python25\Lib\site-packages\numpy\lib\function_base.py", line 2484, in i0 y[ind] = _i0_1(x[ind]) TypeError: array cannot be safely cast to required type >>> np.i0(1.) array(1.2660658777520082) From renesd at gmail.com Wed Oct 21 12:19:02 2009 From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=) Date: Wed, 21 Oct 2009 17:19:02 +0100 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <64ddb72c0910210919m4e449aabkcfa5db1ecee32b18@mail.gmail.com> On Wed, Oct 21, 2009 at 2:14 PM, Pauli Virtanen > wrote: > Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote: > [clip] > >> Do you have any interest in adding SIMD to some core numpy > >> (transcendental functions). If so, I would try to go back to the > >> problem of runtime SSE detection and loading of optimized shared > >> library in a cross-platform way - that's something which should be done > >> at some point in numpy, and people requiring it would be a good > >> incentive. > > > > I don't personally have a lot of interest implementing this for numpy. > > But in case anyone does, I find the next library: > > > > http://gruntthepeon.free.fr/ssemath/ > > > > very interesting. Perhaps there could be other (free) > > implementations... > > Optimized transcendental functions could be interesting. For example for > tanh, call overhead is overcome already for ~30-element arrays. > > Since these are ufuncs, I suppose the SSE implementations could just be > put in a separate module, which is always compiled. Before importing the > module, we could simply check from Python side that the CPU supports the > necessary instructions. If everything is OK, the accelerated > implementations would then just replace the Numpy routines. > > This type of project could probably also be started outside Numpy, and > just monkey-patch the Numpy routines on import. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Anyone seen the corepy numpy gsoc project? http://numcorepy.blogspot.com/ It implements a number of functions with the corepy runtime assembler. The project showed nice simd speedups for numpy. I've been following the liborc project... which is a runtime assembler that uses a generic assembly language and supports many different simd assembly languages (eg SSE, MMX, ARM, Altivec). It's the replacement for the liboil library (used in gstreamer etc). http://code.entropywave.com/projects/orc/ cu! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Oct 21 12:28:02 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 10:28:02 -0600 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() In-Reply-To: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> References: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 9:18 AM, wrote: > On Sun, Oct 18, 2009 at 3:11 PM, Jeffrey McGee > wrote: > > Howdy, > > I'm having trouble getting the kaiser window to work. Anytime I try > > to call numpy.kaiser(), it throws an exception. Here's the output when > > I run the example code from > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.kaiser.html : > > > > > > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) > > [GCC 4.3.3] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > > >>>> from numpy import kaiser > >>>> kaiser(12, 14) > > > > Traceback (most recent call last): > > File "", line 1, in > > > > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > > line 2630, in kaiser > > return i0(beta * sqrt(1-((n-alpha)/alpha)**2.0))/i0(beta) > > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > > > > line 2507, in i0 > > y[ind] = _i0_1(x[ind]) > > TypeError: array cannot be safely cast to required type > >>>> > > > > > > Is this a bug? Am I doing something wrong? (I'm using the Ubuntu 9.4 > > > > packages for python and numpy.) > > Thanks, > > Jeff > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > It works with my numpy 1.3.0, but np.i0 doesn't like integers. > Can you try with a float 14. instead of the integer? > np.kaiser(12, 14.) > > Hmm, I think np.i0 (the modified Bessel function of order zero), should accept integer inputs, I'm not sure why it doesn't. As an aside, would it be appropriate to have some of the more common Bessel functions as ufuncs? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Wed Oct 21 12:38:10 2009 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Wed, 21 Oct 2009 18:38:10 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Pauli Virtanen schrieb: > Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote: > [clip] > >>> Do you have any interest in adding SIMD to some core numpy >>> (transcendental functions). If so, I would try to go back to the >>> problem of runtime SSE detection and loading of optimized shared >>> library in a cross-platform way - that's something which should be done >>> at some point in numpy, and people requiring it would be a good >>> incentive. >>> >> I don't personally have a lot of interest implementing this for numpy. >> But in case anyone does, I find the next library: >> >> http://gruntthepeon.free.fr/ssemath/ >> >> very interesting. Perhaps there could be other (free) >> implementations... >> > > Optimized transcendental functions could be interesting. 
For example for > tanh, call overhead is overcome already for ~30-element arrays. > > Since these are ufuncs, I suppose the SSE implementations could just be > put in a separate module, which is always compiled. Before importing the > module, we could simply check from Python side that the CPU supports the > necessary instructions. If everything is OK, the accelerated > implementations would then just replace the Numpy routines. > I once wrote a module that replaces the built in transcendental functions of numpy by optimized versions from Intels vector math library. If someone is interested, I can publish it. In my experience it was of little use since real world problems are limited by memory bandwidth. Therefore extending numexpr with optimized transcendental functions was the better solution. Afterwards I discovered that I could have saved the effort of the first approach since gcc is able to use optimized functions from Intels vector math library or AMD's math core library, see the doc's of -mveclibabi. You just need to recompile numpy with proper compiler arguments. Gregor > This type of project could probably also be started outside Numpy, and > just monkey-patch the Numpy routines on import. > > From robert.kern at gmail.com Wed Oct 21 12:39:00 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 21 Oct 2009 11:39:00 -0500 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() In-Reply-To: References: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> Message-ID: <3d375d730910210939q73174d22m523d0d73b051476c@mail.gmail.com> On Wed, Oct 21, 2009 at 11:28, Charles R Harris wrote: > As an aside, would it be > appropriate to have some of the more common Bessel functions as ufuncs? I'd prefer that we stick to the policy of including special functions that are part of the C99 standard (or another appropriate one) and no more. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmay31 at gmail.com Wed Oct 21 14:23:14 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 13:23:14 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer wrote: > I once wrote a module that replaces the built in transcendental > functions of numpy by optimized versions from Intels vector math > library. If someone is interested, I can publish it. In my experience it > was of little use since real world problems are limited by memory > bandwidth. Therefore extending numexpr with optimized transcendental > functions was the better solution. Afterwards I discovered that I could > have saved the effort of the first approach since gcc is able to use > optimized functions from Intels vector math library or AMD's math core > library, see the doc's of -mveclibabi. You just need to recompile numpy > with proper compiler arguments. Do you have a link to the documentation for -mveclibabi? I can't find this anywhere and I'm *very* interested. 
Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From rmay31 at gmail.com Wed Oct 21 14:31:13 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 13:31:13 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 1:23 PM, Ryan May wrote: > On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer > wrote: >> I once wrote a module that replaces the built in transcendental >> functions of numpy by optimized versions from Intels vector math >> library. If someone is interested, I can publish it. In my experience it >> was of little use since real world problems are limited by memory >> bandwidth. Therefore extending numexpr with optimized transcendental >> functions was the better solution. Afterwards I discovered that I could >> have saved the effort of the first approach since gcc is able to use >> optimized functions from Intels vector math library or AMD's math core >> library, see the doc's of -mveclibabi. You just need to recompile numpy >> with proper compiler arguments. > > Do you have a link to the documentation for -mveclibabi? ?I can't find > this anywhere and I'm *very* interested. Ah, there it is. Google doesn't come up with much, but the PDF manual does have it: http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc.pdf (It helps when you don't mis-type your search in the PDF). Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from Norman, Oklahoma, United States From ndbecker2 at gmail.com Wed Oct 21 14:46:45 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 21 Oct 2009 14:46:45 -0400 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: ... > I once wrote a module that replaces the built in transcendental > functions of numpy by optimized versions from Intels vector math > library. If someone is interested, I can publish it. In my experience it > was of little use since real world problems are limited by memory > bandwidth. Therefore extending numexpr with optimized transcendental > functions was the better solution. Afterwards I discovered that I could > have saved the effort of the first approach since gcc is able to use > optimized functions from Intels vector math library or AMD's math core > library, see the doc's of -mveclibabi. You just need to recompile numpy > with proper compiler arguments. > I'm interested. I'd like to try AMD rather than intel, because AMD is easier to obtain. I'm running on intel machine, I hope that doesn't matter too much. What exactly do I need to do? I see that numpy/site.cfg has an MKL section. I'm assuming I should not touch that, but just mess with gcc flags? 
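Independently of the compiler-flag route, the pattern Pauli and Gregor describe earlier in the thread (check the CPU from Python, then swap accelerated implementations in for the numpy ones at import time) could look roughly like the sketch below. `_fastmath` is a purely hypothetical compiled extension used only to illustrate the idea; if it is absent, nothing is replaced:

#----------------------------------------------
import numpy as np

def cpu_supports_sse2():
    # Placeholder check (Linux only); a real version would use CPUID.
    try:
        return "sse2" in open("/proc/cpuinfo").read()
    except IOError:
        return False

def install_fast_ufuncs():
    """Replace selected numpy functions with accelerated ones, if possible."""
    if not cpu_supports_sse2():
        return False
    try:
        import _fastmath            # hypothetical SSE-enabled extension
    except ImportError:
        return False
    for name in ("exp", "log", "tanh"):
        if hasattr(_fastmath, name):
            setattr(np, name, getattr(_fastmath, name))
    return True

print("accelerated ufuncs installed: %s" % install_fast_ufuncs())
#----------------------------------------------

Monkey-patching np.exp and friends like this only affects code that looks the functions up through the numpy namespace, which is the common case; the hard engineering problem, as noted above, is building and shipping the accelerated extension portably.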
From charlesr.harris at gmail.com Wed Oct 21 15:02:59 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 13:02:59 -0600 Subject: [Numpy-discussion] GSOC 2010 Message-ID: Hi All, I don't feel that numpy/scipy did as well in GSOC 2009 as it could have. I think this was mostly due to lack of preparation on our part, we weren't ready when the students started showing up on the lists. So I would like to put together a selection of suitable projects and corresponding mentors that we could put on the wiki somewhere and advertise. Just to start things off, here are two things that come to mind. - Python 3k transition. I think it is time to start looking at this seriously. - Best of breed special functions in cython. These could be part of a separate numpy extras package where code is restricted to C, Cython, and Python. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Oct 21 15:11:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Oct 2009 15:11:55 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: Message-ID: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris wrote: > Hi All, > > I don't feel that numpy/scipy did as well in GSOC 2009 as it could have.? I > think this was mostly due to lack of preparation on our part, we weren't > ready when the students started showing up on the lists. So I would like to > put together a selection of suitable projects and corresponding mentors that > we could put on the wiki somewhere and advertise. Just to start things off, > here are two things that come to mind. > > Python 3k transition. I think it is time to start looking at this seriously. > Best of breed special functions in cython. These could be part of a separate > numpy extras package where code is restricted to C, Cython, and Python. > > Thoughts? for scipy: more stats, gsoc2009 went very well. Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Wed Oct 21 15:23:07 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 13:23:07 -0600 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 1:11 PM, wrote: > On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris > wrote: > > Hi All, > > > > I don't feel that numpy/scipy did as well in GSOC 2009 as it could have. > I > > think this was mostly due to lack of preparation on our part, we weren't > > ready when the students started showing up on the lists. So I would like > to > > put together a selection of suitable projects and corresponding mentors > that > > we could put on the wiki somewhere and advertise. Just to start things > off, > > here are two things that come to mind. > > > > Python 3k transition. I think it is time to start looking at this > seriously. > > Best of breed special functions in cython. These could be part of a > separate > > numpy extras package where code is restricted to C, Cython, and Python. > > > > Thoughts? > > for scipy: more stats, gsoc2009 went very well. > > Yes, it seems so. 
I had the impression that planning for that project was undertaken pretty early on with the involvement of Skipper. What exactly *was* the history of that project and what can we learn from it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Wed Oct 21 16:04:26 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 21 Oct 2009 16:04:26 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: Message-ID: <799092A0-98BD-4491-9F93-01F63397C47A@cs.toronto.edu> On 21-Oct-09, at 3:02 PM, Charles R Harris wrote: > ? Best of breed special functions in cython. These could be part of > a separate numpy extras package where code is restricted to C, > Cython, and Python. I think a lot of SciPy could be usefully brought over to Cython, as well (not all the C code, but some of it). Having Cython do the wrapping should reduce the burden in the eventual Py3k transition. David From dwf at cs.toronto.edu Wed Oct 21 16:12:31 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 21 Oct 2009 16:12:31 -0400 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote: > Since these are ufuncs, I suppose the SSE implementations could just > be > put in a separate module, which is always compiled. Before importing > the > module, we could simply check from Python side that the CPU supports > the > necessary instructions. If everything is OK, the accelerated > implementations would then just replace the Numpy routines. Am I mistaken or wasn't that sort of the goal of Andrew Friedley's CorePy work this summer? Looking at his slides again, the speedups are rather impressive. I wonder if these could be usefully integrated into numpy itself? David From jsseabold at gmail.com Wed Oct 21 16:13:14 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 21 Oct 2009 16:13:14 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 3:23 PM, Charles R Harris wrote: > > > On Wed, Oct 21, 2009 at 1:11 PM, wrote: >> >> On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > I don't feel that numpy/scipy did as well in GSOC 2009 as it could >> > have.? I >> > think this was mostly due to lack of preparation on our part, we weren't >> > ready when the students started showing up on the lists. So I would like >> > to >> > put together a selection of suitable projects and corresponding mentors >> > that >> > we could put on the wiki somewhere and advertise. Just to start things >> > off, >> > here are two things that come to mind. >> > >> > Python 3k transition. I think it is time to start looking at this >> > seriously. >> > Best of breed special functions in cython. These could be part of a >> > separate >> > numpy extras package where code is restricted to C, Cython, and Python. >> > >> > Thoughts? >> >> for scipy: more stats, gsoc2009 went very well. >> > > Yes, it seems so. I had the impression that planning for that project was > undertaken pretty early on with the involvement of Skipper. What exactly > *was* the history of that project and what can we learn from it? 
> Short(-ish) version of some general thoughts from my end: GSoC was brought to my attention as a fruitful endeavor (and it definitely was!). There was a list of potential topics posted on SciPy SoC mentoring page, and I just kind of went through all of them to see where the most value-add would be (both ways from me to the SciPy project and from the SciPy project to my studies/work). So that list of topics was the main driving force, and I'm glad we're starting to push for ideas now (I have a few ideas of my own motivated mostly by needs of stats/statistical modeling, but I need some more time to think). However, we obviously should be open to new ideas from students coming to the project. Another thing is the importance of the application process. The thing that pushed me was reading about other successful applicants for SoC in general (there is a lot of really good advice and write-ups out there). It is a very competitive program, so your proposal needs to be very, very well thought out. That includes drafts of proposals with feedback from the community and mentors well before the official application process even starts, so the earlier that's taken care of, the better. Beyond that, students should know what's expected of them coming into the program (what development tools they need to be familiar with, numpy/scipy standards, familiarization with the code base), and what's expected of the end product (high quality code, test driven development, etc.). I also can't stress enough how helpful it was to have Alan and Josef as mentors, as well as the availability to use the MLs for more general questions. Obviously, the level of engagement of the mentor is going to depend on the project and the student, but I for one couldn't have learned as much as I did nor gotten as far as we did without their help. If these comments are seen as helpful, I can try to work on some more detailed ones/links to detailed ones, as I think this would be beneficial to establish as something to look forward to. The availability of this program (Thank you, Google) allows significant strides in development to be made each summer and that should not be overlooked (I don't think it is). Cheers, Skipper From josef.pktd at gmail.com Wed Oct 21 16:20:51 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Oct 2009 16:20:51 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: <1cd32cbb0910211320h6e7c93bcwfa1dbee9f49dab60@mail.gmail.com> On Wed, Oct 21, 2009 at 3:23 PM, Charles R Harris wrote: > > > On Wed, Oct 21, 2009 at 1:11 PM, wrote: >> >> On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > I don't feel that numpy/scipy did as well in GSOC 2009 as it could >> > have.? I >> > think this was mostly due to lack of preparation on our part, we weren't >> > ready when the students started showing up on the lists. So I would like >> > to >> > put together a selection of suitable projects and corresponding mentors >> > that >> > we could put on the wiki somewhere and advertise. Just to start things >> > off, >> > here are two things that come to mind. >> > >> > Python 3k transition. I think it is time to start looking at this >> > seriously. >> > Best of breed special functions in cython. These could be part of a >> > separate >> > numpy extras package where code is restricted to C, Cython, and Python. >> > >> > Thoughts? >> >> for scipy: more stats, gsoc2009 went very well. 
>> > > Yes, it seems so. I had the impression that planning for that project was > undertaken pretty early on with the involvement of Skipper. What exactly > *was* the history of that project and what can we learn from it? Skipper started early in the preparation, and with the help of Allan and me had a pretty concrete proposal. Because of final exams, the actual work on statsmodels started a bit late. >From my perspective a few issues that helped: Skipper, Alan and I have the same background (in econometrics), so I knew roughly what knowledge I could expect. Skipper was willing and able to work his way through several textbooks for the models that he, and I, didn't know much (or anything) about. "Cleaning up stats.models" was a relatively well defined project, with relatively easy to define goals. I kept reminding him about writing tests, and to verify results with other packages, so that we knew when we had a model "correctly" cleaned up. Skipper spend a lot of time on this. For most parts, I worked on the code in parallel with him, checking on his progress, looking at the problems we had with matching the results of the other statistical packages, finding bugs and writing some draft code. During July, August we had almost daily long email threads. I think, this helped a lot, so that Skipper didn't get stuck or sidetracked, and that I was able to keep up with the changes (and learn some of the statistical background). Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From afriedle at indiana.edu Wed Oct 21 16:36:38 2009 From: afriedle at indiana.edu (Andrew Friedley) Date: Wed, 21 Oct 2009 16:36:38 -0400 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <4ADF70D6.1030001@indiana.edu> sigh; yet another email dropped by the list. David Warde-Farley wrote: > On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote: > >> Since these are ufuncs, I suppose the SSE implementations could just >> be >> put in a separate module, which is always compiled. Before importing >> the >> module, we could simply check from Python side that the CPU supports >> the >> necessary instructions. If everything is OK, the accelerated >> implementations would then just replace the Numpy routines. > > Am I mistaken or wasn't that sort of the goal of Andrew Friedley's > CorePy work this summer? > > Looking at his slides again, the speedups are rather impressive. I > wonder if these could be usefully integrated into numpy itself? Yes, my GSoC project is closely related, though I didn't do the CPU detection part, that'd be easy to do. Also I wrote my code specifically for 64-bit x86. I didn't focus so much on the transcendental functions, though they wouldn't be too hard to implement. There's also the possibility to provide implementations with differing tradeoffs between accuracy and performance. I think the blog link got posted already, but here's relevant info: http://numcorepy.blogspot.com http://www.corepy.org/wiki/index.php?title=CoreFunc I talked about this in my SciPy talk and up-coming paper, as well. 
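A minimal pure-Python sketch of the import-time replacement idea being discussed (this is an illustration, not code from the thread: the /proc/cpuinfo check is Linux-only, and fast_add is just a stand-in for a real compiled ufunc):

import platform
import numpy

def cpu_has_sse2():
    # Illustrative, Linux-specific check; a real implementation would
    # query the CPU via a small C extension or similar.
    if platform.system() != 'Linux':
        return False
    try:
        with open('/proc/cpuinfo') as f:
            return ' sse2 ' in f.read().replace('\n', ' ')
    except IOError:
        return False

# Stand-in for an accelerated implementation; a real one would be a
# compiled ufunc with the same signature as numpy.add.
fast_add = numpy.add

if cpu_has_sse2():
    # Route ndarray arithmetic (a + b) through the replacement ufunc.
    numpy.set_numeric_ops(add=fast_add)

numpy.set_numeric_ops is the same hook mentioned elsewhere in this digest for injecting VML-backed ufuncs, so the only platform-specific piece is the capability check.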
Also people have just been talking about x86 in this thread -- other architectures could be supported too; eg PPC/Altivec or even Cell SPU and other accelerators. I actually wrote a quick/dirty implementation of addition and vector normalization ufuncs for Cell SPU recently. Basic result is that overall performance is very roughly comparable to a similar speed x86 chip, but this is a huge win over just running on the extremely slow Cell PPC cores. Andrew From millman at berkeley.edu Wed Oct 21 16:40:52 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 21 Oct 2009 13:40:52 -0700 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: Message-ID: On Wed, Oct 21, 2009 at 12:02 PM, Charles R Harris wrote: > I don't feel that numpy/scipy did as well in GSOC 2009 as it could have. I'd be curious to hear why you felt that numpy/scipy didn't do as well this year. We had more projects than any other year and I think that most of the code ended being used. It could be that the work done wasn't publicized enough or that the most of the contributions end up contributed to related projects like in a scikit or (hopefully soon to be merged work) in cython. At any rate, I'd be curious to hear more about your concerns so that they we don't repeat them next year (assuming the program is run again next year). > I think this was mostly due to lack of preparation on our part, we weren't > ready when the students started showing up on the lists. So I would like to > put together a selection of suitable projects and corresponding mentors that > we could put on the wiki somewhere and advertise. Just to start things off, > here are two things that come to mind. Regardless, better preparation would be a huge help. Having detailed lists of summer projects will be useful even if the SoC program doesn't get approved for next year. > Python 3k transition. I think it is time to start looking at this seriously. > Best of breed special functions in cython. These could be part of a separate > numpy extras package where code is restricted to C, Cython, and Python. Both of these ideas sounds very interesting. Personally, I would like to see ideas like these make there way into fully fleshed out NEPs: http://projects.scipy.org/numpy/browser/trunk/doc/neps -- Jarrod Millman Helen Wills Neuroscience Institute 10 Giannini Hall, UC Berkeley http://cirl.berkeley.edu/ From aisaac at american.edu Wed Oct 21 16:42:25 2009 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 21 Oct 2009 16:42:25 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: <4ADF7231.8070200@american.edu> On 10/21/2009 3:23 PM, Charles R Harris wrote: > What exactly *was* the history of that project and what can we learn > from it? Imo, what really drove this project forward, is that Skipper was able to interact regularly with someone else who was actively using and developing on the code base (i.e., Josef). While I am confident Skipper would have made a worthwhile contribution without this, I think he would agree that he both learned more and was more productive because he was able to interact with Josef. One other thing that was important was focus: Skipper (and Josef) focused in on making sure an important but doable (summer is very short!) piece of the stats code was refactored, extended, documented, and tested. Alan Isaac PSI do not mean to diminish the importance of the feedback kindly provided by others. 
From cournape at gmail.com Wed Oct 21 21:24:51 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 22 Oct 2009 10:24:51 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <5b8d13220910211824o3ec041f3h8a7e9f745045567f@mail.gmail.com> On Wed, Oct 21, 2009 at 10:14 PM, Pauli Virtanen wrote: > > This type of project could probably also be started outside Numpy, and > just monkey-patch the Numpy routines on import. I think I would prefer this approach as a first shot. I will look into adding a small C library + wrapper in python to know which SIMD instructions are available to numpy. Then people can reuse this for whatever approach they prefer. David From sturla at molden.no Wed Oct 21 22:31:23 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 04:31:23 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <4ADFC3FB.5030906@molden.no> Mathieu Blondel skrev: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. I think you are confusing SIMD with Intel's MMX/SSE instruction set. SIMD means "single instruction - multiple data". NumPy is interherently an object-oriented SIMD API: array1[:] = array2 + array3 is a SIMD instruction by definition. SIMD instructions in hardware for length-4 vectors are mostly useful for 3D graphics. But they are not used a lot for that purpose, because GPUs are getting common. SSE is mostly for rendering 3D graphics without a GPU. There is nothing that prevents NumPy from having a Vector4f dtype, that internally stores four float32 and is aligned at 16 byte boundaries. But it would not be faster than the current float32 dtype. Do you know why? The reason is that memory access is slow, and computation is fast. Modern CPUs are starved. The speed of NumPy is not limited by not using MMX/SSE whenever possible. It is limited from having to create and delete temporary arrays all the time. You are suggesting to optimize in the wrong place. There is a lot that can be done to speed up computation: There are optimized BLAS libraries like ATLAS and MKL. NumPy uses BLAS for things like matrix multiplication. There are OpenMP for better performance on multicores. There are OpenCL and CUDA for moving computation from CPUs to GPU. But the main boost you get from going from NumPy to hand-written C or Fortran comes from reduced memory use. > existing discussion here. Memory-alignment is an import related issue > since non-aligned movs can tank the performance. 
> >
You can align an ndarray on 16-byte boundary like this:

def aligned_array(N, dtype):
    d = dtype()
    tmp = numpy.zeros(N * d.nbytes + 16, dtype=numpy.uint8)
    address = tmp.__array_interface__['data'][0]
    offset = (16 - address % 16) % 16
    return tmp[offset:offset+N].view(dtype=dtype)

Sturla Molden

From mathieu at mblondel.org Wed Oct 21 23:32:13 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Thu, 22 Oct 2009 12:32:13 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADFC3FB.5030906@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> Message-ID: <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: > Mathieu Blondel skrev: >> Hello, >> >> About one year ago, a high-level, objected-oriented SIMD API was added >> to Mono. For example, there is a class Vector4f for vectors of 4 >> floats and this class implements methods such as basic operators, >> bitwise operators, comparison operators, min, max, sqrt, shuffle >> directly using SIMD operations. > I think you are confusing SIMD with Intel's MMX/SSE instruction set. OK, I should have said "Object-oriented SIMD API that is implemented using hardware SIMD instructions". And when an ISA doesn't allow to perform a specific operation in only one instruction (say the absolute value of the differences), the operation can be implemented in terms of other instructions. > SIMD instructions in hardware for length-4 vectors are mostly useful for > 3D graphics. But they are not used a lot for that purpose, because GPUs > are getting common. SSE is mostly for rendering 3D graphics without a > GPU. There is nothing that prevents NumPy from having a Vector4f dtype, > that internally stores four float32 and is aligned at 16 byte > boundaries. But it would not be faster than the current float32 dtype. > Do you know why? Yes I know because this has already been explained in this very thread by someone before you! Mathieu

From robert.kern at gmail.com Wed Oct 21 23:46:29 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 21 Oct 2009 22:46:29 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> Message-ID: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel wrote: > On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: >> Mathieu Blondel skrev: >>> Hello, >>> >>> About one year ago, a high-level, objected-oriented SIMD API was added >>> to Mono. For example, there is a class Vector4f for vectors of 4 >>> floats and this class implements methods such as basic operators, >>> bitwise operators, comparison operators, min, max, sqrt, shuffle >>> directly using SIMD operations. >> I think you are confusing SIMD with Intel's MMX/SSE instruction set. > > OK, I should have said "Object-oriented SIMD API that is implemented > using hardware SIMD instructions". No, I think you're right. Using "SIMD" to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of.
Everyone else uses "SIMD" to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dwf at cs.toronto.edu Thu Oct 22 02:47:09 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 22 Oct 2009 02:47:09 -0400 Subject: [Numpy-discussion] Using numpydoc outside of numpy In-Reply-To: <4ADF251F.3070505@stsci.edu> References: <4ADF1AE3.9030002@stsci.edu> <4ADF251F.3070505@stsci.edu> Message-ID: <20091022064709.GA30908@rodimus> On Wed, Oct 21, 2009 at 11:13:35AM -0400, Michael Droettboom wrote: > Sorry for the noise. Found the instructions in HOWTO_BUILD_DOCS.txt . Not sure if this is part of what you discovered, but numpydoc is at the Cheese Shop too: http://pypi.python.org/pypi/numpydoc David From sturla at molden.no Thu Oct 22 03:35:52 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 09:35:52 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: <4AE00B58.3090802@molden.no> Robert Kern skrev: > No, I think you're right. Using "SIMD" to refer to numpy-like > operations is an abuse of the term not supported by any outside > community that I am aware of. Everyone else uses "SIMD" to describe > hardware instructions, not the application of a single syntactical > element of a high level language to a non-trivial data structure > containing lots of atomic data elements. > Then you should pick up a book on parallel computing. It is common to differentiate between four classes of computers: SISD, MISD, SIMD, and MIMD machines. A SISD system is the classical von Neuman machine. A MISD system is a pipelined von Neuman machine, for example the x86 processor. A SIMD system is one that has one CPU dedicated to control, and a large collection of subordinate ALUs for computation. Each ALU has a small amount of private memory. The IBM Cell processor is the typical SIMD machine. A special class of SIMD machines are the so-called "vector machines", of which the most famous is the Cray C90. The MMX and SSE instructions in Intel Pentium processors are an example of vector instructions. Some computer scientists regard vector machines a subtype of MISD systems, orthogonal to piplines, because there are no subordinate ALUs with private memory. MIMD systems multiple independent CPUs. MIMD systems comes in two categories: shared-memory processors (SMP) and distributed-memory machines (also called cluster computers). The dual- and quad-core x86 processors are shared-memory MIMD machines. Many people associate the word SIMD with SSE due to Intel marketing. But to the extent that vector machines are MISD orthogonal to piplined von Neuman machines, SSE cannot be called SIMD. NumPy is a software simulated vector machine, usually executed on MISD hardware. To the extent that vector machines (such as SSE and C90) are SIMD, we must call NumPy an object-oriented SIMD library. S.M. 
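A small, self-contained example (not from the original posts) of what "SIMD in the loose, NumPy sense" looks like, alongside the point made earlier in the thread that temporary arrays, rather than missing SSE, are usually what limits this style of code:

import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = np.random.rand(1000000)

# One "vectorized" statement: a single syntactic operation applied to a
# whole array, but it allocates a temporary for a * b and another array
# for the final sum.
out = a * b + c

# The same computation arranged to reuse the output buffer; this kind of
# rewrite usually matters more than CPU-level SSE for code like this.
out2 = a * b      # allocates the result array once
out2 += c         # in-place add, no further temporary
assert np.allclose(out, out2)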
From matthieu.brucher at gmail.com Thu Oct 22 03:41:10 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 22 Oct 2009 09:41:10 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: >> OK, I should have said "Object-oriented SIMD API that is implemented >> using hardware SIMD instructions". > > No, I think you're right. Using "SIMD" to refer to numpy-like > operations is an abuse of the term not supported by any outside > community that I am aware of. Everyone else uses "SIMD" to describe > hardware instructions, not the application of a single syntactical > element of a high level language to a non-trivial data structure > containing lots of atomic data elements. I agree with Sturla, for instance nVidia GPUs do SIMD computations with blocs of 16 values at a time, but the hardware behind can't compute on so much data at a time. It's SIMD from our point of view, just like Numpy does ;) Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From sturla at molden.no Thu Oct 22 03:45:35 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 09:45:35 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: <4AE00D9F.1070107@molden.no> Matthieu Brucher skrev: > I agree with Sturla, for instance nVidia GPUs do SIMD computations > with blocs of 16 values at a time, but the hardware behind can't > compute on so much data at a time. It's SIMD from our point of view, > just like Numpy does ;) > > A computer with a CPU and a GPU is a SIMD machine by definition, due to the single CPU and the multiple ALUs in the GPU, which are subordinate to the CPU. But with modern computers, these classifications becomes a bit unclear. S.M. From sturla at molden.no Thu Oct 22 04:05:28 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 10:05:28 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> Message-ID: <4AE01248.6050303@molden.no> Mathieu Blondel skrev: > Peter Norvig suggested to merge Numpy into Cython but he didn't > mention SIMD as the reason (this one is from me). I don't know what Norvig said or meant. However: There is NumPy support in Cython. Cython has a general syntax applicable to any PEP 3118 buffer. (As NumPy is not yet PEP 3118 compliant, NumPy arrays are converted to Py_buffer structs behind the scenes.) Support for optimized vector expressions might be added later. 
Currently, slicing works as with NumPy in Python, producing slice objects and invoking NumPy's own code, instead of being converted to fast inlined C. The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, replacing the current C source. That might be what Norvig meant if he suggested merging NumPy into Cython. S.M. From mathieu at mblondel.org Thu Oct 22 04:26:08 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Thu, 22 Oct 2009 17:26:08 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE01248.6050303@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> <4AE01248.6050303@molden.no> Message-ID: <7e1472660910220126h1153d37u86ef53de57aa55fe@mail.gmail.com> On Thu, Oct 22, 2009 at 5:05 PM, Sturla Molden wrote: > Mathieu Blondel skrev: > The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, > replacing the current C source. That might be what Norvig meant if he > suggested merging NumPy into Cython. As I wrote earlier in this thread, I confused Cython and CPython. PN was suggesting to include Numpy in the CPython distribution (not Cython). The reason why was also given earlier. Mathieu From sturla at molden.no Thu Oct 22 04:59:21 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 10:59:21 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910220126h1153d37u86ef53de57aa55fe@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> <4AE01248.6050303@molden.no> <7e1472660910220126h1153d37u86ef53de57aa55fe@mail.gmail.com> Message-ID: <4AE01EE9.6040409@molden.no> Mathieu Blondel skrev: > As I wrote earlier in this thread, I confused Cython and CPython. PN > was suggesting to include Numpy in the CPython distribution (not > Cython). The reason why was also given earlier. > > First, that would currently not be possible, as NumPy does not support Py3k. Second, the easiest way to port NumPy to Py3k is Cython, which would prevent adoption in the Python standard library. At least they have to change their current policy. Also with NumPy in the standard library, any modification to NumPy would require a PEP. But Python should have a PEP 3118 compliant buffer object in the standard library, which NumPy could subclass. S.M. From nadavh at visionsense.com Thu Oct 22 05:01:52 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 22 Oct 2009 11:01:52 +0200 Subject: [Numpy-discussion] Convolution of a masked array Message-ID: <710F2847B0018641891D9A21602763605AD1D9@ex3.envision.co.il> Is there a way to proper convolve a masked array with a normal (nonmasked) array? My specific problem is a convolution of a 2D masked array with a separable kernel (a convolution with 2 1D array along each axis). Nadav. 
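One common way to attack this (sometimes called normalized convolution) is to zero-fill the masked samples, run the separable convolution on the data and on the validity mask separately, and then divide. A rough sketch, assuming scipy.ndimage is available and the kernel is a non-negative smoothing kernel; the helper name is made up:

import numpy as np
from scipy.ndimage import convolve1d

def masked_sepconv2d(marr, hrow, hcol):
    # Hypothetical helper, not part of numpy or scipy.
    data = np.ma.filled(marr, 0.0)                       # masked points contribute 0
    weights = (~np.ma.getmaskarray(marr)).astype(float)  # 1 where valid, 0 where masked
    num = convolve1d(convolve1d(data, hrow, axis=0), hcol, axis=1)
    den = convolve1d(convolve1d(weights, hrow, axis=0), hcol, axis=1)
    valid = den > 0
    out = np.zeros_like(num)
    out[valid] = num[valid] / den[valid]
    return np.ma.masked_array(out, mask=~valid)

Points where den is zero (no unmasked samples under the kernel support) stay masked in the result; for kernels with negative lobes this renormalization is not meaningful and a different strategy is needed.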
From stefan at sun.ac.za Thu Oct 22 05:29:44 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 22 Oct 2009 11:29:44 +0200 Subject: [Numpy-discussion] ANN: SciPy October Sprint In-Reply-To: <9457e7c80910220228v15638e9awc4b8096b8d7e960e@mail.gmail.com> References: <9457e7c80910070446k2c1ae895u4011d242abc224b@mail.gmail.com> <9457e7c80910220228v15638e9awc4b8096b8d7e960e@mail.gmail.com> Message-ID: <9457e7c80910220229j36774b61x7c259a8679fc3949@mail.gmail.com> Hi all, The weekend is just around the corner, and we're looking forward to the sprint! Here is the detail again:

"""
Our patch queue keeps getting longer and longer, so here is an opportunity to do some spring cleaning (it's spring in South Africa, at least)! Please join us for an October SciPy sprint:

  * Date: 24/25 October 2009 (Sat/Sun)
  * More information: http://projects.scipy.org/scipy/wiki/SciPySprint200910

We are looking for volunteers to write documentation, review code, fix bugs or design marketing material. New contributors are most welcome, and mentoring will be available.
"""

See you there, Regards Stéfan

From gruben at bigpond.net.au Thu Oct 22 05:51:54 2009 From: gruben at bigpond.net.au (Gary Ruben) Date: Thu, 22 Oct 2009 20:51:54 +1100 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> Message-ID: <4AE02B3A.7040407@bigpond.net.au> josef.pktd at gmail.com wrote: > Is it really possible to get the same as np.sum(a*a, axis) with > tensordot if a.ndim=2 ? > Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) Just to answer this question, np.dot(a,a) is equivalent to np.tensordot(a,a, axis=(0,0)) but the latter is about 10x slower for me. That is, you have to specify the axes for both arrays for tensordot:

In [16]: a=rand(1000)

In [17]: timeit dot(a,a)
100000 loops, best of 3: 3.51 µs per loop

In [18]: timeit tensordot(a,a,(0,0))
10000 loops, best of 3: 37.6 µs per loop

In [19]: tensordot(a,a,(0,0))==dot(a,a)
Out[19]: True

From ralf.gommers at googlemail.com Thu Oct 22 06:36:46 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 22 Oct 2009 12:36:46 +0200 Subject: [Numpy-discussion] why does binary_repr don't support arrays In-Reply-To: References: Message-ID: On Tue, Oct 20, 2009 at 11:17 AM, wrote:
>
> Hello,
>
> I'm always wondering why binary_repr doesn't allow arrays as input values.
> I always have to use a work around like:
>
> import numpy as np
>
> def binary_repr(arr, width=None):
>     binary_list = map((lambda foo: np.binary_repr(foo, width)), arr.flatten())
>     str_len_max = len(np.binary_repr(arr.max(), width=width))
>     str_len_min = len(np.binary_repr(arr.min(), width=width))
>     if str_len_max > str_len_min:
>         str_len = str_len_max
>     else:
>         str_len = str_len_min
>     binary_array = np.fromiter(binary_list, dtype='|S'+str(str_len))
>     return binary_array.reshape(arr.shape)
>
> Is there a reason why arrays are not supported or is there another function
> that does support arrays?
>
Not sure if there was/is a reason, but imho it would be nice to have support for arrays. Also in base_repr. Could you file a ticket in trac?
Cheers, Ralf > > Thanks, > > Markus > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Thu Oct 22 06:48:14 2009 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Thu, 22 Oct 2009 12:48:14 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: <42de02940910220348r126ddd50ra8377da8c359ecb9@mail.gmail.com> 2009/10/21 Neal Becker > ... > > I once wrote a module that replaces the built in transcendental > > functions of numpy by optimized versions from Intels vector math > > library. If someone is interested, I can publish it. In my experience it > > was of little use since real world problems are limited by memory > > bandwidth. Therefore extending numexpr with optimized transcendental > > functions was the better solution. Afterwards I discovered that I could > > have saved the effort of the first approach since gcc is able to use > > optimized functions from Intels vector math library or AMD's math core > > library, see the doc's of -mveclibabi. You just need to recompile numpy > > with proper compiler arguments. > > > > I'm interested. I'd like to try AMD rather than intel, because AMD is > easier to obtain. I'm running on intel machine, I hope that doesn't matter > too much. > > What exactly do I need to do? > I once tried to recompile numpy with AMD's AMCL. Unfortunately I lost the settings after an upgrade. What I remember: install AMCL, (and read the docs ;-) ), mess with the compiler args (-mveclibabi and related), link with the AMCL. Then you get faster pow/sin/cos/exp. The transcendental functions of AMCL also work with Intel processors with the same performance. I did not try the Intel SVML, which belongs to the Intel compilers. This is different to the first approach, which is a small wrapper for Intels VML, put into a python module and which can inject it's ufuncs (via numpy.set_numeric_ops) into numpy. If you want I can send the package per private email. > I see that numpy/site.cfg has an MKL section. I'm assuming I should not > touch that, but just mess with gcc flags? > This is for using the lapack provided by Intels MKL. These settings are not related to the above mentioned compiler options. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dagss at student.matnat.uio.no Thu Oct 22 07:20:17 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 22 Oct 2009 13:20:17 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: <4AE03FF1.5000309@student.matnat.uio.no> Robert Kern wrote: > On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel wrote: > >> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: >> >>> Mathieu Blondel skrev: >>> >>>> Hello, >>>> >>>> About one year ago, a high-level, objected-oriented SIMD API was added >>>> to Mono. For example, there is a class Vector4f for vectors of 4 >>>> floats and this class implements methods such as basic operators, >>>> bitwise operators, comparison operators, min, max, sqrt, shuffle >>>> directly using SIMD operations. >>>> >>> I think you are confusing SIMD with Intel's MMX/SSE instruction set. >>> >> OK, I should have said "Object-oriented SIMD API that is implemented >> using hardware SIMD instructions". >> > > No, I think you're right. Using "SIMD" to refer to numpy-like > operations is an abuse of the term not supported by any outside > community that I am aware of. Everyone else uses "SIMD" to describe > hardware instructions, not the application of a single syntactical > element of a high level language to a non-trivial data structure > containing lots of atomic data elements. > BTW, is there any term for this latter concept that's not SIMD or "vector operation"? It would be good to have a word to distinguish this concept from both CPU instructions and linear algebra. (Personally I think describing NumPy as SIMD and use "SSE/MMX" for CPU instructions makes best sense, but I'm happy to yield to conventions...) Dag Sverre From markus.proeller at ifm.com Thu Oct 22 07:45:46 2009 From: markus.proeller at ifm.com (markus.proeller at ifm.com) Date: Thu, 22 Oct 2009 13:45:46 +0200 Subject: [Numpy-discussion] Antwort: Re: why does binary_repr don't support arrays In-Reply-To: Message-ID: numpy-discussion-bounces at scipy.org schrieb am 22.10.2009 12:36:46: > > > > > > On Tue, Oct 20, 2009 at 11:17 AM, wrote: > > > > Hello, > > > > I'm always wondering why binary_repr doesn't allow arrays as input > > values. I always have to use a work around like: > > > > import numpy as np > > > > def binary_repr(arr, width=None): > > binary_list = map((lambda foo: np.binary_repr(foo, width)), arr.flatten()) > > str_len_max = len(np.binary_repr(arr.max(), width=width)) > > str_len_min = len(np.binary_repr(arr.min(), width=width)) > > if str_len_max > str_len_min: > > str_len = str_len_max > > else: > > str_len = str_len_min > > binary_array = np.fromiter(binary_list, dtype='|S'+str(str_len)) > > return binary_array.reshape(arr.shape) > > > > Is there a reason why arrays are not supported or is there another > > function that does support arrays? > > Not sure if there was/is a reason, but imho it would be nice to have > support for arrays. Also in base_repr. Could you file a ticket in trac? > > Cheers, > Ralf > Okay, I opened a new ticket: http://projects.scipy.org/numpy/ticket/1270 Markus -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ferrell at diablotech.com Thu Oct 22 08:40:33 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Thu, 22 Oct 2009 06:40:33 -0600 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE00B58.3090802@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE00B58.3090802@molden.no> Message-ID: <290D922D-EBA1-4334-AF09-63C397F36C0F@diablotech.com> On Oct 22, 2009, at 1:35 AM, Sturla Molden wrote: > Robert Kern skrev: >> No, I think you're right. Using "SIMD" to refer to numpy-like >> operations is an abuse of the term not supported by any outside >> community that I am aware of. Everyone else uses "SIMD" to describe >> hardware instructions, not the application of a single syntactical >> element of a high level language to a non-trivial data structure >> containing lots of atomic data elements. >> > Then you should pick up a book on parallel computing. > > It is common to differentiate between four classes of computers: SISD, > MISD, SIMD, and MIMD machines. > > A SISD system is the classical von Neuman machine. A MISD system is a > pipelined von Neuman machine, for example the x86 processor. > > A SIMD system is one that has one CPU dedicated to control, and a > large > collection of subordinate ALUs for computation. Each ALU has a small > amount of private memory. The IBM Cell processor is the typical SIMD > machine. > > A special class of SIMD machines are the so-called "vector > machines", of > which the most famous is the Cray C90. The MMX and SSE instructions in > Intel Pentium processors are an example of vector instructions. Some > computer scientists regard vector machines a subtype of MISD systems, > orthogonal to piplines, because there are no subordinate ALUs with > private memory. > > MIMD systems multiple independent CPUs. MIMD systems comes in two > categories: shared-memory processors (SMP) and distributed-memory > machines (also called cluster computers). The dual- and quad-core x86 > processors are shared-memory MIMD machines. > > Many people associate the word SIMD with SSE due to Intel marketing. > But > to the extent that vector machines are MISD orthogonal to piplined von > Neuman machines, SSE cannot be called SIMD. > > NumPy is a software simulated vector machine, usually executed on MISD > hardware. To the extent that vector machines (such as SSE and C90) are > SIMD, we must call NumPy an object-oriented SIMD library. This is not the terminology I am familiar with. Calling NumPy an " object-oriented SIMD library" is very confusing for me. I worked in the parallel computer world for a while (back in the dark ages) and this terminology would have been confusing to everyone I dealt with. I've also read many parallel computing books. In my experience SIMD refers to hardware, not software. There is no reason that NumPy can't be written to run great (get good speed-ups) on an 8-core shared memory system. That would be a MIMD system, and there's nothing about it that doesn't fit with the NumPy abstraction. And, although SIMD can be a subset of MIMD, there are things that can be done in NumPy that be parallelized on MIMD machines but not on SIMD machines (e.g. 
the NumPy vector type is flexible enough it can store a list of tasks, and the operations on that vector can be parallelized easily on a shared memory MIMD machine - task parallelism - but not on a SIMD machine). If we say that "NumPy is a software simulated vector machine" or an " object-oriented SIMD library" we are pigeonholing NumPy in a way which is too limiting and isn't accurate. As a user it feels to me that NumPy is built around various algebra abstractions, many of which map well onto vector machine operations. That means that many of the operations are amenable to efficient implementation on SIMD hardware. But, IMO, one of the nice features of NumPy is it is built around high- level operations, and I would hate to see the project go down a path which insists that everything in NumPy be efficient on all SIMD hardware. Of course, I would also love to see implementations which take as much advantage of available HW as possible (e.g. exploit SIMD HW if available). That's my $0.02, worth only a couple cents less than that. -robert From robert.kern at gmail.com Thu Oct 22 11:51:14 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 22 Oct 2009 10:51:14 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE00B58.3090802@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE00B58.3090802@molden.no> Message-ID: <3d375d730910220851ib57a10co47ee743f38b1bc0e@mail.gmail.com> On Thu, Oct 22, 2009 at 02:35, Sturla Molden wrote: > Robert Kern skrev: >> No, I think you're right. Using "SIMD" to refer to numpy-like >> operations is an abuse of the term not supported by any outside >> community that I am aware of. Everyone else uses "SIMD" to describe >> hardware instructions, not the application of a single syntactical >> element of a high level language to a non-trivial data structure >> containing lots of atomic data elements. >> > Then you should pick up a book on parallel computing. I would be delighted to see a reference to one that refers to a high level language's API as SIMD. Please point one out to me. It's certainly not any of the ones I have available to me. > It is common to differentiate between four classes of computers: SISD, > MISD, SIMD, and MIMD machines. > > A SISD system is the classical von Neuman machine. A MISD system is a > pipelined von Neuman machine, for example the x86 processor. > > A SIMD system is one that has one CPU dedicated to control, and a large > collection of subordinate ALUs for computation. Each ALU has a small > amount of private memory. The IBM Cell processor is the typical SIMD > machine. > > A special class of SIMD machines are the so-called "vector machines", of > which the most famous is the Cray C90. The MMX and SSE instructions in > Intel Pentium processors are an example of vector instructions. Some > computer scientists regard vector machines a subtype of MISD systems, > orthogonal to piplines, because there are no subordinate ALUs with > private memory. > > MIMD systems multiple independent CPUs. MIMD systems comes in two > categories: shared-memory processors (SMP) and distributed-memory > machines (also called cluster computers). The dual- and quad-core x86 > processors are shared-memory MIMD machines. > > Many people associate the word SIMD with SSE due to Intel marketing. 
But > to the extent that vector machines are MISD orthogonal to piplined von > Neuman machines, SSE cannot be called SIMD. That's a fair point, but unrelated to whether or not numpy can be labeled SIMD. These all refer to hardware. > NumPy is a software simulated vector machine, usually executed on MISD > hardware. To the extent that vector machines (such as SSE and C90) are > SIMD, we must call NumPy an object-oriented SIMD library. numpy does not "simulate" anything. It is an object-oriented library. If numpy could be said to "simulate" a vector machine, than just about any object-oriented library that overloads operators could. It creates a false equivalence between numpy and software that actually does simulate hardware. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Thu Oct 22 12:01:20 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 22 Oct 2009 11:01:20 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE03FF1.5000309@student.matnat.uio.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE03FF1.5000309@student.matnat.uio.no> Message-ID: <3d375d730910220901t2476e9b6i59c8c33a8e59686c@mail.gmail.com> On Thu, Oct 22, 2009 at 06:20, Dag Sverre Seljebotn wrote: > Robert Kern wrote: >> On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel wrote: >> >>> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: >>> >>>> Mathieu Blondel skrev: >>>> >>>>> Hello, >>>>> >>>>> About one year ago, a high-level, objected-oriented SIMD API was added >>>>> to Mono. For example, there is a class Vector4f for vectors of 4 >>>>> floats and this class implements methods such as basic operators, >>>>> bitwise operators, comparison operators, min, max, sqrt, shuffle >>>>> directly using SIMD operations. >>>>> >>>> I think you are confusing SIMD with Intel's MMX/SSE instruction set. >>>> >>> OK, I should have said "Object-oriented SIMD API that is implemented >>> using hardware SIMD instructions". >>> >> >> No, I think you're right. Using "SIMD" to refer to numpy-like >> operations is an abuse of the term not supported by any outside >> community that I am aware of. Everyone else uses "SIMD" to describe >> hardware instructions, not the application of a single syntactical >> element of a high level language to a non-trivial data structure >> containing lots of atomic data elements. >> > BTW, is there any term for this latter concept that's not SIMD or > "vector operation"? It would be good to have a word to distinguish this > concept from both CPU instructions and linear algebra. Of course, "vector instruction" and "vectorized operation" sometimes also refer to the CPU instructions. :-) I don't think you will get much better than "vectorized operation", though. While it's ambiguous, it has a long history in the high level language world thanks to Matlab. > (Personally I think describing NumPy as SIMD and use "SSE/MMX" for CPU > instructions makes best sense, but I'm happy to yield to conventions...) Well, "SSE/MMX" is also too limiting. Altivec instructions are also in the same class, and we should be able to use them on PPC platforms. 
Regardless of the origin of the term, "SIMD" is used to refer to all of these instructions in common practice. Sturla may be right in some prescriptive sense, but descriptively, he's quite wrong. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From silva at lma.cnrs-mrs.fr Thu Oct 22 13:28:14 2009 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Thu, 22 Oct 2009 19:28:14 +0200 Subject: [Numpy-discussion] Sphinx/Numpydoc, attributes and property Message-ID: <1256232494.3391.19.camel@PCTerrusse> It seems that either Sphinx or NumpyDoc is having troubles with property attributes. Considering the following piece of code in foo.py

class Profil(object):
    """ Blabla

    Attributes
    ----------
    tfin
    tdeb : float
        Startpoint
    pts : array
        Blabla2.
    """
    def __init__(self):
        """ """
        self.pts = np.array([[0,1]])

    @property
    def tfin(self):
        "The time horizon endpoint."
        return self.pts[0,:].max()

    @property
    def tdeb(self):
        "The time horizon startpoint."
        return self.pts[0,:].min()

and a foo.rst containing

:mod:`foo` -- BlaTitle
=====================================================

.. autoclass:: foo.Profil

produces an attribute-table with only pts but without tfin and tdeb. How can I handle this? -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051)

From sturla at molden.no Thu Oct 22 13:42:42 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 19:42:42 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910220851ib57a10co47ee743f38b1bc0e@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE00B58.3090802@molden.no> <3d375d730910220851ib57a10co47ee743f38b1bc0e@mail.gmail.com> Message-ID: <4AE09992.10607@molden.no> Robert Kern skrev: > I would be delighted to see a reference to one that refers to a high > level language's API as SIMD. Please point one out to me. It's > certainly not any of the ones I have available to me. > > Numerical Recipes in Fortran 90, pages 964 and 985-986, describes the syntax of Fortran 90 and 95 as SIMD. Peter Pacheco's book on MPI describes the difference between von Neumann machines and vector machines as analogous to the difference between Fortran 77 and Fortran 90 (with an example from Fortran 90 array slicing). He is ambiguous as to whether vector machines really are SIMD, or more related to pipelined von Neumann machines. Grama et al. "Introduction to Parallel Computing" describes SIMD as an "architecture", but it is more or less clear that they mean hardware. They do say the Fortran 90 "where statement" is a primitive used to support selective execution on SIMD processors, as conditional execution (if statements) is detrimental to performance. So here, at least, we have three books claiming that Fortran is a language with special primitives for SIMD processors. > > That's a fair point, but unrelated to whether or not numpy can be > labeled SIMD. These all refer to hardware. > Actually I don't think the distinction is that important as we are talking about Turing machines. Also, a lot of what we call "hardware" is actually implemented as software on the chip: the most extreme example would be Transmeta, which emulated x86 processors entirely in software.
The vague distinction between hardware and software is why we get patents on software in Europe, although pure software patents are prohibited. One can always argue that the program and the computer together constitutes a physical device; and circumventing patents by moving hardware into software should not be allowed. The distinction between hardware and software is not as clear as programmers tend to believe. Another thing is that performance issues for vector machines and "vector languages" (Fortran 90, Matlab, NumPy) are similar. Precisely the same situations that makes NumPy and Matlab code slow are detrimental on SIMD/vector hardware. That would for example be long for loops with conditional if statements. On the other hand, vectorized operations over arrays, possibly using where/find masks, are fast. So although NumPy is not executed on a vector machine like the Cray C90, it certainly behaves like one performance wise. I'd say that a MIMD machine running NumPy is a Turing machine emulating a SIMD/vector machine. And now I am done with this stupid discussion... Sturla Molden From silva at lma.cnrs-mrs.fr Thu Oct 22 18:42:16 2009 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Fri, 23 Oct 2009 00:42:16 +0200 Subject: [Numpy-discussion] Sphinx/Numpydoc, attributes and property In-Reply-To: <1256232494.3391.19.camel@PCTerrusse> References: <1256232494.3391.19.camel@PCTerrusse> Message-ID: <1256251336.3391.27.camel@PCTerrusse> It seems that class Profil(object): def __init__(self): """ """ pass def bla(self): "Blabla." return 0 @property def tdeb(self): "The time horizon startpoint." return self.pts[0,:].min() > and a foo.rst containing :mod:`foo` -- BlaTitle ===================================================== .. autoclass:: foo.Profil :members: bla, tdeb produces a listing untitled "Methods" with methods bla and tdeb. Despite tdeb is defined as a method, the decorator make tdeb be a property which I would treat as an attribute and put it in the attribute list. That is not what is done in sphinx/numpydoc. Who is to "blame" ? Sphinx or NumpyDoc ? -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051) From dwf at cs.toronto.edu Fri Oct 23 05:02:45 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 23 Oct 2009 05:02:45 -0400 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: On 21-Oct-09, at 11:01 AM, Ryan May wrote: > ~/.local was added to *be the standard* for easily installing python > packages in your user account. And it works perfectly on the other > major OSes, no twiddling of paths anymore. I've had a lot of headaches with ~/.local on Ubuntu, actually. Apparently Ubuntu has some crazy 'dist-packages' thing going on in parallel to site-packages and /usr and /usr/local and its precedence is unclear. virtualenv also doesn't know jack about it (speaking of which, there's no way to control precedence of ~/.local with virtualenv, so I can't use virtualenv to override ~/.local if I want to treat "~/.local as the new site-packages"). 
Packaging is still more pain than it should be on *any* platform, I think, and I doubt we'll have it all sorted out until somewhere in the mid-to-upper 3.x's. :( David From dwf at cs.toronto.edu Fri Oct 23 05:09:38 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 23 Oct 2009 05:09:38 -0400 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> Message-ID: On 21-Oct-09, at 6:58 AM, Robin wrote: > My only worry is with installer packages - I'm thinking mainly of > wxpython. Is there a way I can get that package to install in > $HOME/.local. (The installer only seems to let you choose a drive). > Also - if I build for example vim against the system python, will I be > able to see packages in $HOME/.local from the python interpreter > inside vim? wxPython is going to be a problem with 64-bit Python. Namely, wxMac is based on Carbon, there is no 64-bit Carbon, and the wxCocoa port is not quite up to snuff. At any rate, any binary installer packages will almost certainly *not* work with the system python, at least if it runs in 64-bit mode by default. The Python.org sources for 2.6.x has a script in the Mac/ subdirectory (I think, or in the build tools) for building a 4-way universal binary (i386, x86_64, ppc and ppc64). You can rather easily build it (just run the script) and it will produce executables of the form python (or python2.6) suffixed with -32 or -64 to run in one mode or the other. So, python-32 (or python2.6-32) will get you 32 bit Python, which will work with wxPython using wxMac, or python-64, which will not (but will do everything in 64-bit mode). I've successfully gotten svn numpy to build 4-way using such a 4-way Python. David From cournape at gmail.com Fri Oct 23 05:26:45 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 23 Oct 2009 18:26:45 +0900 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: <5b8d13220910230226k15b28fc5n21e85e77e9e1a262@mail.gmail.com> On Fri, Oct 23, 2009 at 6:02 PM, David Warde-Farley wrote: > > Packaging is still more pain than it should be on *any* platform, I > think, and I doubt we'll have it all sorted out until somewhere in the > mid-to-upper 3.x's. :( I think numpy and scipy on py3k will happen before that :) David From dsdale24 at gmail.com Fri Oct 23 09:21:17 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 23 Oct 2009 09:21:17 -0400 Subject: [Numpy-discussion] numpy and C99 Message-ID: Can we use features of C99 in numpy? For example, can we use "//" style comments, and C99 for statements "for (int i=0, ...) "? Darren From mdroe at stsci.edu Fri Oct 23 09:25:12 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 23 Oct 2009 09:25:12 -0400 Subject: [Numpy-discussion] numpydoc without autosummary Message-ID: <4AE1AEB8.30709@stsci.edu> Is there a way to use numpydoc without putting an autosummary table at the head of each class? 
I'm using numpydoc primarily for the sectionized docstring support, but the autosummaries are somewhat overkill for my project. Mike From pav+sp at iki.fi Fri Oct 23 09:29:58 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 23 Oct 2009 13:29:58 +0000 (UTC) Subject: [Numpy-discussion] numpy and C99 References: Message-ID: Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: > Can we use features of C99 in numpy? For example, can we use "//" style > comments, and C99 for statements "for (int i=0, ...) "? It would be much easier if we could, but so far we have strived for C89 compliance. So I guess the answer is "no". -- Pauli Virtanen From pav+sp at iki.fi Fri Oct 23 09:31:54 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 23 Oct 2009 13:31:54 +0000 (UTC) Subject: [Numpy-discussion] numpydoc without autosummary References: <4AE1AEB8.30709@stsci.edu> Message-ID: Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > Is there a way to use numpydoc without putting an autosummary table at > the head of each class? I'm using numpydoc primarily for the > sectionized docstring support, but the autosummaries are somewhat > overkill for my project. Numpydoc hooks into sphinx.ext.autodoc's docstring mangling. So if you just need to have docstrings formatted, you can use Sphinx's auto*:: directives. -- Pauli Virtanen From pav+sp at iki.fi Fri Oct 23 09:39:29 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 23 Oct 2009 13:39:29 +0000 (UTC) Subject: [Numpy-discussion] numpydoc without autosummary References: <4AE1AEB8.30709@stsci.edu> Message-ID: Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > Is there a way to use numpydoc without putting an autosummary table at > the head of each class? I'm using numpydoc primarily for the > sectionized docstring support, but the autosummaries are somewhat > overkill for my project. Ah, you meant the stuff output by default to class docstrings. Currently, there's no way to turn this off, unfortunately. It seems there should be, though... -- Pauli Virtanen From dsdale24 at gmail.com Fri Oct 23 09:48:01 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 23 Oct 2009 09:48:01 -0400 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: On Fri, Oct 23, 2009 at 9:29 AM, Pauli Virtanen wrote: > Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: >> Can we use features of C99 in numpy? For example, can we use "//" style >> comments, and C99 for statements "for (int i=0, ...) "? > > It would be much easier if we could, but so far we have strived for C89 > compliance. So I guess the answer is "no". Out of curiosity (I am relatively new to C), what is holding numpy back from embracing C99? Why adhere to a 20-year-old standard? Darren From dagss at student.matnat.uio.no Fri Oct 23 10:03:14 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 23 Oct 2009 16:03:14 +0200 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: <4AE1B7A2.9070603@student.matnat.uio.no> Darren Dale wrote: > On Fri, Oct 23, 2009 at 9:29 AM, Pauli Virtanen wrote: > >> Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: >> >>> Can we use features of C99 in numpy? For example, can we use "//" style >>> comments, and C99 for statements "for (int i=0, ...) "? >>> >> It would be much easier if we could, but so far we have strived for C89 >> compliance. So I guess the answer is "no". >> > > Out of curiosity (I am relatively new to C), what is holding numpy > back from embracing C99? 
Why adhere to a 20-year-old standard? > Microsoft's compilers don't support C99 (or, at least, versions that still has to be used doesn't). Dag Sverre From cournape at gmail.com Fri Oct 23 10:09:55 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 23 Oct 2009 23:09:55 +0900 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: <5b8d13220910230709s2c7a6712o8dc6914c84de510c@mail.gmail.com> On Fri, Oct 23, 2009 at 10:21 PM, Darren Dale wrote: > Can we use features of C99 in numpy? For example, can we use "//" > style comments, and C99 for statements "for (int i=0, ...) "? No, and most likely never will. Even Visual Studio 2010 does not handle basic C99. David From charlesr.harris at gmail.com Fri Oct 23 10:33:19 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Oct 2009 08:33:19 -0600 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: On Fri, Oct 23, 2009 at 7:48 AM, Darren Dale wrote: > On Fri, Oct 23, 2009 at 9:29 AM, Pauli Virtanen > > wrote: > > Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: > >> Can we use features of C99 in numpy? For example, can we use "//" style > >> comments, and C99 for statements "for (int i=0, ...) "? > > > > It would be much easier if we could, but so far we have strived for C89 > > compliance. So I guess the answer is "no". > > Out of curiosity (I am relatively new to C), what is holding numpy > back from embracing C99? Why adhere to a 20-year-old standard? > > To clarify: most compilers support the "//" comment style, but some of the older Sun compilers don't. The main problem on using any of the newer stuff is portability. Some of the new stuff, like "//", while handy isn't crucial. What really hurts is not being able to rely on the math library being up to snuff. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Oct 23 10:41:27 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 23 Oct 2009 16:41:27 +0200 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: <4AE1B7A2.9070603@student.matnat.uio.no> References: <4AE1B7A2.9070603@student.matnat.uio.no> Message-ID: <4AE1C097.1080904@molden.no> Dag Sverre Seljebotn skrev: > Microsoft's compilers don't support C99 (or, at least, versions that > still has to be used doesn't). > > Except for automatic arrays, they do support some of the more important parts of C99 as extensions to C89: inline functions restrict qualifier for (int i=0; i<; i++) Personally I think all of NumPy's C base should be moved to Cython. With your excellent syntax for PEP 3118 buffers, I see no reason to keep NumPy in C. This would make porting to Py3k as well as maintainence easier. When Cython can build Sage, it can be used for a smaller project like NumPy as well. The question of using C89, C99 or C++ would be deferred to the Cython compiler. We could use C++ on one platform (MSVC) and C99 on another (GCC). We would also get direct support for C99 _Complex and C++ std::complex<> types. I'd also suggest that ndarray subclasses memoryview in Py3k. S.M. 
From rmay31 at gmail.com Fri Oct 23 10:46:47 2009 From: rmay31 at gmail.com (Ryan May) Date: Fri, 23 Oct 2009 09:46:47 -0500 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: On Fri, Oct 23, 2009 at 4:02 AM, David Warde-Farley wrote: > On 21-Oct-09, at 11:01 AM, Ryan May wrote: > >> ~/.local was added to *be the standard* for easily installing python >> packages in your user account. ?And it works perfectly on the other >> major OSes, no twiddling of paths anymore. > > I've had a lot of headaches with ~/.local on Ubuntu, actually. > Apparently Ubuntu has some crazy 'dist-packages' thing going on in > parallel to site-packages and /usr and /usr/local and its precedence > is unclear. virtualenv also doesn't know jack about it (speaking of > which, there's no way to control precedence of ~/.local with > virtualenv, so I can't use virtualenv to override ~/.local if I want > to treat "~/.local as the new site-packages"). Ok, so *some* linux distros also choose to break stuff. I'm noticing a theme here where OSes that strive for ease end up breaking something basic. I'm not saying they all need to drastically change; they just need to insert their paths *after* ~/.local. (Thankfully, Gentoo doesn't get in my way.) Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From gael.varoquaux at normalesup.org Fri Oct 23 11:04:41 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 23 Oct 2009 17:04:41 +0200 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: <20091023150441.GA16254@phare.normalesup.org> On Fri, Oct 23, 2009 at 09:46:47AM -0500, Ryan May wrote: > On Fri, Oct 23, 2009 at 4:02 AM, David Warde-Farley wrote: > > On 21-Oct-09, at 11:01 AM, Ryan May wrote: > >> ~/.local was added to *be the standard* for easily installing python > >> packages in your user account. ?And it works perfectly on the other > >> major OSes, no twiddling of paths anymore. > > I've had a lot of headaches with ~/.local on Ubuntu, actually. > > Apparently Ubuntu has some crazy 'dist-packages' thing going on in > > parallel to site-packages and /usr and /usr/local and its precedence > > is unclear. virtualenv also doesn't know jack about it (speaking of > > which, there's no way to control precedence of ~/.local with > > virtualenv, so I can't use virtualenv to override ~/.local if I want > > to treat "~/.local as the new site-packages"). > Ok, so *some* linux distros also choose to break stuff. I'm noticing > a theme here where OSes that strive for ease end up breaking something > basic. I'm not saying they all need to drastically change; they just > need to insert their paths *after* ~/.local. (Thankfully, Gentoo > doesn't get in my way.) 
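One way to check, on any given box, where the per-user directory actually sits relative to the system paths is to ask site and sys directly; a minimal sketch, assuming Python 2.6+ where PEP 370 user site-packages exist::

    import site
    import sys

    # Where the per-user site-packages (PEP 370, i.e. ~/.local on Linux
    # and OS X) lives, and whether Python is using it at all.
    print("user site: %s (enabled: %s)" % (site.USER_SITE, site.ENABLE_USER_SITE))

    # Anything listed *before* USER_SITE here will shadow packages
    # installed into ~/.local.
    for i, p in enumerate(sys.path):
        marker = "  <-- user site" if p == site.USER_SITE else ""
        print("%2d  %s%s" % (i, p, marker))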
For instance, last time I looked, fedora had removed numpy.distutils from the numpy package, and packaged it in a different package. Very confusing... Ga?l From charlesr.harris at gmail.com Fri Oct 23 11:47:58 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Oct 2009 09:47:58 -0600 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: <4AE1C097.1080904@molden.no> References: <4AE1B7A2.9070603@student.matnat.uio.no> <4AE1C097.1080904@molden.no> Message-ID: On Fri, Oct 23, 2009 at 8:41 AM, Sturla Molden wrote: > Dag Sverre Seljebotn skrev: > > > > > Microsoft's compilers don't support C99 (or, at least, versions that > > still has to be used doesn't). > > > > > Except for automatic arrays, they do support some of the more important > parts of C99 as extensions to C89: > > inline functions > restrict qualifier > for (int i=0; i<; i++) > > > Personally I think all of NumPy's C base should be moved to Cython. With > your excellent syntax for PEP 3118 buffers, I see no reason to keep > NumPy in C. This would make porting to Py3k as well as maintainence > easier. When Cython can build Sage, it can be used for a smaller project > like NumPy as well. > > Sage doesn't have the accumulated layers of crud that numpy has. Yet ;) However, moving parts of the code to cython is certainly one path forward. A good starting point would probably be to separate ufuncs from ndarrays. However, I think some code, say loops.c.src, looks better in C than it would in cython. C is a rather nice language for that sort of thing. OTOH, the ufunc_object.c code might look better in cython. In general, I think a separation between pure C code and python interface code would be the way to go, with the latter written in cython. > The question of using C89, C99 or C++ would be deferred to the Cython > compiler. We could use C++ on one platform (MSVC) and C99 on another > (GCC). We would also get direct support for C99 _Complex and C++ > std::complex<> types. > > How about symbol export control for the modules? I think that is one more tool that would benefit from a portable interface in cython. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Oct 23 11:52:20 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 23 Oct 2009 17:52:20 +0200 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: <4AE1B7A2.9070603@student.matnat.uio.no> <4AE1C097.1080904@molden.no> Message-ID: <20091023155220.GC16254@phare.normalesup.org> On Fri, Oct 23, 2009 at 09:47:58AM -0600, Charles R Harris wrote: > However, I think some code, say loops.c.src, looks better in C than it > would in cython. C is a rather nice language for that sort of thing. OTOH, > the ufunc_object.c code might look better in cython. In general, I think a > separation between pure C code and python interface code would be the way > to go, with the latter written in cython. I have some demand in house to be able to use the C parts of numpy for C. Say for instance you are coding a Python library, with a C-optimized Monte-Carlo sampler. Linking to the C code of randomkit is very useful for this. Right now the only way to do this is to copy the randomkit source and to ship it with your libary, however, hopefully, this will change in the long run. So I guess this is a +1 to keep some core numerical functionality in C, most probably for ABI reasons. 
My 2 cents, Ga?l From mdroe at stsci.edu Fri Oct 23 12:13:08 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 23 Oct 2009 12:13:08 -0400 Subject: [Numpy-discussion] numpydoc without autosummary In-Reply-To: References: <4AE1AEB8.30709@stsci.edu> Message-ID: <4AE1D614.20609@stsci.edu> On 10/23/2009 09:39 AM, Pauli Virtanen wrote: > Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > >> Is there a way to use numpydoc without putting an autosummary table at >> the head of each class? I'm using numpydoc primarily for the >> sectionized docstring support, but the autosummaries are somewhat >> overkill for my project. >> > Ah, you meant the stuff output by default to class docstrings. Currently, > there's no way to turn this off, unfortunately. It seems there should be, > though... > > Exactly. It would be great if there was a conf.py option (or something) to turn this off. Thanks for considering it. Cheers, Mike From d.l.goldsmith at gmail.com Fri Oct 23 14:56:31 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 23 Oct 2009 11:56:31 -0700 Subject: [Numpy-discussion] numpydoc without autosummary In-Reply-To: <4AE1D614.20609@stsci.edu> References: <4AE1AEB8.30709@stsci.edu> <4AE1D614.20609@stsci.edu> Message-ID: <45d1ab480910231156n67cef87bpf514a37636579d6c@mail.gmail.com> The proper thing to do is file an "enhancement" ticket, at: http://code.google.com/p/pydocweb/issues/list Thanks. DG On Fri, Oct 23, 2009 at 9:13 AM, Michael Droettboom wrote: > On 10/23/2009 09:39 AM, Pauli Virtanen wrote: > > Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > > > >> Is there a way to use numpydoc without putting an autosummary table at > >> the head of each class? I'm using numpydoc primarily for the > >> sectionized docstring support, but the autosummaries are somewhat > >> overkill for my project. > >> > > Ah, you meant the stuff output by default to class docstrings. Currently, > > there's no way to turn this off, unfortunately. It seems there should be, > > though... > > > > > Exactly. It would be great if there was a conf.py option (or something) > to turn this off. Thanks for considering it. > > Cheers, > Mike > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Fri Oct 23 15:47:56 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 23 Oct 2009 15:47:56 -0400 Subject: [Numpy-discussion] numpydoc without autosummary In-Reply-To: <45d1ab480910231156n67cef87bpf514a37636579d6c@mail.gmail.com> References: <4AE1AEB8.30709@stsci.edu> <4AE1D614.20609@stsci.edu> <45d1ab480910231156n67cef87bpf514a37636579d6c@mail.gmail.com> Message-ID: <4AE2086C.7050208@stsci.edu> Done. Issue #50. Thanks, Mike On 10/23/2009 02:56 PM, David Goldsmith wrote: > The proper thing to do is file an "enhancement" ticket, at: > > http://code.google.com/p/pydocweb/issues/list > > Thanks. > > DG > > On Fri, Oct 23, 2009 at 9:13 AM, Michael Droettboom > wrote: > > On 10/23/2009 09:39 AM, Pauli Virtanen wrote: > > Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > > > >> Is there a way to use numpydoc without putting an autosummary > table at > >> the head of each class? I'm using numpydoc primarily for the > >> sectionized docstring support, but the autosummaries are somewhat > >> overkill for my project. 
> >> > > Ah, you meant the stuff output by default to class docstrings. > Currently, > > there's no way to turn this off, unfortunately. It seems there > should be, > > though... > > > > > Exactly. It would be great if there was a conf.py option (or > something) > to turn this off. Thanks for considering it. > > Cheers, > Mike > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Oct 23 15:54:20 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 24 Oct 2009 04:54:20 +0900 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: <4AE1C097.1080904@molden.no> References: <4AE1B7A2.9070603@student.matnat.uio.no> <4AE1C097.1080904@molden.no> Message-ID: <5b8d13220910231254q1cd039a3g2f284b5c4bb4d1a@mail.gmail.com> On Fri, Oct 23, 2009 at 11:41 PM, Sturla Molden wrote: > Except for automatic arrays, they do support some of the more important > parts of C99 as extensions to C89: > > inline functions > restrict qualifier > for (int i=0; i<; i++) No, it doesn't. The above only works in C++ mode, not in C mode. Visual Studio supports almost none of the useful C99 (VL array, complex number). The VS team has clearly stated that they don't care about updating C support. David From charlesr.harris at gmail.com Sat Oct 24 00:44:21 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Oct 2009 22:44:21 -0600 Subject: [Numpy-discussion] Carriage returns in files. Message-ID: Hi All, I just fixed scipy ticket #1029, where the Sun Fortran compiler failed because nnls.f contained carriage returns (\r). Out of curiosity I decided to look as the numpy and scipy repositories to see how common \r was, with the results: numpy: 1232 instances scipy: 3315 instances Do we have a policy on this? IIRC, it is something that should be handled by subversion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Sat Oct 24 01:24:23 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 24 Oct 2009 01:24:23 -0400 Subject: [Numpy-discussion] Carriage returns in files. In-Reply-To: References: Message-ID: On 24-Oct-09, at 12:44 AM, Charles R Harris wrote: > Do we have a policy on this? IIRC, it is something that should be > handled by subversion. AFAIK you're right, the exception might be that sometimes, if there are files with mixed newlines, subversion will get confused and leave them there. That should only happen if someone is using a seriously dumb editor. David From martin.teichmann at mbi-berlin.de Sat Oct 24 07:38:29 2009 From: martin.teichmann at mbi-berlin.de (Martin Teichmann) Date: Sat, 24 Oct 2009 13:38:29 +0200 Subject: [Numpy-discussion] User data types Message-ID: <16538b570910240438o32c638bbh2b03dfb21aacf37f@mail.gmail.com> Hello List, I'm working on an extension for pytables, a package to store numpy arrays into hdf5 files. hdf5 supports the additional datatype "reference", which makes sense only in an hdf5 context. In order to be able to use them in pytables, I figured the best idea is to define a user-defined datatype, register it with PyArray_RegisterDataType and insert it into the typeDict. 
This all works fine, except that spurious exceptions are raised after one has called dtype("r") (I use r as the type code). I tracked it down to the function PyArray_DescrConverter, which calls PyArray_DescrFromType (just after the finish: label). This function sets a Python exception with PyErr_SetString and returns NULL. But the calling function ignores this, and returns successfully. The exception then is still dangling, and at a later time will be raised in completely unrelated code. I guess a PyErr_Clear should be added at the beginning of the if-clause to solve the problem. I submitted this bug as Ticket #1255 to the numpy trac system, where you can also find the code that triggered the bug. Greetings Martin Teichmann
From ralf.gommers at googlemail.com Sat Oct 24 11:17:07 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 24 Oct 2009 17:17:07 +0200 Subject: [Numpy-discussion] distutils docstrings; request for review/help Message-ID: This request is mainly addressed to David Cournapeau I guess. I wrote docstrings for pretty much all the distutils items not marked "unimportant" in the doc wiki.
Pretty much all the info I got from reading > the code and comments in it, plus a little bit from reading the > distutils.rst file and the Python distutils docs. This was not the easiest > code to understand, so I would like to ask for a review and some help > filling in the blanks. > > The main items that I could not finish are: > - VariableSet.interpolate (if you could throw in an accurate description of > exactly what it does, I can polish it up) > - CCompiler_compile (could use more details I'm sure) > - UnixCCompiler__compile (same) > - UnixCCompiler_create_static_lib (same) > > Also, I left some comment about either things I was not sure about or > things like unused parameters. You can find them here: > http://docs.scipy.org/numpy/changes/ , at the top (made them today or > yesterday). Could you please have a look at those? > > Finally, there are some items that could be important to document, but are > marked as unimportant (Configuration class and methods, exec_command, ...). > Would you mind looking through those items on > http://docs.scipy.org/numpy/docs/ and change the status of the ones you > think are important to "needs editing"? Then I'll try to finish those too. > > Thanks, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sat Oct 24 18:14:17 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 15:14:17 -0700 Subject: [Numpy-discussion] parameter types for documentation Message-ID: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> Are the appropriate parameter types for the docstrings, listed somewhere? In particular, in reviewing some docs I see both 'str' and 'string' used. Which one is correct? Chris From ralf.gommers at googlemail.com Sat Oct 24 18:19:04 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 00:19:04 +0200 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> Message-ID: On Sun, Oct 25, 2009 at 12:14 AM, Christopher Burns wrote: > Are the appropriate parameter types for the docstrings, listed > somewhere? In particular, in reviewing some docs I see both 'str' and > 'string' used. Which one is correct? > > Not all of them are listed in one place. For general advice, see the Parameters section of http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines One error on that page is {True, False}, that should be bool. str is correct, string is not. Reason: str is the type. Partial list: str bool list tuple sequence ndarray array_like or if you can be more precise: list of str sequence of ints and for keywords, add ", optional" Cheers, Ralf > Chris > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cburns at berkeley.edu Sat Oct 24 18:42:46 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 15:42:46 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> Message-ID: <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> Cool, thanks. Mind if I update the HOWTO_DOCUMENT adding in the partial list below? Chris On Sat, Oct 24, 2009 at 3:19 PM, Ralf Gommers wrote: > Not all of them are listed in one place. For general advice, see the > Parameters section of > http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines > One error on that page is {True, False}, that should be bool. > > str is correct, string is not. Reason: str is the type. > > Partial list: > str > bool > list > tuple > sequence > ndarray > array_like > > or if you can be more precise: > list of str > sequence of ints > > and for keywords, add ", optional" > From ralf.gommers at googlemail.com Sat Oct 24 18:52:40 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 00:52:40 +0200 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> Message-ID: On Sun, Oct 25, 2009 at 12:42 AM, Christopher Burns wrote: > Cool, thanks. Mind if I update the HOWTO_DOCUMENT adding in the > partial list below? > > Sure, that would be useful. While you're at it, could you get rid of the {True, False}? Cheers, Ralf > Chris > > On Sat, Oct 24, 2009 at 3:19 PM, Ralf Gommers > wrote: > > Not all of them are listed in one place. For general advice, see the > > Parameters section of > > http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines > > One error on that page is {True, False}, that should be bool. > > > > str is correct, string is not. Reason: str is the type. > > > > Partial list: > > str > > bool > > list > > tuple > > sequence > > ndarray > > array_like > > > > or if you can be more precise: > > list of str > > sequence of ints > > > > and for keywords, add ", optional" > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sat Oct 24 19:11:55 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 16:11:55 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> Message-ID: <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> Done. On Sat, Oct 24, 2009 at 3:52 PM, Ralf Gommers wrote: > Sure, that would be useful. While you're at it, could you get rid of the > {True, False}? 
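For illustration, a docstring written against the conventions quoted above might look like the following; the function and its arguments are made up for the example and are not part of NumPy::

    import numpy as np

    def clip_counts(counts, threshold, weights=None, copy=True):
        """
        Clip a sequence of counts at a threshold.

        Parameters
        ----------
        counts : array_like
            Input counts.
        threshold : int
            Values greater than `threshold` are set to `threshold`.
        weights : sequence of floats, optional
            Per-element weights applied before clipping.
        copy : bool, optional
            If True (default), operate on a copy of `counts`.

        Returns
        -------
        clipped : ndarray
            The weighted, clipped counts.
        """
        arr = np.array(counts, dtype=float, copy=copy)
        if weights is not None:
            arr = arr * np.asarray(weights, dtype=float)
        arr[arr > threshold] = threshold
        return arr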
From ralf.gommers at googlemail.com Sat Oct 24 19:17:22 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 01:17:22 +0200 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> Message-ID: On Sun, Oct 25, 2009 at 1:11 AM, Christopher Burns wrote: > Done. > > That section looks much better now. Except for the word "back-tics" :) Thanks, Ralf > On Sat, Oct 24, 2009 at 3:52 PM, Ralf Gommers > wrote: > > Sure, that would be useful. While you're at it, could you get rid of the > > {True, False}? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sat Oct 24 19:30:58 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 16:30:58 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> Message-ID: <764e38540910241630v65309535u1bcb746cc29f37fe@mail.gmail.com> Just committed a change to 'backticks'. ;) On Sat, Oct 24, 2009 at 4:17 PM, Ralf Gommers wrote: > That section looks much better now. Except for the word "back-tics" :) > > Thanks, > Ralf From d.l.goldsmith at gmail.com Sat Oct 24 22:19:39 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 24 Oct 2009 19:19:39 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241630v65309535u1bcb746cc29f37fe@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> <764e38540910241630v65309535u1bcb746cc29f37fe@mail.gmail.com> Message-ID: <45d1ab480910241919h3024836cr5f6628c361141f20@mail.gmail.com> One other comment (sorry I'm late chiming in): in general, for something like "sequence of ints," usually what is really intended as viable input is "array-like of int-likes," and indeed, in the process of confirming this for various functions, I have found bugs where what was intended was in fact not supported. So, though it's more work, i.e., will take more time, the ideal scenario, IMO, when you're dealing w/ something like that, is to confirm that the function does indeed presently support the full gamut of viable inputs, note any strange behavior, post to the list if you're uncertain if it's a bug, or just file a bug ticket if you are sure. And in the past, when this has come up, I've been instructed to document the intended behavior, not the present buggy behavior (which just reinforces the need to file a bug report). DG On Sat, Oct 24, 2009 at 4:30 PM, Christopher Burns wrote: > Just committed a change to 'backticks'. > > ;) > > On Sat, Oct 24, 2009 at 4:17 PM, Ralf Gommers > wrote: > > That section looks much better now. 
Except for the word "back-tics" :) > > > > Thanks, > > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Sun Oct 25 04:38:37 2009 From: opossumnano at gmail.com (Tiziano Zito) Date: Sun, 25 Oct 2009 09:38:37 +0100 Subject: [Numpy-discussion] [ANN] Advanced Scientific Programming in Python Winter School in Warsaw, Poland Message-ID: Advanced Scientific Programming in Python a Winter School by the G-Node and University of Warsaw Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques with theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We'll use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. Clean language design and easy extensibility are driving Python to become a standard tool for scientific computing. Some of the most useful open source libraries for scientific computing and visualization will be presented. This winter school is targeted at Post-docs and PhD students from all areas. Substantial proficiency in Python or in another language (e.g. Java, C/C++, MATLAB, Mathematica) is absolutely required. An optional, one-day introduction to Python is offered to participants without prior experience with the language. Date and Location: February 8th ? 12th, 2010. Warsaw, Poland. Preliminary Program: - Day 0 (Mon Feb 8) ? [Optional] Dive into Python - Day 1 (Tue Feb 9) ? Software Carpentry ? Documenting code and using version control ? Test-driven development and unit testing ? Debugging, profiling and benchmarking techniques ? Object-oriented programming, design patterns, and agile programming - Day 2 (Wed Feb 10) ? Scientific Tools for Python ? NumPy, SciPy, Matplotlib ? Data serialization: from pickle to databases ? Programming project in the afternoon - Day 3 (Thu Feb 11) ? The Quest for Speed ? Writing parallel applications in Python ? When parallelization does not help: the starving CPUs problem ? Programming project in the afternoon - Day 4 (Fri Feb 12) ? Practical Software Development ? Software design ? Efficient programming in teams ? Quality Assurance ? Programming project final Applications: Applications should be sent before December 6th, 2009 to: python-winterschool at g-node.org No fee is charged but participants should take care of travel, living, and accommodation expenses. Applications should include full contact information (name, affiliation, email & phone), a *short* CV and a *short* statement addressing the following questions: ? What is your educational background? ? What experience do you have in programming? ? Why do you think ?Advanced Scientific Programming in Python? is an appropriate course for your skill profile? Candidates will be selected on the basis of their profile. 
Places are limited: early application is recommended. Notifications of acceptance will be sent by December 14th, 2009. Faculty - Francesc Alted, author of PyTables, Castelló de la Plana, Spain [Day 3] - Pietro Berkes, Volen Center for Complex Systems, Brandeis University, USA [Day 1] - Zbigniew Jędrzejewski-Szmek, Institute of Experimental Physics, University of Warsaw, Poland [Day 0] - Eilif Muller, Laboratory of Computational Neuroscience, Ecole Polytechnique Fédérale de Lausanne, Switzerland [Day 3] - Bartosz Teleńczuk, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany [Day 2] - Niko Wilbert, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany [Day 1] - Tiziano Zito, Bernstein Center for Computational Neuroscience, Berlin, Germany [Day 4] Organized by Piotr Durka, Joanna and Zbigniew Jędrzejewscy-Szmek (Institute of Experimental Physics, University of Warsaw), and Tiziano Zito (German Neuroinformatics Node of the INCF). Website: http://www.g-node.org/python-winterschool Contact: python-winterschool at g-node.org
From ralf.gommers at googlemail.com Sun Oct 25 18:21:00 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 23:21:00 +0100 Subject: [Numpy-discussion] fftpack_lite question Message-ID: Hi all, Can anyone tell me if fftpack_lite is an exact C translation of the fftpack Fortran code? Or at least close enough that the signature, parameter descriptions and algorithm are the same? If so, I can use the fftpack Fortran sources (which have useful comments) to write docs for fftpack_lite funcs (rfft* and cfft*). Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL:
From charlesr.harris at gmail.com Sun Oct 25 18:51:24 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 25 Oct 2009 16:51:24 -0600 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: On Sun, Oct 25, 2009 at 4:21 PM, Ralf Gommers wrote: > Hi all, > > Can anyone tell me if fftpack_lite is an exact C translation of the fftpack > Fortran code? Or at least close enough that the signature, parameter > descriptions and algorithm are the same? > > If so, I can use the fftpack Fortran sources (which have useful comments) > to write docs for fftpack_lite funcs (rfft* and cfft*). > > fft_pack is an interface to a c translation of fftpack. IIRC, it adds some stuff like zerofill and such so it isn't a 1-1 matchup. I think it is pretty close, though. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ralf.gommers at googlemail.com Sun Oct 25 19:04:29 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 00:04:29 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: On Sun, Oct 25, 2009 at 11:51 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sun, Oct 25, 2009 at 4:21 PM, Ralf Gommers > wrote: > >> Hi all, >> >> Can anyone tell me if fftpack_lite is an exact C translation of the >> fftpack Fortran code? Or at least close enough that the signature, parameter >> descriptions and algorithm are the same? >> >> If so, I can use the fftpack Fortran sources (which have useful comments) >> to write docs for fftpack_lite funcs (rfft* and cfft*). >> >> > fft_pack is an interface to a c translation of fftpack. IIRC, it adds some > stuff like zerofill and such so it isn't a 1-1 matchup. I think it is pretty > close, though. > Okay, thanks.
I'll start with the Fortran docs then, and someone familiar with the differences could then easily throw in a few notes on that. Ralf Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sun Oct 25 20:42:04 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sun, 25 Oct 2009 17:42:04 -0700 Subject: [Numpy-discussion] Procedure for doing documentation reviews Message-ID: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> When a documents status is "Needs Review" and when reviewing it we feel it needs edits, should we add comments regarding the edits, or should we feel free to edit it directly? Chris From d.l.goldsmith at gmail.com Sun Oct 25 21:16:44 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 25 Oct 2009 18:16:44 -0700 Subject: [Numpy-discussion] Procedure for doing documentation reviews In-Reply-To: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> References: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> Message-ID: <45d1ab480910251816i12fbe11cxd35a3457187ca330@mail.gmail.com> Technically, after "Needs Review," it's supposed to go through "Needs Work (Reviewed)" The "by the book" way to do it would be to: 0 & 1) Provide comments in the Discussion section and change status to "Needs Work (Reviewed)" (in either order); 2) Edit, if inclined. 3) Answer your own comments as a record of what you've done; 4) At the end of all this, if you feel it's ready for review again, change status to "Needs Review (Revised)." Thanks! DG On Sun, Oct 25, 2009 at 5:42 PM, Christopher Burns wrote: > When a documents status is "Needs Review" and when reviewing it we > feel it needs edits, should we add comments regarding the edits, or > should we feel free to edit it directly? > > Chris > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpederse at gmail.com Sun Oct 25 22:16:06 2009 From: bpederse at gmail.com (Brent Pedersen) Date: Sun, 25 Oct 2009 19:16:06 -0700 Subject: [Numpy-discussion] documenting optional out parameter Message-ID: hi, i've seen this section: http://docs.scipy.org/numpy/Questions+Answers/#the-out-argument should _all_ functions with an optional out parameter have exactly that text? so if i find a docstring with reasonable, but different doc for out, should it be changed to that? and if a docstring of a function with an optional out that needs review does not have the out parameter documented should it be marked as 'Needs Work'? thanks, -brentp From scott.sinclair.za at gmail.com Mon Oct 26 01:51:40 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 26 Oct 2009 07:51:40 +0200 Subject: [Numpy-discussion] documenting optional out parameter In-Reply-To: References: Message-ID: <6a17e9ee0910252251m7e21c6adr9003fa2285e8c94d@mail.gmail.com> > 2009/10/26 Brent Pedersen : > hi, i've seen this section: > http://docs.scipy.org/numpy/Questions+Answers/#the-out-argument > > should _all_ functions with an optional out parameter have exactly that text? > so if i find a docstring with reasonable, but different doc for out, > should it be changed > to that? 
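Independent of how the docstrings end up wording it, what an `out` argument does at the call site is easy to demonstrate; a small sketch, passing the output buffer positionally to a ufunc so it behaves the same on any NumPy version::

    import numpy as np

    a = np.arange(5, dtype=float)
    b = np.ones(5)
    out = np.empty(5)

    # The result is written into the preallocated array and that very
    # same object is returned, so no new array is allocated.
    res = np.add(a, b, out)
    print(res is out)   # True
    print(out)          # [ 1.  2.  3.  4.  5.]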
The Q&A doesn't seem to have reached a firm conclusion, so I'd suggest that any correct and reasonable documentation of the out parameter is fine. > and if a docstring of a function with an optional out that needs > review does not have > the out parameter documented should it be marked as 'Needs Work'? I'd say yes, since the docstring is incomplete in this case. Cheers, Scott From nwagner at iam.uni-stuttgart.de Mon Oct 26 04:04:16 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 26 Oct 2009 09:04:16 +0100 Subject: [Numpy-discussion] Multiplicity of an entry Message-ID: Hi all, how can I obtain the multiplicity of an entry in a list a = ['abc','def','abc','ghij'] The multiplicity of 'abc' is 2. 'def' is 1. 'ghij' is 1. Nils From ralf.gommers at googlemail.com Mon Oct 26 04:55:05 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 09:55:05 +0100 Subject: [Numpy-discussion] Procedure for doing documentation reviews In-Reply-To: <45d1ab480910251816i12fbe11cxd35a3457187ca330@mail.gmail.com> References: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> <45d1ab480910251816i12fbe11cxd35a3457187ca330@mail.gmail.com> Message-ID: On Mon, Oct 26, 2009 at 2:16 AM, David Goldsmith wrote: > Technically, after "Needs Review," it's supposed to go through "Needs Work > (Reviewed)" The "by the book" way to do it would be to: > > 0 & 1) Provide comments in the Discussion section and change status to > "Needs Work (Reviewed)" (in either order); > > 2) Edit, if inclined. > > 3) Answer your own comments as a record of what you've done; > > 4) At the end of all this, if you feel it's ready for review again, change > status to "Needs Review (Revised)." > > Thanks! > > DG > > > On Sun, Oct 25, 2009 at 5:42 PM, Christopher Burns wrote: > >> When a documents status is "Needs Review" and when reviewing it we >> feel it needs edits, should we add comments regarding the edits, or >> should we feel free to edit it directly? >> >> If they are largish changes, then what David said. If they are minor changes though, just edit away (I do that all the time). For example, if you see some mistakes in type descriptions, just fix them. If you feel a whole section is unclear and needs a rewrite, follow the review procedure. In between, exercise your good judgment. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Mon Oct 26 08:15:51 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 26 Oct 2009 08:15:51 -0400 Subject: [Numpy-discussion] 500 internal server error from docs.scipy.org Message-ID: This link: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.var.html#scipy.stats.var gives 500 internal server error From aisaac at american.edu Mon Oct 26 08:25:22 2009 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 26 Oct 2009 08:25:22 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: References: Message-ID: <4AE59532.3000708@american.edu> On 10/26/2009 4:04 AM, Nils Wagner wrote: > how can I obtain the multiplicity of an entry in a list > a = ['abc','def','abc','ghij'] That's a Python question, not a NumPy question. So comp.lang.python would be a better forum. But here's a simplest solution:: a = ['abc','def','abc','ghij'] for item in set(a): print item, a.count(item) This is horribly inefficient of course. 
If you have a big list, if would be *much* better to use defaultdict: from collections import defaultdict myct = defaultdict(int) for item in a: myct[item] += 1 print myct.items() fwiw, Alan Isaac From pav+sp at iki.fi Mon Oct 26 08:29:11 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Mon, 26 Oct 2009 12:29:11 +0000 (UTC) Subject: [Numpy-discussion] 500 internal server error from docs.scipy.org References: Message-ID: Mon, 26 Oct 2009 08:15:51 -0400, Neal Becker wrote: > This link: > > http://docs.scipy.org/doc/scipy/reference/generated/ scipy.stats.var.html#scipy.stats.var > > gives 500 internal server error Now that's strange. It's a static page. -- Pauli Virtanen From ralf.gommers at googlemail.com Mon Oct 26 09:33:57 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 14:33:57 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: On Mon, Oct 26, 2009 at 12:04 AM, Ralf Gommers wrote: > > > On Sun, Oct 25, 2009 at 11:51 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, Oct 25, 2009 at 4:21 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi all, >>> >>> Can anyone tell me if fftpack_lite is an exact C translation of the >>> fftpack Fortran code? Or at least close enough that the signature, parameter >>> descriptions and algorithm are the same? >>> >>> If so, I can use the fftpack Fortran sources (which have useful comments) >>> to write docs for fftpack_lite funcs (rfft* and cfft*). >>> >>> >> fft_pack is an interface to a c translation of fftpack. IIRC, it adds some >> stuff like zerofill and such so it isn't a1-1 matchup. I think it is pretty >> close, though. >> > > Okay, thanks. I'll start with the Fortran docs then, and someone familiar > with the differences could then easily throw in a few notes on that. > > There are docs now for all six exposed functions (cfft*, rfft*): http://docs.scipy.org/numpy/docs/numpy.fft.fftpack_lite/ If anyone with knowledge of the differences between the C and Fortran versions could add a few notes at the above link, that would be great. Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebressert at cfa.harvard.edu Mon Oct 26 10:54:37 2009 From: ebressert at cfa.harvard.edu (Eli Bressert) Date: Mon, 26 Oct 2009 14:54:37 +0000 Subject: [Numpy-discussion] Astype and strings Message-ID: Hi Everyone, Is Numpy supposed to behave this like this when converting an array of numbers to an array of strings with astype? print(arange(20).astype(np.str)) ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1'] When I do the following it works fine, print(arange(20).astype('|S2')) ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '10' '11' '12' '13' '14' '15' '16' '17' '18' '19'] I would have thought that astype would be more intelligent with strings rather than just resorting to the first character for each element. Is this a bug or or is it how astype works? Thanks, Eli From sturla at molden.no Mon Oct 26 12:24:56 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 26 Oct 2009 17:24:56 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: <4AE5CD58.7080902@molden.no> Ralf Gommers skrev: > > If anyone with knowledge of the differences between the C and Fortran > versions could add a few notes at the above link, that would be great. 
> The most notable difference (from a user perspective) is that the Fortran version has more transforms, such as discrete sine and cosine transforms. It also supports single and double precision. The older Fortran version is used in SciPy. FFTs from FFTW and MKL tend to be faster than FFTPACK, at least on Intel hardware. FFTPACK was originally written for running fast on vector machines like the Cray and NEC. FFTPACK-lite: http://projects.scipy.org/numpy/browser/trunk/scipy/basic/fftpack_lite?rev=1676 Older Fortran version: http://www.netlib.org/fftpack/ Fortran 90 version (no license): http://orion.math.iastate.edu/burkardt/f_src/fftpack/fftpack.html Another C version: http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=fftpack/fft.c S.M. From ralf.gommers at googlemail.com Mon Oct 26 13:43:48 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 18:43:48 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: <4AE5CD58.7080902@molden.no> References: <4AE5CD58.7080902@molden.no> Message-ID: Hi Sturla, Thanks for the overview. On Mon, Oct 26, 2009 at 5:24 PM, Sturla Molden wrote: > Ralf Gommers skrev: > > > > If anyone with knowledge of the differences between the C and Fortran > > versions could add a few notes at the above link, that would be great. > > > The most notable difference (from a user perspective) is that the > Fortran version has more transforms, such as discrete sine and cosine > transforms. It also supports single and double precision. The older > Fortran version is used in SciPy. > I added this to the module docstring. The info that would still be useful is how the functions that are exposed in fftpack_lite are subtly different from the older Fortran functions. Charles mentioned zerofill for example. Those funcs are: cfftb cfftf cffti rfftb rfftf rffti Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Mon Oct 26 14:12:49 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 26 Oct 2009 11:12:49 -0700 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE59532.3000708@american.edu> References: <4AE59532.3000708@american.edu> Message-ID: <4AE5E6A1.4030901@noaa.gov> Alan G Isaac wrote: > On 10/26/2009 4:04 AM, Nils Wagner wrote: >> how can I obtain the multiplicity of an entry in a list >> a = ['abc','def','abc','ghij'] > > That's a Python question, not a NumPy question. but we can make it a numpy question! In [15]: a = np.array(['abc','def','abc','ghij']) In [16]: a Out[16]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4') In [17]: for item in set(a): print item, (a == item).sum() abc 2 ghij 1 def 1 I'll leave pro=filing to the OP. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Mon Oct 26 14:26:12 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 26 Oct 2009 14:26:12 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE5E6A1.4030901@noaa.gov> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> Message-ID: <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> On Mon, Oct 26, 2009 at 2:12 PM, Christopher Barker wrote: > Alan G Isaac wrote: >> On 10/26/2009 4:04 AM, Nils Wagner wrote: >>> how can I obtain the multiplicity of an entry in a list >>> a = ['abc','def','abc','ghij'] >> >> That's a Python question, not a NumPy question. > > but we can make it a numpy question! > > In [15]: a = np.array(['abc','def','abc','ghij']) > > > In [16]: a > Out[16]: > array(['abc', 'def', 'abc', 'ghij'], > ? ? ? dtype='|S4') > > In [17]: for item in set(a): > ? ? print item, (a == item).sum() It's *very* slow, when there are a large number of items. numpy creates the full boolean array for each item. see also http://projects.scipy.org/scipy/ticket/905 Josef > > abc 2 > ghij 1 > def 1 > > I'll leave pro=filing to the OP. > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mdroe at stsci.edu Mon Oct 26 14:26:20 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Mon, 26 Oct 2009 14:26:20 -0400 Subject: [Numpy-discussion] C code coverage tool Message-ID: <4AE5E9CC.3070808@stsci.edu> I know David Cournapeau has done some work on using gcov for coverage with Numpy. Unaware of this, (doh! -- I should have Googled first), I wrote a small C code-coverage tool built on top of valgrind's callgrind tool, so it basically only works on x86/AMD64 unixy platforms, but unlike gcov it doesn't require any recompilation headaches (though compiling unoptimized helps). It's about 200 lines of Python code that parses callgrind's output and generates text and HTML. It has specialized support for the code generation used in Numpy -- each line is marked not only by *if* it ran, but in which version of the function it ran. I've put an example of its output from the numpy unit tests up here (temporary address): http://www.droettboom.com/c_coverage/ A particularly interesting file is this one: http://www.droettboom.com/c_coverage/numpy_core_src_multiarray_arraytypes.c.src.html Is this something we want to add to the SVN tree, maybe under tools? Usage instructions are below. Mike =============== C coverage tool =============== This is a tool to generate C code-coverage reports using valgrind's callgrind tool. 
Prerequisites ------------- * `Valgrind `_ (3.5.0 tested, earlier versions may work) * `Pygments `_ (0.11 or later required) C code-coverage --------------- Generating C code coverage reports requires two steps: * Collecting coverage results (from valgrind) * Generating a report from one or more sets of results For most cases, it is good enough to do:: > c_coverage_collect.sh python -c "import numpy; numpy.test()" > c_coverage_report.py callgrind.out.pid which will run all of the Numpy unit tests, create a directory called `coverage` and write the coverage report there. In a more advanced scenario, you may wish to run individual unit tests (since running under valgrind slows things down) and combine multiple results files together in a single report. Collecting results `````````````````` To collect coverage results, you merely run the python interpreter under valgrind's callgrind tool. The `c_coverage_collect.sh` helper script will pass all of the required arguments to valgrind. For example, in typical usage, you may want to run all of the Numpy unit tests:: > c_coverage_collect.sh python -c "import numpy; numpy.test()" This will output a file ``callgrind.out.pid`` containing the results of the run, where ``pid`` is the process id of the run. Generating a report ``````````````````` To generate a report, you pass the ``callgrind.out.pid`` output file to the `c_coverage_report.py` script:: > c_coverage_report.py callgrind.out.pid To combine multiple results files together, simply list them on the commandline or use wildcards:: > c_coverage_report.py callgrind.out.* Options ''''''' * ``--directory``: Specify a different output directory * ``--pattern``: Specify a regex pattern to match for source files. The default is `numpy`, so it will only include source files whose path contains the string `numpy`. If, for instance, you wanted to include all source files covered (that are available on your system), pass ``--pattern=.``. * ``--format``: Specify the output format(s) to generate. May be either ``text`` or ``html``. If ``--format`` is not provided, both formats will be output. Reading a report ---------------- The C code coverage report is a flat directory of files, containing text and/or html files. The files are named based on their path in the original source tree with slashes converted to underscores. Text reports ```````````` The text reports add a prefix to each line of source code: - '>' indicates the line of code was run - '!' indicates the line of code was not run HTML reports ```````````` The HTML report highlights the code that was run in green. The HTML report has special support for the "generated" functions in Numpy. Each run line of code also contains a number in square brackets indicating the number of different generated functions the line was run in. Hovering the mouse over the line will display a list of the versions of the function in which the line was run. These numbers can be used to see if a particular line was run in all versions of the function. Caveats ------- The coverage results occasionally misses lines that clearly must have been run. This usually can be traced back to the compiler optimizer removing lines because they are tautologically impossible or to combine lines together. Compiling Numpy without optimizations helps, but not completely. Even despite this flaw, this tool is still helpful in identifying large missed blocks or functions. 
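At its core the report generator is a parser of callgrind's output: any source line that shows up with a nonzero cost was executed. A much-simplified sketch of that idea -- it assumes an uncompressed callgrind file with absolute line numbers and, unlike the real tool, ignores name compression, cfl=/cfn= call records and the code-generation bookkeeping::

    from collections import defaultdict

    def executed_lines(callgrind_path, pattern="numpy"):
        """Map each matching source file to the set of source line
        numbers that carry a nonzero cost, i.e. that were executed."""
        covered = defaultdict(set)
        current_file = None
        with open(callgrind_path) as fh:
            for line in fh:
                if line.startswith(("fl=", "fi=", "fe=")):
                    # A new source-file record; cost lines that follow
                    # belong to this file.
                    current_file = line.split("=", 1)[1].strip()
                elif current_file and pattern in current_file:
                    fields = line.split()
                    if (len(fields) >= 2 and fields[0].isdigit()
                            and fields[1].isdigit() and int(fields[1]) > 0):
                        covered[current_file].add(int(fields[0]))
        return covered

    # e.g. lines_hit = executed_lines("callgrind.out.12345")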
-- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From nadavh at visionsense.com Mon Oct 26 22:27:13 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 27 Oct 2009 04:27:13 +0200 Subject: [Numpy-discussion] Multiplicity of an entry References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> Message-ID: <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> In principle you could use: np.equal(a,a).sum(0) but, for unknown reason, np.equal operates only on "normal" arrays. maybe you can transform the array to arrays of numbers, for example by hash. Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? josef.pktd at gmail.com ????: ? 26-???????-09 20:26 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] Multiplicity of an entry On Mon, Oct 26, 2009 at 2:12 PM, Christopher Barker wrote: > Alan G Isaac wrote: >> On 10/26/2009 4:04 AM, Nils Wagner wrote: >>> how can I obtain the multiplicity of an entry in a list >>> a = ['abc','def','abc','ghij'] >> >> That's a Python question, not a NumPy question. > > but we can make it a numpy question! > > In [15]: a = np.array(['abc','def','abc','ghij']) > > > In [16]: a > Out[16]: > array(['abc', 'def', 'abc', 'ghij'], > ? ? ? dtype='|S4') > > In [17]: for item in set(a): > ? ? print item, (a == item).sum() It's *very* slow, when there are a large number of items. numpy creates the full boolean array for each item. see also http://projects.scipy.org/scipy/ticket/905 Josef > > abc 2 > ghij 1 > def 1 > > I'll leave pro=filing to the OP. > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3776 bytes Desc: not available URL: From pav+sp at iki.fi Tue Oct 27 05:11:29 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Tue, 27 Oct 2009 09:11:29 +0000 (UTC) Subject: [Numpy-discussion] C code coverage tool References: <4AE5E9CC.3070808@stsci.edu> Message-ID: Mon, 26 Oct 2009 14:26:20 -0400, Michael Droettboom wrote: > I know David Cournapeau has done some work on using gcov for coverage > with Numpy. > > Unaware of this, (doh! -- I should have Googled first), I wrote a small > C code-coverage tool built on top of valgrind's callgrind tool, so it > basically only works on x86/AMD64 unixy platforms, but unlike gcov it > doesn't require any recompilation headaches (though compiling > unoptimized helps). [clip] Where's the code? [clip] > Is this something we want to add to the SVN tree, maybe under tools? Yes. Also, maybe you want to send it to the Valgrind guys, too. If they don't yet have a code coverage functionality yet, it could be nice to have. 
Pauli From mdroe at stsci.edu Tue Oct 27 07:33:54 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 27 Oct 2009 07:33:54 -0400 Subject: [Numpy-discussion] C code coverage tool In-Reply-To: References: <4AE5E9CC.3070808@stsci.edu> Message-ID: <4AE6DAA2.3060804@stsci.edu> On 10/27/2009 05:11 AM, Pauli Virtanen wrote: > Mon, 26 Oct 2009 14:26:20 -0400, Michael Droettboom wrote: > >> I know David Cournapeau has done some work on using gcov for coverage >> with Numpy. >> >> Unaware of this, (doh! -- I should have Googled first), I wrote a small >> C code-coverage tool built on top of valgrind's callgrind tool, so it >> basically only works on x86/AMD64 unixy platforms, but unlike gcov it >> doesn't require any recompilation headaches (though compiling >> unoptimized helps). >> > [clip] > > Where's the code? > It's in the Numpy SVN tree now, under tools/c_coverage > [clip] > >> Is this something we want to add to the SVN tree, maybe under tools? >> > Yes. Also, maybe you want to send it to the Valgrind guys, too. If they > don't yet have a code coverage functionality yet, it could be nice to > have. > There has been a coverage-only valgrind tool in the works for almost two years (vcov). That's a lot more work that what I've done here (by reusing callgrind for the purpose), but it will apparently be more performant. Personally, I couldn't get it to compile (I think it's out of sync with the rest of valgrind atm). I didn't want to wait for that, and I don't know enough about valgrind internals to effectively contribute, so I just wrote a callgrind parser (really not that hard). I found another project [1] that takes a similar approach, but it's written in C++ and looked too difficult to adapt to handle Numpy's code generation. [1] http://github.com/icefox/callgrind_tools Mike From gokhansever at gmail.com Tue Oct 27 07:56:33 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 27 Oct 2009 06:56:33 -0500 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays Message-ID: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> Hello, Consider this sample two columns of data: 999999.9999 999999.9999 999999.9999 999999.9999 999999.9999 999999.9999 999999.9999 1693.9069 999999.9999 1676.1059 999999.9999 1621.5875 651.8040 1542.1373 691.0138 1650.4214 678.5558 1710.7311 621.5777 999999.9999 644.8341 999999.9999 696.2080 999999.9999 Putting into this data into a file say "sample.data" and loading with: a,b = np.loadtxt('sample.data', dtype="float").T I[16]: a O[16]: array([ 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 6.51804000e+02, 6.91013800e+02, 6.78555800e+02, 6.21577700e+02, 6.44834100e+02, 6.96208000e+02]) I[17]: b O[17]: array([ 999999.9999, 999999.9999, 999999.9999, 1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311, 999999.9999, 999999.9999, 999999.9999]) ### interestingly, the second column is loaded as it is but a values reformed a little. Why this could be happening? Any idea? 
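(My current guess is that the stored values are fine and only the repr of a has switched to scientific notation, because the spread between its smallest entries (around 620) and the 999999.9999 fill values is much wider than in b. Quick checks like

I[18]: a[0] == 999999.9999
I[19]: a[6] == 651.8040

should both come back True if loadtxt kept full double precision.)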
Anyways, back to masked arrays: I[24]: am = ma.masked_values(a, value=999999.9999) I[25]: am O[25]: masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 644.8341 696.208], mask = [ True True True True True True False False False False False False], fill_value = 999999.9999) I[30]: bm = ma.masked_values(b, value=999999.9999) I[31]: am O[31]: masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 644.8341 696.208], mask = [ True True True True True True False False False False False False], fill_value = 999999.9999) So far so good. A few basic checks: I[33]: am/bm O[33]: masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712 0.39664667346 -- -- --], mask = [ True True True True True True False False False True True True], fill_value = 999999.9999) I[34]: mean(am/bm) O[34]: 0.41266624676580849 Unfortunately, matplotlib.mlab's prctile cannot handle this division: I[54]: prctile(am/bm, p=[5,25,50,75,95]) O[54]: array([ 3.96646673e-01, 6.21577700e+02, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06]) This also results with wrong looking box-and-whisker plots. Testing further with scipy.stats functions yields expected correct results: I[55]: stats.scoreatpercentile(am/bm, per=5) O[55]: 0.40877012449846228 I[49]: stats.scoreatpercentile(am/bm, per=25) O[49]: masked_array(data = --, mask = True, fill_value = 1e+20) I[56]: stats.scoreatpercentile(am/bm, per=95) O[56]: masked_array(data = --, mask = True, fill_value = 1e+20) Any confirmation? -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.raspaud at smhi.se Tue Oct 27 08:43:54 2009 From: martin.raspaud at smhi.se (Raspaud Martin) Date: Tue, 27 Oct 2009 13:43:54 +0100 Subject: [Numpy-discussion] C-API: How is data filling done in PyArray_SimpleNewFromData ? Message-ID: <783F32138ED65D4A9CF016980481B6BF01CB5785@CORRE.ad.smhi.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, I?m using numpy v1.2.0, and I have the following codes that provide different results : - --------------------- cal = (PyArrayObject *)PyArray_SimpleNew(2,dims,NPY_FLOAT); for(i=0;i -------------- next part -------------- A non-text attachment was scrubbed... Name: martin_raspaud.vcf Type: text/x-vcard Size: 260 bytes Desc: martin_raspaud.vcf URL: From josef.pktd at gmail.com Tue Oct 27 09:25:21 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 27 Oct 2009 09:25:21 -0400 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> Message-ID: <1cd32cbb0910270625p72fdd11fj8f18db42aa9bb566@mail.gmail.com> On Tue, Oct 27, 2009 at 7:56 AM, G?khan Sever wrote: > Hello, > > Consider this sample two columns of data: > > ?999999.9999 999999.9999 > ?999999.9999 999999.9999 > ?999999.9999 999999.9999 > ?999999.9999?? 1693.9069 > ?999999.9999?? 1676.1059 > ?999999.9999?? 1621.5875 > ??? 651.8040?????? 1542.1373 > ??? 691.0138?????? 1650.4214 > ??? 678.5558?????? 1710.7311 > ??? 621.5777??? 999999.9999 > ??? 644.8341??? 999999.9999 > ??? 696.2080??? 999999.9999 > > Putting into this data into a file say "sample.data" and loading with: > > a,b = np.loadtxt('sample.data', dtype="float").T > > I[16]: a > O[16]: > array([? 1.00000000e+06,?? 1.00000000e+06,?? 1.00000000e+06, > ???????? 1.00000000e+06,?? 1.00000000e+06,?? 1.00000000e+06, > ???????? 6.51804000e+02,?? 6.91013800e+02,?? 
6.78555800e+02, > ???????? 6.21577700e+02,?? 6.44834100e+02,?? 6.96208000e+02]) > > I[17]: b > O[17]: > array([ 999999.9999,? 999999.9999,? 999999.9999,??? 1693.9069, > ????????? 1676.1059,??? 1621.5875,??? 1542.1373,??? 1650.4214, > ????????? 1710.7311,? 999999.9999,? 999999.9999,? 999999.9999]) > > ### interestingly, the second column is loaded as it is but a values > reformed a little. Why this could be happening? Any idea? Anyways, back to > masked arrays: > > I[24]: am = ma.masked_values(a, value=999999.9999) > > I[25]: am > O[25]: > masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 > 644.8341 696.208], > ???????????? mask = [ True? True? True? True? True? True False False False > False False False], > ?????? fill_value = 999999.9999) > > > I[30]: bm = ma.masked_values(b, value=999999.9999) > > I[31]: am > O[31]: > masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 > 644.8341 696.208], > ???????????? mask = [ True? True? True? True? True? True False False False > False False False], > ?????? fill_value = 999999.9999) > > > So far so good. A few basic checks: > > I[33]: am/bm > O[33]: > masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712 > 0.39664667346 -- -- --], > ???????????? mask = [ True? True? True? True? True? True False False False > True? True? True], > ?????? fill_value = 999999.9999) > > > I[34]: mean(am/bm) > O[34]: 0.41266624676580849 > > Unfortunately, matplotlib.mlab's prctile cannot handle this division: > > I[54]: prctile(am/bm, p=[5,25,50,75,95]) > O[54]: > array([? 3.96646673e-01,?? 6.21577700e+02,?? 1.00000000e+06, > ???????? 1.00000000e+06,?? 1.00000000e+06]) > > > This also results with wrong looking box-and-whisker plots. > > > Testing further with scipy.stats functions yields expected correct results: This should not be the correct results if you use scipy.stats.scoreatpercentile, it doesn't have correct missing value handling, it treats nans or mask/fill values as regular numbers sorted to the end. stats.mstats.scoreatpercentile is the corresponding function for masked arrays. (BTW I wasn't able to quickly copy and past your example because MaskedArrays don't seem to have a constructive __repr__, i.e. no commas) I don't know anything about the matplotlib story. Josef > > I[55]: stats.scoreatpercentile(am/bm, per=5) > O[55]: 0.40877012449846228 > > I[49]: stats.scoreatpercentile(am/bm, per=25) > O[49]: > masked_array(data = --, > ???????????? mask = True, > ?????? fill_value = 1e+20) > > I[56]: stats.scoreatpercentile(am/bm, per=95) > O[56]: > masked_array(data = --, > ???????????? mask = True, > ?????? fill_value = 1e+20) > > > Any confirmation? > > > > > > > > -- > G?khan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From Chris.Barker at noaa.gov Tue Oct 27 12:09:53 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 27 Oct 2009 09:09:53 -0700 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> Message-ID: <4AE71B51.8040002@noaa.gov> Nadav Horesh wrote: > np.equal(a,a).sum(0) > > but, for unknown reason, np.equal operates only on "normal" arrays. 
true: In [25]: a Out[25]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4') In [27]: np.equal(a,a) Out[27]: NotImplemented however: In [28]: a == a Out[28]: array([ True, True, True, True], dtype=bool) don't they use the same code? or is "==" reverting to plain old generic python sequence comparison, which would partly explain why it is so slow. > maybe you can transform the array to arrays of numbers, for example by hash. or even easier: In [32]: a2 = a.view(dtype=np.int32) In [33]: a2 Out[33]: array([1633837824, 1684366848, 1633837824, 1734895978]) In [34]: np.equal(a2, a2[0]) Out[34]: array([ True, False, True, False], dtype=bool) though that only works if your strings are a handy length like 4 bytes... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Oct 27 13:23:49 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 27 Oct 2009 13:23:49 -0400 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> Message-ID: <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> On Oct 27, 2009, at 7:56 AM, G?khan Sever wrote: > > > Unfortunately, matplotlib.mlab's prctile cannot handle this division: Actually, the division's OK, it's mlab.prctile which is borked. It uses the length of the input array instead of its count to compute the nb of valid data. The easiest workaround in your case is probably to use: >>> prctile((am/bm).compressed(), p=[5,25,50,75,95]) HIH P. From mdroe at stsci.edu Tue Oct 27 15:31:39 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 27 Oct 2009 15:31:39 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE71B51.8040002@noaa.gov> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> <4AE71B51.8040002@noaa.gov> Message-ID: <4AE74A9B.4060603@stsci.edu> Christopher Barker wrote: > Nadav Horesh wrote: > >> np.equal(a,a).sum(0) >> >> but, for unknown reason, np.equal operates only on "normal" arrays. >> > > true: > > In [25]: a > Out[25]: > array(['abc', 'def', 'abc', 'ghij'], > dtype='|S4') > > In [27]: np.equal(a,a) > Out[27]: NotImplemented > > however: > > In [28]: a == a > Out[28]: array([ True, True, True, True], dtype=bool) > > don't they use the same code? or is "==" reverting to plain old generic > python sequence comparison, which would partly explain why it is so slow. > It looks as if "a == a" (that is array_richcompare) is triggering special case code for strings, so it is fast. However, IMHO np.equal should be made to work as well. Can you file a bug and assign it to me (I'm dealing with a number of other string-related things, so I might as well take this too). Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From oliphant at enthought.com Tue Oct 27 15:54:53 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 27 Oct 2009 14:54:53 -0500 Subject: [Numpy-discussion] C-API: How is data filling done in PyArray_SimpleNewFromData ? 
In-Reply-To: <783F32138ED65D4A9CF016980481B6BF01CB5785@CORRE.ad.smhi.se> References: <783F32138ED65D4A9CF016980481B6BF01CB5785@CORRE.ad.smhi.se> Message-ID: On Oct 27, 2009, at 7:43 AM, Raspaud Martin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > > I?m using numpy v1.2.0, and I have the following codes that provide > different results : > > - --------------------- > cal = (PyArrayObject *)PyArray_SimpleNew(2,dims,NPY_FLOAT); > for(i=0;i for(j=0;j { > *((npy_float *)PyArray_GETPTR2(cal,i,j))=(npy_float)in[i][j]; > } > - --------------------- > and > - --------------------- > cal = (PyArrayObject *)PyArray_SimpleNewFromData(2,dims,NPY_FLOAT,in); > - --------------------- > > As you probably guessed, "in" is a 2D array of floats of dimensions > "dims". > > My questions are thus: > - - Why do the two methods provide different results ? > - - How do I get the second to behave like the first ? > In the second case, "in" should be a pointer to a place in memory with space for dims[0]*dims[1] floats. In particular, it should not be a 2-d array of floats. FromData expects to get a single pointer to float (not a 2D array). I can't think of a way to get the second case to work other than have "in" be a 1-D array. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Tue Oct 27 15:59:04 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 27 Oct 2009 14:59:04 -0500 Subject: [Numpy-discussion] Astype and strings In-Reply-To: References: Message-ID: <8DC4A103-F099-4841-BC1B-BCE4EE2AB1B6@enthought.com> On Oct 26, 2009, at 9:54 AM, Eli Bressert wrote: > Hi Everyone, > > Is Numpy supposed to behave this like this when converting an array of > numbers to an array of strings with astype? In general you have to tell NumPy how big the string should be (i.e. np.str is generic). There are a few places where NumPy will look at the data you have in order to guess a size, but as you've seen astype is not one of those places. I think astype could be fixed (by putting a special-case check in the current code for conversion to an unspecified-length string), but that has not been implemented. Please file a feature enhancement issue on the Trac so we don't lose sight of this. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Tue Oct 27 16:04:08 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 27 Oct 2009 15:04:08 -0500 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE74A9B.4060603@stsci.edu> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> <4AE71B51.8040002@noaa.gov> <4AE74A9B.4060603@stsci.edu> Message-ID: On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote: > Christopher Barker wrote: >> Nadav Horesh wrote: >> >>> np.equal(a,a).sum(0) >>> >>> but, for unknown reason, np.equal operates only on "normal" arrays. >>> >> >> true: >> >> In [25]: a >> Out[25]: >> array(['abc', 'def', 'abc', 'ghij'], >> dtype='|S4') >> >> In [27]: np.equal(a,a) >> Out[27]: NotImplemented >> >> however: >> >> In [28]: a == a >> Out[28]: array([ True, True, True, True], dtype=bool) >> >> don't they use the same code? or is "==" reverting to plain old >> generic >> python sequence comparison, which would partly explain why it is so >> slow. 
>> > It looks as if "a == a" (that is array_richcompare) is triggering > special case code for strings, so it is fast. However, IMHO np.equal > should be made to work as well. Can you file a bug and assign it to > me > (I'm dealing with a number of other string-related things, so I > might as > well take this too). The array_richcompare special-cased strings not for speed but for actual functionality. Making np.equal work with strings requires changes to the ufunc code itself which was never written to work with "variable-length" data- types (like strings, unicode, and records). There are several things that would have to be fixed. Some of the changes we made to allow for date-time data-types also made it possible to support variable-length strings, but this is non-trivial to implement. It's certainly possible, but I would want to look at any changes you make before committing them to make sure all the issues are being understood. Thanks, -Travis -- Travis Oliphant Enthought Inc. 1-512-536-1057 http://www.enthought.com oliphant at enthought.com From mdroe at stsci.edu Tue Oct 27 17:07:33 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 27 Oct 2009 17:07:33 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> <4AE71B51.8040002@noaa.gov> <4AE74A9B.4060603@stsci.edu> Message-ID: <4AE76115.7010709@stsci.edu> Travis Oliphant wrote: > On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote: > > >> Christopher Barker wrote: >> >>> Nadav Horesh wrote: >>> >>> >>>> np.equal(a,a).sum(0) >>>> >>>> but, for unknown reason, np.equal operates only on "normal" arrays. >>>> >>>> >>> true: >>> >>> In [25]: a >>> Out[25]: >>> array(['abc', 'def', 'abc', 'ghij'], >>> dtype='|S4') >>> >>> In [27]: np.equal(a,a) >>> Out[27]: NotImplemented >>> >>> however: >>> >>> In [28]: a == a >>> Out[28]: array([ True, True, True, True], dtype=bool) >>> >>> don't they use the same code? or is "==" reverting to plain old >>> generic >>> python sequence comparison, which would partly explain why it is so >>> slow. >>> >>> >> It looks as if "a == a" (that is array_richcompare) is triggering >> special case code for strings, so it is fast. However, IMHO np.equal >> should be made to work as well. Can you file a bug and assign it to >> me >> (I'm dealing with a number of other string-related things, so I >> might as >> well take this too). >> > > The array_richcompare special-cased strings not for speed but for > actual functionality. > > Making np.equal work with strings requires changes to the ufunc code > itself which was never written to work with "variable-length" data- > types (like strings, unicode, and records). There are several > things that would have to be fixed. Some of the changes we made to > allow for date-time data-types also made it possible to support > variable-length strings, but this is non-trivial to implement. It's > certainly possible, but I would want to look at any changes you make > before committing them to make sure all the issues are being understood. > Yeah -- I'm realizing this is a bigger project than I initially suspected. I'll keep you posted if I find the time to do this right. 
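For the original multiplicity question, in the meantime, a workaround that avoids np.equal (and the one-boolean-array-per-item loop) entirely is to sort once and count the runs -- just a sketch:

import numpy as np

a = np.array(['abc', 'def', 'abc', 'ghij'])
sa = np.sort(a)
# True wherever a new value starts in the sorted array:
flag = np.concatenate(([True], sa[1:] != sa[:-1]))
uniq = sa[flag]
counts = np.diff(np.concatenate((np.flatnonzero(flag), [len(sa)])))
print zip(uniq, counts)     # [('abc', 2), ('def', 1), ('ghij', 1)]

That is one O(n log n) sort instead of one full scan of the array per distinct item.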
Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From sturla at molden.no Tue Oct 27 17:46:08 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 27 Oct 2009 22:46:08 +0100 Subject: [Numpy-discussion] Syntax highlighting for Cython and NumPy Message-ID: <4AE76A20.2060405@molden.no> Here is an XML for Cython syntax highlighting in katepart (e.g. KATE and KDevelop). I made this because KATE is my faviourite text edior (feel free to call me a heretic for not using emacs). Unfortunately, the Python highlighting for KDE contains several bugs. And the Pyrex/Cython version that circulates on the web builds on this and introduces a couple more. I have tried to clean it up. Note that this will also highlight numpy.* or np.*, if * is a type or function you get from "cimport numpy" or "import numpy". This works on Windows as well, if you have installed KDE for Windows. Just copy the XML to: ~/.kde/share/apps/katepart/syntax/ C:\kde\share\apps\katepart\syntax (or whereever you have KDE installed) and "Cython with NumPy" shows up under Sources. Anyway, this is the syntax high-lighter I use to write Cython. Feel free to use it as you wish. P.S. I am also cleaning up Python high-lighting for KDE. Not done yet, but I will post a "Python with NumPy" highlighter later on if this is interesting. P.P.S. This also covers Pyrex, but add in some Cython stuff. Sturla Molden -------------- next part -------------- A non-text attachment was scrubbed... Name: cython.xml Type: text/xml Size: 34481 bytes Desc: not available URL: From sturla at molden.no Tue Oct 27 18:31:55 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 27 Oct 2009 23:31:55 +0100 Subject: [Numpy-discussion] [Cython] Syntax highlighting for Cython and NumPy In-Reply-To: <4AE76A20.2060405@molden.no> References: <4AE76A20.2060405@molden.no> Message-ID: <4AE774DB.4020209@molden.no> Sturla Molden skrev: > > and "Cython with NumPy" shows up under Sources. Anyway, this is the > syntax high-lighter I use to write Cython. It seems I posted the wrong file. :-( S.M. -------------- next part -------------- A non-text attachment was scrubbed... Name: cython.xml Type: text/xml Size: 34521 bytes Desc: not available URL: From sturla at molden.no Tue Oct 27 19:25:36 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 28 Oct 2009 00:25:36 +0100 Subject: [Numpy-discussion] [Cython] Syntax highlighting for Cython and NumPy In-Reply-To: References: <4AE76A20.2060405@molden.no> <4AE774DB.4020209@molden.no> Message-ID: <4AE78170.4040601@molden.no> Lisandro Dalcin skrev: > Is there any specific naming convention for these XML files to work > with KATE? Would it be fine to call it 'cython-mode-kate.xml' to push > it to the repo? Will it still work (I mean, with that name) when > placed appropriately in KATE config dirs or whatever? ... Just > concerned that 'cython.xml' is a bit too generic filename... > > You can name it anything you want. The file has an entry like this: Hi, Is there something wrong with scipy.special.hermite? The following code produces glibc errors: ------------8<----------------------- import scipy.special h = [] for i in xrange(15): print i h.append(scipy.special.hermite(i+1)) ------------8<----------------------- results in ... 
12 *** glibc detected *** python: free(): invalid next size (fast): 0x00000000007e2290 *** OS: OpenSUSE 11.1 (x86_64) Python 2.6.0 Scipy: 0.7.0 When using ipython 0.8.4 on the same machine, the error does not occur. What may be the problem here? Regards Ole From gokhansever at gmail.com Wed Oct 28 09:47:08 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 28 Oct 2009 08:47:08 -0500 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <1cd32cbb0910270625p72fdd11fj8f18db42aa9bb566@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> <1cd32cbb0910270625p72fdd11fj8f18db42aa9bb566@mail.gmail.com> Message-ID: <49d6b3500910280647n5351e248se85b432eb4a660e2@mail.gmail.com> On Tue, Oct 27, 2009 at 8:25 AM, wrote: > This should not be the correct results if you use > scipy.stats.scoreatpercentile, > it doesn't have correct missing value handling, it treats nans or > mask/fill values as regular numbers sorted to the end. > > stats.mstats.scoreatpercentile is the corresponding function for > masked arrays. > > Thanks for the suggestion. I forgot the existence of such module. It yields better results. I[14]: st.mstats.scoreatpercentile(r, per=25) O[14]: masked_array(data = 0.401055201111, mask = False, fill_value = 1e+20) I[17]: st.scoreatpercentile(r, per=25) O[17]: masked_array(data = --, mask = True, fill_value = 1e+20) I usually fall into traps using masked arrays. Hopefully I will figure out these before I make funnier mistakes in my analysis. Besides, it would be nice to have the "per" argument accepts a sequence instead of a one item. Like matplotlib's prctile. Using it as: ...(array, per=[5,25,50,75,95]) in a one call. > (BTW I wasn't able to quickly copy and past your example because > MaskedArrays don't seem to have a constructive __repr__, i.e. > no commas) > > You can copy and paste the sample data from this link. When I copied from a txt file into gmail into somehow distorted the original look of the data. http://code.google.com/p/ccnworks/source/browse/trunk/sample.data > I don't know anything about the matplotlib story. > > Josef > > > > > I[55]: stats.scoreatpercentile(am/bm, per=5) > > O[55]: 0.40877012449846228 > > > > I[49]: stats.scoreatpercentile(am/bm, per=25) > > O[49]: > > masked_array(data = --, > > mask = True, > > fill_value = 1e+20) > > > > I[56]: stats.scoreatpercentile(am/bm, per=95) > > O[56]: > > masked_array(data = --, > > mask = True, > > fill_value = 1e+20) > > > > > > Any confirmation? > > > > > > > > > > > > > > > > -- > > G?khan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gokhansever at gmail.com Wed Oct 28 09:52:32 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 28 Oct 2009 08:52:32 -0500 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> Message-ID: <49d6b3500910280652i701e2988yf4e5b2ab83d5f575@mail.gmail.com> On Tue, Oct 27, 2009 at 12:23 PM, Pierre GM wrote: > > On Oct 27, 2009, at 7:56 AM, G?khan Sever wrote: > > > > > > Unfortunately, matplotlib.mlab's prctile cannot handle this division: > > Actually, the division's OK, it's mlab.prctile which is borked. It > uses the length of the input array instead of its count to compute the > nb of valid data. The easiest workaround in your case is probably to > use: > >>> prctile((am/bm).compressed(), p=[5,25,50,75,95]) > HIH > P. > Great. Exact solution. I should have asked this last week :) One simple method solves all the riddle. I had manually masked the MVCs using NaN's. My guess is using compressed() masked arrays could be used with any of regularly defined numpy and scipy functions, right? Thanks for the tip. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Oct 28 10:03:23 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 28 Oct 2009 10:03:23 -0400 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <49d6b3500910280652i701e2988yf4e5b2ab83d5f575@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> <49d6b3500910280652i701e2988yf4e5b2ab83d5f575@mail.gmail.com> Message-ID: <1cd32cbb0910280703p1805ef1tc28eb4ccb65b4ef1@mail.gmail.com> On Wed, Oct 28, 2009 at 9:52 AM, G?khan Sever wrote: > > > On Tue, Oct 27, 2009 at 12:23 PM, Pierre GM wrote: >> >> On Oct 27, 2009, at 7:56 AM, G?khan Sever wrote: >> > >> > >> > Unfortunately, matplotlib.mlab's prctile cannot handle this division: >> >> Actually, the division's OK, it's mlab.prctile which is borked. It >> uses the length of the input array instead of its count to compute the >> nb of valid data. The easiest workaround in your case is probably to >> use: >> ?>>> prctile((am/bm).compressed(), p=[5,25,50,75,95]) >> HIH >> P. > > Great. Exact solution. I should have asked this last week :) > > One simple method solves all the riddle. I had manually masked the MVCs > using NaN's. > > My guess is using compressed() masked arrays could be used with any of > regularly defined numpy and scipy functions, right? Yes, however it only works for 1d or with ravel(). You cannot compress a 2d array, and preserve a rectangular shape (with unequal numbers of missing numbers.) I some cases removing rows or columns with missing values might be more appropriate, or finding a "neutral" fill value. Josef > > Thanks for the tip. 
> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > G?khan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From pschmidtke at mmb.pcb.ub.es Wed Oct 28 15:31:43 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Wed, 28 Oct 2009 20:31:43 +0100 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile Message-ID: Dear Numpy Mailing List Readers, I have a quite simple problem, for what I did not find a solution for now. I have a gzipped file lying around that has some numbers stored in it and I want to read them into a numpy array as fast as possible but only a bunch of data at a time. So I would like to use numpys fromfile funtion. For now I have somehow the following code : f=gzip.open( "myfile.gz", "r" ) xyz=npy.fromfile(f,dtype="float32",count=400) So I would read 400 entries from the file, keep it open, process my data, come back and read the next 400 entries. If I do this, numpy is complaining that the file handle f is not a normal file handle : OError: first argument must be an open file but in fact it is a zlib file handle. But gzip gives access to the normal filehandle through f.fileobj. So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) But there I get just meaningless values (not the actual data) and when I specify the sep=" " argument for npy.fromfile I get just .1 and nothing else. Can you tell me why and how to fix this problem? I know that I could read everything to memory, but these files are rather big, so I simply have to avoid this. Thanks in advance. -- Peter Schmidtke ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. Physical Chemistry Faculty of Pharmacy University of Barcelona From robert.kern at gmail.com Wed Oct 28 15:33:11 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 28 Oct 2009 14:33:11 -0500 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile In-Reply-To: References: Message-ID: <3d375d730910281233r5cadd0fcubea14676a3a978f1@mail.gmail.com> On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke wrote: > Dear Numpy Mailing List Readers, > > I have a quite simple problem, for what I did not find a solution for now. > I have a gzipped file lying around that has some numbers stored in it and I > want to read them into a numpy array as fast as possible but only a bunch > of data at a time. > So I would like to use numpys fromfile funtion. > > For now I have somehow the following code : > > > > ? ? ? ?f=gzip.open( "myfile.gz", "r" ) > xyz=npy.fromfile(f,dtype="float32",count=400) > > > So I would read 400 entries from the file, keep it open, process my data, > come back and read the next 400 entries. If I do this, numpy is complaining > that the file handle f is not a normal file handle : > OError: first argument must be an open file > > but in fact it is a zlib file handle. But gzip gives access to the normal > filehandle through f.fileobj. np.fromfile() requires a true file object, not just a file-like object. np.fromfile() works by grabbing the FILE* pointer underneath and using C system calls to read the data, not by calling the .read() method. 
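The practical workaround is to pull the bytes out yourself through the gzip handle's .read() method and convert them with np.fromstring() -- a rough sketch using the chunk size and dtype from your example:

import gzip
import numpy as np

f = gzip.open("myfile.gz", "rb")
count = 400
while True:
    buf = f.read(count * 4)                      # 4 bytes per float32
    if not buf:
        break
    xyz = np.fromstring(buf, dtype=np.float32)   # last chunk may be shorter
    # ... process xyz ...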
> So I tried ?xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) > > But there I get just meaningless values (not the actual data) and when I > specify the sep=" " argument for npy.fromfile I get just .1 and nothing > else. This is reading the compressed data, not the data that you want. > Can you tell me why and how to fix this problem? I know that I could read > everything to memory, but these files are rather big, so I simply have to > avoid this. Read in reasonably-sized chunks of bytes at a time, and use np.fromstring() to create arrays from them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Wed Oct 28 16:26:41 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 28 Oct 2009 13:26:41 -0700 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile In-Reply-To: <3d375d730910281233r5cadd0fcubea14676a3a978f1@mail.gmail.com> References: <3d375d730910281233r5cadd0fcubea14676a3a978f1@mail.gmail.com> Message-ID: <4AE8A901.3060403@noaa.gov> Robert Kern wrote: >> f=gzip.open( "myfile.gz", "r" ) >> xyz=npy.fromfile(f,dtype="float32",count=400) > Read in reasonably-sized chunks of bytes at a time, and use > np.fromstring() to create arrays from them. Something like: count = 400 xyz = np.fromstring(f.read(count*4), dtype=np.float32) should work (untested...) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sccolbert at gmail.com Wed Oct 28 18:05:05 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Wed, 28 Oct 2009 23:05:05 +0100 Subject: [Numpy-discussion] Segfault when using scipy.special.hermite? In-Reply-To: References: Message-ID: <7f014ea60910281505t686f244cx4a452b4978977b9a@mail.gmail.com> that code works fine for me: ubuntu 9.04 x64 python 2.6.2 scipy 0.7.1 numpy 1.3.0 ipython 0.9.1 On Wed, Oct 28, 2009 at 2:21 PM, Ole Streicher wrote: > Hi, > > Is there something wrong with scipy.special.hermite? The following code > produces glibc errors: > > ------------8<----------------------- > import scipy.special > h = [] > for i in xrange(15): > ? ?print i > ? ?h.append(scipy.special.hermite(i+1)) > ------------8<----------------------- > > results in > ... > 12 > *** glibc detected *** python: free(): invalid next size (fast): 0x00000000007e2290 *** > > OS: OpenSUSE 11.1 (x86_64) > Python 2.6.0 > Scipy: 0.7.0 > > When using ipython 0.8.4 on the same machine, the error does not occur. > > What may be the problem here? > > Regards > > Ole > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Wed Oct 28 18:31:02 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 29 Oct 2009 00:31:02 +0200 Subject: [Numpy-discussion] Segfault when using scipy.special.hermite? In-Reply-To: References: Message-ID: <1256769062.7650.0.camel@idol> ke, 2009-10-28 kello 14:21 +0100, Ole Streicher kirjoitti: > Is there something wrong with scipy.special.hermite? 
The following > code produces glibc errors: It's probably this issue: http://projects.scipy.org/numpy/ticket/1211 The most likely cause is that the linear algebra libraries (ATLAS/BLAS/LAPACK) shipped with that version of 64-bit Opensuse are somehow broken. At least on Mandriva it turned out that the problem did not appear if ATLAS was not installed, and it also went away with a newer version of LAPACK. (special.hermite is pure-python code. The only part that can cause problems is scipy.linalg.eig or numpy.linalg.eig, and, much less likely, scipy.special.gamma. The former are thin wrappers around LAPACK routines.) -- Pauli Virtanen From dyamins at gmail.com Thu Oct 29 00:29:38 2009 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 29 Oct 2009 00:29:38 -0400 Subject: [Numpy-discussion] Numpy/Scipy for EC2 Message-ID: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> Hi all: I'm gearing up to build an Amazon Machine Instance (AMI) for use in doing Numpy/Scipy computations on the Amazon EC2 cloud. I'm writing to ask if anyone has any advice for which (if any) publicly available AMI I should start with. If any one has any specific AMI's that they think are good bases from which to modify -- or really, any other advice about using numpy/scipy on EC2 -- I'd love to know. Beyond that, even if you don't know which AMI to recommend (or even what an AMI is), I still would like advice about which Linux flavor to use. I've had some experience with Mac OSX (and, with David Cornapeau's help over this list, I was able to build 64-bit Scipy with Python 2.6!), but I really know nothing about what the build process is like on Linux (and most likely, unless someone recommends a good AMI with optimized BLAS/LAPACK already built, I'm going to have to built it from scratch). So, should I use Ubuntu or Debian or Fedora or Centos or ...? Thanks! Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From deldotdr at gmail.com Thu Oct 29 02:24:00 2009 From: deldotdr at gmail.com (Dorian Raymer) Date: Wed, 28 Oct 2009 23:24:00 -0700 Subject: [Numpy-discussion] Numpy/Scipy for EC2 In-Reply-To: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> References: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> Message-ID: Hi Dan, I have recently created an AMI for running python processes. I recommend using the ubuntu server ami's provided by http://alestic.com/. Alestic is a well known provider of public AMI images. I think this is exactly the place you want to start from; anything you need is an apt-get or easy_install away. >From the moment you launch an instance, you are literally minutes away from being able to run a computation with Python. I also recommend the FireFox plugin called ElasticFox for interfacing the AWS api. It is a lot easier than the command line api tools! I left some rough notes on my AMI creation/setup process here: http://wiki.github.com/codenode/codenode/backend-demonstration-ec2-image The notes include the ami-id of my resulting image, which you should be able to launch if you wish. If you are interested, I can dive into more detail on how I set up the os/python environment, etc. The image I created is used as the Codenode live public notebook backend: http://live.codenode.org/ You can create an account, login, start a Notebook, import Numpy and run any code you want right now! 
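If the goal is specifically a numpy/scipy box, the package setup on one of the stock Ubuntu AMIs can be as short as:

sudo apt-get update
sudo apt-get install python-numpy python-scipy python-matplotlib ipython

(Those are the standard Ubuntu package names; for an optimized ATLAS BLAS/LAPACK you would add the atlas packages as well -- the exact names vary between releases, so check "apt-cache search atlas" -- or easy_install newer numpy/scipy on top.)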
Hope this is useful, Dorian I cross-posted this to codenode-devel, sympy, and sage-notebook; I think this topic could be of interest to others on those lists. On Wed, Oct 28, 2009 at 9:29 PM, Dan Yamins wrote: > Hi all: > > I'm gearing up to build an Amazon Machine Instance (AMI) for use in doing > Numpy/Scipy computations on the Amazon EC2 cloud. > > I'm writing to ask if anyone has any advice for which (if any) publicly > available AMI I should start with. > > If any one has any specific AMI's that they think are good bases from which > to modify -- or really, any other advice about using numpy/scipy on EC2 -- > I'd love to know. > > Beyond that, even if you don't know which AMI to recommend (or even what an > AMI is), I still would like advice about which Linux flavor to use. I've > had some experience with Mac OSX (and, with David Cornapeau's help over this > list, I was able to build 64-bit Scipy with Python 2.6!), but I really know > nothing about what the build process is like on Linux (and most likely, > unless someone recommends a good AMI with optimized BLAS/LAPACK already > built, I'm going to have to built it from scratch). So, should I use > Ubuntu or Debian or Fedora or Centos or ...? > > Thanks! > Dan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Oct 29 03:17:10 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 29 Oct 2009 16:17:10 +0900 Subject: [Numpy-discussion] [RFC] new function for floating point comparison Message-ID: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> Hi, I have added a couple of utilities for floating point comparison, to be used in unit tests mostly, and would like some comments, especially from people knowledgeable about floating point. http://github.com/cournape/numpy/tree/new_ulp_comp The main difference compared to other functions is that they are 'amplitude-independent', and use IEEE-754-specific properties. The tolerance is based on ULP, and two numbers x, y are closed depending on how many numbers are representable between x and y at the given precision. The branch contains the following new functions: * spacing(x): equivalent to the F90 intrinsic. Returns the smallest representable number needed so that spacing(x) + x > x. Spacing(1) is EPS by definition. * assert_array_almost_equal_nulp(x, y, nulp=1): assertion is defined as abs(x - y) <= nulps * spacing(max(abs(x), abs(y))). * assert_array_max_ulp(a, b, maxulp=1, dtype=None): given two numbers a and b, raise an assertion if there are more than maxulp representable numbers between a and b. They only support single and double precision - for complex number, one could arbitrarily define a distance between numbers based on nulps, say max of number of representable number for real and imag parts. Extended precision would be a bit more painful, because of the variety of implementations. I hope that they can give more robust/meaningful comparison for most of our unit tests, cheers, David From nabble2 at lonely-star.org Thu Oct 29 07:19:05 2009 From: nabble2 at lonely-star.org (TheLonelyStar) Date: Thu, 29 Oct 2009 04:19:05 -0700 (PDT) Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. 
Message-ID: <26111151.post@talk.nabble.com> Hi, I am trying to load a tsv file using numpy.loadtxt: data = np.loadtxt('data.txt',delimiter='\t',dtype=np.float) And I get: ----------------- /usr/lib/python2.6/site-packages/numpy/lib/io.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack) 503 X = X.view(dtype) 504 else: --> 505 X = np.array(X, dtype) 506 507 X = np.squeeze(X) ValueError: setting an array element with a sequence. > /usr/lib/python2.6/site-packages/numpy/lib/io.py(505)loadtxt() 504 else: --> 505 X = np.array(X, dtype) 506 ---------------- I am on archlinux using 1.3.0. The file contians integers and floats sperated by tabs. Ideas? Thanks! Nathan -- View this message in context: http://www.nabble.com/numpy-loadtxt---ValueError%3A-setting-an-array-element-with-a-sequence.-tp26111151p26111151.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From pschmidtke at mmb.pcb.ub.es Thu Oct 29 07:38:11 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Thu, 29 Oct 2009 12:38:11 +0100 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile In-Reply-To: References: Message-ID: <8efd38d31962398588d5c0e87d46e162@mmb.pcb.ub.es> > Date: Wed, 28 Oct 2009 20:31:43 +0100 > From: Peter Schmidtke > Subject: [Numpy-discussion] reading gzip compressed files using > numpy.fromfile > To: numpy-discussion at scipy.org > Message-ID: > Content-Type: text/plain; charset="UTF-8" > > Dear Numpy Mailing List Readers, > > I have a quite simple problem, for what I did not find a solution for now. > I have a gzipped file lying around that has some numbers stored in it and I > want to read them into a numpy array as fast as possible but only a bunch > of data at a time. > So I would like to use numpys fromfile funtion. > > For now I have somehow the following code : > > > > f=gzip.open( "myfile.gz", "r" ) > xyz=npy.fromfile(f,dtype="float32",count=400) > > > So I would read 400 entries from the file, keep it open, process my data, > come back and read the next 400 entries. If I do this, numpy is complaining > that the file handle f is not a normal file handle : > OError: first argument must be an open file > > but in fact it is a zlib file handle. But gzip gives access to the normal > filehandle through f.fileobj. > > So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) > > But there I get just meaningless values (not the actual data) and when I > specify the sep=" " argument for npy.fromfile I get just .1 and nothing > else. > > Can you tell me why and how to fix this problem? I know that I could read > everything to memory, but these files are rather big, so I simply have to > avoid this. > > Thanks in advance. > > > -- > > Peter Schmidtke > > ---------------------- > PhD Student at the Molecular Modeling and Bioinformatics Group > Dep. Physical Chemistry > Faculty of Pharmacy > University of Barcelona > > > > ------------------------------ > > Message: 2 > Date: Wed, 28 Oct 2009 14:33:11 -0500 > From: Robert Kern > Subject: Re: [Numpy-discussion] reading gzip compressed files using > numpy.fromfile > To: Discussion of Numerical Python > Message-ID: > <3d375d730910281233r5cadd0fcubea14676a3a978f1 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke > wrote: >> Dear Numpy Mailing List Readers, >> >> I have a quite simple problem, for what I did not find a solution for >> now. 
>> I have a gzipped file lying around that has some numbers stored in it and >> I >> want to read them into a numpy array as fast as possible but only a bunch >> of data at a time. >> So I would like to use numpys fromfile funtion. >> >> For now I have somehow the following code : >> >> >> >> ? ? ? ?f=gzip.open( "myfile.gz", "r" ) >> xyz=npy.fromfile(f,dtype="float32",count=400) >> >> >> So I would read 400 entries from the file, keep it open, process my data, >> come back and read the next 400 entries. If I do this, numpy is >> complaining >> that the file handle f is not a normal file handle : >> OError: first argument must be an open file >> >> but in fact it is a zlib file handle. But gzip gives access to the normal >> filehandle through f.fileobj. > > np.fromfile() requires a true file object, not just a file-like > object. np.fromfile() works by grabbing the FILE* pointer underneath > and using C system calls to read the data, not by calling the .read() > method. > >> So I tried ?xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) >> >> But there I get just meaningless values (not the actual data) and when I >> specify the sep=" " argument for npy.fromfile I get just .1 and nothing >> else. > > This is reading the compressed data, not the data that you want. > >> Can you tell me why and how to fix this problem? I know that I could read >> everything to memory, but these files are rather big, so I simply have to >> avoid this. > > Read in reasonably-sized chunks of bytes at a time, and use > np.fromstring() to create arrays from them. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > > > ------------------------------ > > Message: 3 > Date: Wed, 28 Oct 2009 13:26:41 -0700 > From: Christopher Barker > Subject: Re: [Numpy-discussion] reading gzip compressed files using > numpy.fromfile > To: Discussion of Numerical Python > Message-ID: <4AE8A901.3060403 at noaa.gov> > Content-Type: text/plain; charset=UTF-8; format=flowed > > Robert Kern wrote: >>> f=gzip.open( "myfile.gz", "r" ) >>> xyz=npy.fromfile(f,dtype="float32",count=400) > >> Read in reasonably-sized chunks of bytes at a time, and use >> np.fromstring() to create arrays from them. > > Something like: > > count = 400 > xyz = np.fromstring(f.read(count*4), dtype=np.float32) > > should work (untested...) > > -Chris > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > Thanks Robert and Chris...indeed I managed to read it quite fast this way. ++ Peter Schmidtke ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. Physical Chemistry Faculty of Pharmacy University of Barcelona From pschmidtke at mmb.pcb.ub.es Thu Oct 29 07:48:00 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Thu, 29 Oct 2009 12:48:00 +0100 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. Message-ID: <42b97315c0694c06f08eb9a59479a7bc@mmb.pcb.ub.es> Have you tried the numpy.fromfile function? This usually worked great for my files that had the same format than yours. ++ Peter ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. 
Physical Chemistry Faculty of Pharmacy University of Barcelona From robince at gmail.com Thu Oct 29 07:44:11 2009 From: robince at gmail.com (Robin) Date: Thu, 29 Oct 2009 11:44:11 +0000 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> Message-ID: <2d5132a50910290444o3f598652s35601244f4b16ef3@mail.gmail.com> On Fri, Oct 23, 2009 at 9:09 AM, David Warde-Farley wrote: > The Python.org sources for 2.6.x has a script in the Mac/ subdirectory > (I think, or in the build tools) for building a 4-way universal binary > (i386, x86_64, ppc and ppc64). You can rather easily build it (just > run the script) and it will produce executables of the form python (or > python2.6) suffixed with -32 or -64 to run in one mode or the other. > So, python-32 (or python2.6-32) will get you 32 bit Python, which will > work with wxPython using wxMac, or python-64, which will not (but will > do everything in 64-bit mode). I've successfully gotten svn numpy to > build 4-way using such a 4-way Python. > After having some trouble I decided to try this way to build universal 32/64 bit intel framework build and just use that as my main python for my work. (Had some problems with macports and virtualenv, I want to leave the system one alone and theres no 64 bit python.org build). Just in case any one else tries this - there is a problem where it's impossible to select the 32 bit architecture: http://bugs.python.org/issue6834 It might be possible to work around or use the alternative pythonw.c in the ticket - but it won't be fixed in a release until 2.7. Cheers Robin -------------- next part -------------- An HTML attachment was scrubbed... URL: From nabble2 at lonely-star.org Thu Oct 29 08:30:09 2009 From: nabble2 at lonely-star.org (TheLonelyStar) Date: Thu, 29 Oct 2009 05:30:09 -0700 (PDT) Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26111151.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> Message-ID: <26112100.post@talk.nabble.com> Adter trying the same thing in matlab, I realized that my "tsv" file is not matrix-style. But this I mean, not all lines ave the same lenght (not the same number of values). What would be the best way to load this? Regards, Nathan -- View this message in context: http://www.nabble.com/numpy-loadtxt---ValueError%3A-setting-an-array-element-with-a-sequence.-tp26111151p26112100.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From pschmidtke at mmb.pcb.ub.es Thu Oct 29 09:19:50 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Thu, 29 Oct 2009 14:19:50 +0100 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26112100.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> <26112100.post@talk.nabble.com> Message-ID: On Thu, 29 Oct 2009 05:30:09 -0700 (PDT), TheLonelyStar wrote: > Adter trying the same thing in matlab, I realized that my "tsv" file is not > matrix-style. But this I mean, not all lines ave the same lenght (not the > same number of values). > > What would be the best way to load this? > > Regards, > Nathan Use the numpy fromfile function : For instance I read the file : 5 8 5 5.5 6.1 3 5.5 2 6.5 with : x=npy.fromfile("test.txt",sep="\t") and it returns an array x : array([ 5. , 8. 
, 5. , 5.5, 6.1, 3. , 5.5, 2. , 6.5]) You can reshape this array to a 3x3 matrix using the reshape function -> x.reshape((3,3)) -- Peter Schmidtke ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. Physical Chemistry Faculty of Pharmacy University of Barcelona From bsouthey at gmail.com Thu Oct 29 09:22:34 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 29 Oct 2009 08:22:34 -0500 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26112100.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> <26112100.post@talk.nabble.com> Message-ID: <4AE9971A.2090809@gmail.com> On 10/29/2009 07:30 AM, TheLonelyStar wrote: > Adter trying the same thing in matlab, I realized that my "tsv" file is not > matrix-style. But this I mean, not all lines ave the same lenght (not the > same number of values). > > What would be the best way to load this? > > Regards, > Nathan > Hi, Really you have to find the reason why there are extra values in some rows compared to other rows. There have been some recent changes in numpy.genfromtxt that I would strong suggest using. It will indicate any problem rows that you can fix or just ignore. Regards Bruce From dagss at student.matnat.uio.no Thu Oct 29 09:26:44 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 29 Oct 2009 14:26:44 +0100 Subject: [Numpy-discussion] Unexplained nans in matrix multiplication Message-ID: <4AE99814.7000306@student.matnat.uio.no> I'm getting (to me( very mysterious NaNs when doing matrix multiplication with certain (randomly generated) data: In [52]: a.shape, b.shape, i, j Out[52]: ((22, 1000), (1000, 22), 0, 16) In [53]: np.dot(a, b)[i,j] Out[53]: (31.322778824758661+nan*j) In [54]: np.dot(a[i,:], b[:,j]) Out[54]: (31.322778824758657+6.5017268607881213j) In [55]: np.any(np.isnan(a)), np.any(np.isnan(b)) Out[55]: (False, False) In [63]: np.max(np.abs(np.vstack((a.real, a.imag, b.real.T, b.imag.T)))) Out[63]: 4.0744710639852633 dtype is complex128. Is this a bug? Should I start looking in NumPy, ATLAS (Sage-compiled), the C compiler, the Fortran compiler...*shrug* I realize that matmul doesn't have to happen via naive vector dot products, but certainly one shouldn't hit Inf anywhere anyway? Dag Sverre From cournape at gmail.com Thu Oct 29 09:31:26 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 29 Oct 2009 22:31:26 +0900 Subject: [Numpy-discussion] Unexplained nans in matrix multiplication In-Reply-To: <4AE99814.7000306@student.matnat.uio.no> References: <4AE99814.7000306@student.matnat.uio.no> Message-ID: <5b8d13220910290631o541fa446ja2cf5fcadf7b753b@mail.gmail.com> On Thu, Oct 29, 2009 at 10:26 PM, Dag Sverre Seljebotn wrote: > I'm getting (to me( very mysterious NaNs when doing matrix > multiplication with certain (randomly generated) data: > > In [52]: a.shape, b.shape, i, j > Out[52]: ((22, 1000), (1000, 22), 0, 16) > > In [53]: np.dot(a, b)[i,j] > Out[53]: (31.322778824758661+nan*j) > > In [54]: np.dot(a[i,:], b[:,j]) > Out[54]: (31.322778824758657+6.5017268607881213j) > > In [55]: np.any(np.isnan(a)), np.any(np.isnan(b)) > Out[55]: (False, False) > > In [63]: np.max(np.abs(np.vstack((a.real, a.imag, b.real.T, b.imag.T)))) > Out[63]: 4.0744710639852633 > > dtype is complex128. Is this a bug? Should I start looking in NumPy, > ATLAS (Sage-compiled), the C compiler, the Fortran compiler...*shrug* Most likely an atlas bug. Which version of atlas are you using, on which cpu ? 
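Given the suspicion that the BLAS is at fault, one quick way to narrow it down is to repeat, for every non-finite entry, the same per-element check Dag did for (0, 16). A rough diagnostic sketch along those lines (the helper name and the use of np.isfinite are illustrative, not from the thread):

import numpy as np

def recheck_nonfinite(a, b):
    # Recompute any non-finite entries of the BLAS result with a plain
    # (slow) inner product, to separate bad input data from a bad dot().
    c = np.dot(a, b)
    bad = zip(*(~np.isfinite(c)).nonzero())
    return [(i, j, c[i, j], (a[i, :] * b[:, j]).sum()) for i, j in bad]

a = np.random.randn(22, 1000) + 1j * np.random.randn(22, 1000)
b = np.random.randn(1000, 22) + 1j * np.random.randn(1000, 22)
suspects = recheck_nonfinite(a, b)   # normally empty; entries here point at the dot() path

If the slow sums come out finite where np.dot() produces nan, the input data are fine and the problem really is in the matrix-multiply path (ATLAS or how it was built).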
David From dagss at student.matnat.uio.no Thu Oct 29 09:39:44 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 29 Oct 2009 14:39:44 +0100 Subject: [Numpy-discussion] Unexplained nans in matrix multiplication In-Reply-To: <5b8d13220910290631o541fa446ja2cf5fcadf7b753b@mail.gmail.com> References: <4AE99814.7000306@student.matnat.uio.no> <5b8d13220910290631o541fa446ja2cf5fcadf7b753b@mail.gmail.com> Message-ID: <4AE99B20.2090401@student.matnat.uio.no> David Cournapeau wrote: > On Thu, Oct 29, 2009 at 10:26 PM, Dag Sverre Seljebotn > wrote: > >> I'm getting (to me( very mysterious NaNs when doing matrix >> multiplication with certain (randomly generated) data: >> >> In [52]: a.shape, b.shape, i, j >> Out[52]: ((22, 1000), (1000, 22), 0, 16) >> >> In [53]: np.dot(a, b)[i,j] >> Out[53]: (31.322778824758661+nan*j) >> >> In [54]: np.dot(a[i,:], b[:,j]) >> Out[54]: (31.322778824758657+6.5017268607881213j) >> >> In [55]: np.any(np.isnan(a)), np.any(np.isnan(b)) >> Out[55]: (False, False) >> >> In [63]: np.max(np.abs(np.vstack((a.real, a.imag, b.real.T, b.imag.T)))) >> Out[63]: 4.0744710639852633 >> >> dtype is complex128. Is this a bug? Should I start looking in NumPy, >> ATLAS (Sage-compiled), the C compiler, the Fortran compiler...*shrug* >> > > Most likely an atlas bug. Which version of atlas are you using, on which cpu ? > Thanks. Sage reports atlas-3.8.3.p7. Intel(R) Xeon(TM) CPU 3.20GHz, 64-bit RedHat Linux. Dag Sverre From robert.kern at gmail.com Thu Oct 29 11:41:23 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 29 Oct 2009 10:41:23 -0500 Subject: [Numpy-discussion] Numpy/Scipy for EC2 In-Reply-To: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> References: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> Message-ID: <3d375d730910290841u78343cd2v4715cd40001bb09a@mail.gmail.com> On Wed, Oct 28, 2009 at 23:29, Dan Yamins wrote: > Hi all: > > I'm gearing up to build an Amazon Machine Instance (AMI) for use in doing > Numpy/Scipy computations on the Amazon EC2 cloud. > > I'm writing to ask if anyone has any advice for which (if any) publicly > available AMI I should start with. I haven't used it, but this seems to provide a good environment for your needs. http://web.mit.edu/stardev/cluster/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dyamins at gmail.com Thu Oct 29 11:45:10 2009 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 29 Oct 2009 11:45:10 -0400 Subject: [Numpy-discussion] Numpy/Scipy for EC2 In-Reply-To: <3d375d730910290841u78343cd2v4715cd40001bb09a@mail.gmail.com> References: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> <3d375d730910290841u78343cd2v4715cd40001bb09a@mail.gmail.com> Message-ID: <15e4667e0910290845te2f9217gf03efa635ff82bd4@mail.gmail.com> I haven't used it, but this seems to provide a good environment for your > needs. > > http://web.mit.edu/stardev/cluster/ > > Robert Kern to the rescue again! StarCluster looks great. .... And thanks Dorian as well, I'm also checking out Alestic. Dan > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
> -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Oct 29 11:51:59 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 29 Oct 2009 10:51:59 -0500 Subject: [Numpy-discussion] [RFC] new function for floating point comparison In-Reply-To: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> References: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730910290851k12df5db7v26c3c479d8b1efd6@mail.gmail.com> On Thu, Oct 29, 2009 at 02:17, David Cournapeau wrote: > Hi, > > ? ?I have added a couple of utilities for floating point comparison, to > be used in unit tests mostly, and would like some comments, especially > from people knowledgeable about floating point. > > http://github.com/cournape/numpy/tree/new_ulp_comp > > The main difference compared to other functions is that they are > 'amplitude-independent', and use IEEE-754-specific properties. The > tolerance is based on ULP, and two numbers x, y are closed depending on > how many numbers are representable between x and y at the given > precision. The branch contains the following new functions: > > ? ?* spacing(x): equivalent to the F90 intrinsic. Returns the smallest > representable number needed so that spacing(x) + x > x. Spacing(1) is > EPS by definition. > ? ?* assert_array_almost_equal_nulp(x, y, nulp=1): assertion is defined > as abs(x - y) <= nulps * spacing(max(abs(x), abs(y))). > ? ?* assert_array_max_ulp(a, b, maxulp=1, dtype=None): given two > numbers a and b, raise an assertion if there are more than maxulp > representable numbers between a and b. That sounds good. Another worthwhile addition would be nextafter(). http://www.opengroup.org/onlinepubs/000095399/functions/nextafter.html With a little bit of care, a nextafter ufunc can be used to generate a dense grid of floating point values around a given center. This can be used to explore the error characteristics of a function at a very fine level of detail that is otherwise unavailable. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Thu Oct 29 12:29:20 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 29 Oct 2009 09:29:20 -0700 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <42b97315c0694c06f08eb9a59479a7bc@mmb.pcb.ub.es> References: <42b97315c0694c06f08eb9a59479a7bc@mmb.pcb.ub.es> Message-ID: <4AE9C2E0.50402@noaa.gov> Peter Schmidtke wrote: > Have you tried the numpy.fromfile function? good point -- fromfile() can be much faster for the simple cases it can handle. > not all lines ave the same lenght (not the > same number of values). > > What would be the best way to load this? That depends on what the data mean. Is it a 2-d array with missing values? If so, how do you know which are missing? Are there the same number of tabs in each row? If do than loadtxt should be able to handle it. You may be best off looping through the file: for line in file: a = numpy.fromstring(line, sep='\t', dtype=np.float) and do what makes sense with each line. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Thu Oct 29 14:31:24 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 29 Oct 2009 14:31:24 -0400 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26112100.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> <26112100.post@talk.nabble.com> Message-ID: On Oct 29, 2009, at 8:30 AM, TheLonelyStar wrote: > > Adter trying the same thing in matlab, I realized that my "tsv" file > is not > matrix-style. But this I mean, not all lines ave the same lenght > (not the > same number of values). > > What would be the best way to load this? The SVN version of np.genfromtxt will let you know where some rows are longer than others. You can decide what to do from then (ignore the corresponding rows or modify your file). The .fromfile approach is a solution if you don't really care about getting a 2D array (or structured 1D array with different fields for ints and floats on a same row), as a previous poster illustrated. From arokem at berkeley.edu Thu Oct 29 15:18:51 2009 From: arokem at berkeley.edu (Ariel Rokem) Date: Thu, 29 Oct 2009 12:18:51 -0700 Subject: [Numpy-discussion] datetime64 Message-ID: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> Hi - I want to start trying out the new dtype for representation of arrays of times, datetime64, which is implemented in the current svn. Is there any documentation anywhere? I know of this proposal: http://numpy.scipy.org/svn/numpy/tags/1.3.0/doc/neps/datetime-proposal3.rst but apparently the current implementation of the dtype didn't follow this proposal - the hypothetical examples in the spec don't work with the implementation. I just want to see a couple of examples on how to initialize arrays of this dtype, and what kinds of operations can be done with them (and with timedelta64). Thanks a lot, Ariel -- Ariel Rokem Helen Wills Neuroscience Institute University of California, Berkeley http://argentum.ucbso.berkeley.edu/ariel From as8ca at mail.astro.virginia.edu Thu Oct 29 16:22:31 2009 From: as8ca at mail.astro.virginia.edu (Alok Singhal) Date: Thu, 29 Oct 2009 16:22:31 -0400 Subject: [Numpy-discussion] datetime64 In-Reply-To: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> References: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> Message-ID: <20091029202231.GA22121@virginia.edu> Hi, On 29/10/09: 12:18, Ariel Rokem wrote: > I want to start trying out the new dtype for representation of arrays > of times, datetime64, which is implemented in the current svn. Is > there any documentation anywhere? I know of this proposal: > > http://numpy.scipy.org/svn/numpy/tags/1.3.0/doc/neps/datetime-proposal3.rst > > but apparently the current implementation of the dtype didn't follow > this proposal - the hypothetical examples in the spec don't work with > the implementation. > I just want to see a couple of examples on how to initialize arrays of > this dtype, and what kinds of operations can be done with them (and > with timedelta64). I think the only thing that works as of now for dates and deltas is using datetime.datetime and datetime.timedelta objects in the initilization of the arrays. See http://projects.scipy.org/numpy/ticket/1225 for some tests. 
Even when you construct the arrays using datetime.datetime objects, things are a bit strange: In [1]: import numpy as np In [2]: np.__version__ Out[2]: '1.4.0.dev7599' In [3]: import datetime In [4]: d = datetime.datetime(2009, 10, 5, 12, 35, 2) In [5]: d1 = datetime.datetime.now() In [6]: np.array([d, d1], 'M') Out[6]: array([2009-10-04 23:27:37.359744, 2009-10-29 00:10:59.677844], dtype=datetime64[ns]) -Alok -- * * Alok Singhal * * * http://www.astro.virginia.edu/~as8ca/ * * From pgmdevlist at gmail.com Thu Oct 29 16:43:11 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 29 Oct 2009 16:43:11 -0400 Subject: [Numpy-discussion] datetime64 In-Reply-To: <20091029202231.GA22121@virginia.edu> References: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> <20091029202231.GA22121@virginia.edu> Message-ID: <41F46D98-37B4-4E88-BBEC-FCE2CE3C2E6A@gmail.com> On Oct 29, 2009, at 4:22 PM, Alok Singhal wrote: > Hi, > > On 29/10/09: 12:18, Ariel Rokem wrote: >> I want to start trying out the new dtype for representation of arrays >> of times, datetime64, which is implemented in the current svn. Is >> there any documentation anywhere? I know of this proposal: >> >> http://numpy.scipy.org/svn/numpy/tags/1.3.0/doc/neps/datetime-proposal3.rst >> >> but apparently the current implementation of the dtype didn't follow >> this proposal - the hypothetical examples in the spec don't work with >> the implementation. >> I just want to see a couple of examples on how to initialize arrays >> of >> this dtype, and what kinds of operations can be done with them (and >> with timedelta64). > > I think the only thing that works as of now for dates and deltas is > using datetime.datetime and datetime.timedelta objects in the > initilization of the arrays. See > http://projects.scipy.org/numpy/ticket/1225 for some tests. Oh yes, I saw that... Marty Fuhry, one of our GSoC students, had written some pretty extensive series of tests to allocate datetime/ strings to elements of a ndarray with datetime64 dtype. He also had written some functions allowing conversion from one frequency to another. Unfortunately, I don't think his work has been incorporated yet. Maybe Jarrod M. and Travis O. will shed some light on that matter. I for one would be quite interested into checking what's happening on that front. In other more personal news: Ariel, I gonna be quite busy for the next couple of weeks, but we should chat off-list about our parallel efforts with time series (I still haven't found the time to delve into nipy, could you point me tothe most relevant part). From pav+sp at iki.fi Fri Oct 30 05:36:06 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 30 Oct 2009 09:36:06 +0000 (UTC) Subject: [Numpy-discussion] [RFC] complex functions in npymath Message-ID: Hi (esp. David), If there are no objections, I'd like to move Numpy's complex-valued C99-like functions to npymath: http://github.com/pv/numpy-work/tree/npymath-complex This'll come useful if we want to start eg. writing Ufuncs in Cython. I'm working around possible compiler-incompatibilities of struct return values by having only pointer versions of the functions in libnpymath.a, and the non-pointer versions as inlined static functions. Also, perhaps we should add a header file npy_math_c99compat.h that would detect if the compiler supports C99, and if not, substitute the C99 functions with our npy_math implementations. This'd be great for scipy.special. 
-- Pauli Virtanen From david at ar.media.kyoto-u.ac.jp Fri Oct 30 05:34:07 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 18:34:07 +0900 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: References: Message-ID: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Hi Pauli, Pauli Virtanen wrote: > Hi (esp. David), > > If there are no objections, I'd like to move Numpy's complex-valued > C99-like functions to npymath: > > http://github.com/pv/numpy-work/tree/npymath-complex > > This'll come useful if we want to start eg. writing Ufuncs in Cython. > Actually, I am in the process of cleaning my numpy branches for review, and intend to push them into svn as fast as possible. Complex is pretty high on the list. The missing piece in complex support in npymath is mostly tests: I have tests for all the special cases (all special cases specified in C99 standard are tested), but no test for the actual 'normal' values. If you (or someone else) could add a couple of tests, that would be great. > I'm working around possible compiler-incompatibilities of struct > return values by having only pointer versions of the functions in > libnpymath.a, and the non-pointer versions as inlined static > functions. > Is this a problem if we guarantee that our complex type is bit compatible with C99 complex (e.g. casting a complex to a double[2] should alway work) ? That's how the complex math is implemented ATM. > Also, perhaps we should add a header file > > npy_math_c99compat.h > > that would detect if the compiler supports C99, and if not, > substitute the C99 functions with our npy_math implementations. > This'd be great for scipy.special. > I am not sure I understand this: currently, if a given complex function is detected on the platform, npy_foo is just an alias to foo, so we use the platform implementation whenever possible. cheers, David From pav+sp at iki.fi Fri Oct 30 06:07:31 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 30 Oct 2009 10:07:31 +0000 (UTC) Subject: [Numpy-discussion] [RFC] complex functions in npymath References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: Fri, 30 Oct 2009 18:34:07 +0900, David Cournapeau wrote: [clip] > Actually, I am in the process of cleaning my numpy branches for review, > and intend to push them into svn as fast as possible. Complex is pretty > high on the list. Great! > The missing piece in complex support in npymath is mostly tests: I have > tests for all the special cases (all special cases specified in C99 > standard are tested), but no test for the actual 'normal' values. If you > (or someone else) could add a couple of tests, that would be great. I can probably take a shot at this. >> I'm working around possible compiler-incompatibilities of struct return >> values by having only pointer versions of the functions in >> libnpymath.a, and the non-pointer versions as inlined static functions. > > Is this a problem if we guarantee that our complex type is bit > compatible with C99 complex (e.g. casting a complex to a double[2] > should alway work) ? > > That's how the complex math is implemented ATM. Correct me if I'm wrong, but I think the problem is that for typedef struct foo foo_t; foo_t bar(); different compilers may put the return value of bar() to a different place (registers vs. memory). If we put those functions in a library, and switch compilers, I think the behavior is undefined as there seems to be no standard. 
I don't think functions in C can return arrays, so double[2] representation probably does not help us here. >> Also, perhaps we should add a header file >> >> npy_math_c99compat.h >> >> that would detect if the compiler supports C99, and if not, substitute >> the C99 functions with our npy_math implementations. This'd be great >> for scipy.special. > > I am not sure I understand this: currently, if a given complex function > is detected on the platform, npy_foo is just an alias to foo, so we use > the platform implementation whenever possible. I'd like to write code like this: coshf(a) + sinhf(b) and not like this: npy_coshf(a) + npy_sinhf(b) This seems easy to achieve with a convenience header that substitutes the C99 functions with npy_ functions when C99 is not available. Pauli From david at ar.media.kyoto-u.ac.jp Fri Oct 30 05:57:12 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 18:57:12 +0900 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: <4AEAB878.4030308@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > > I can probably take a shot at this. > Cool. > Correct me if I'm wrong, but I think the problem is that for > > typedef struct foo foo_t; > foo_t bar(); > You're right, I was thinking about alignment issues myself - that's why I mentioned npy_complex and double[2] being equivalent, as defining complex with a struct does not guarantee this. > different compilers may put the return value of bar() to a different > place (registers vs. memory). If we put those functions in a library, and > switch compilers, I think the behavior is undefined as there seems to be > no standard. > Is this a problem in practice ? If two compilers differ in this, wouldn't they have incompatible ABI ? David From david at ar.media.kyoto-u.ac.jp Fri Oct 30 06:01:34 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 19:01:34 +0900 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: <4AEAB97E.4030902@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > I'd like to write code like this: > > coshf(a) + sinhf(b) > > and not like this: > > npy_coshf(a) + npy_sinhf(b) > Using npy_ prefix was a consciously designed feature :) I would prefer avoid doing this, as it may cause trouble: sometimes, even if the foo function is available, we may want to use npy_foo because it is better, faster, more standard compliant. For example, I remember that the few complex functions on Visual Studio are broken, so even though they are detected, I have a MSVC ifdef to use our own in that case. David From david at ar.media.kyoto-u.ac.jp Fri Oct 30 06:04:13 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 19:04:13 +0900 Subject: [Numpy-discussion] [RFC] new function for floating point comparison In-Reply-To: <3d375d730910290851k12df5db7v26c3c479d8b1efd6@mail.gmail.com> References: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> <3d375d730910290851k12df5db7v26c3c479d8b1efd6@mail.gmail.com> Message-ID: <4AEABA1D.5070802@ar.media.kyoto-u.ac.jp> Robert Kern wrote: > > That sounds good. Another worthwhile addition would be nextafter(). > > http://www.opengroup.org/onlinepubs/000095399/functions/nextafter.html > Ah, I did not know about this one. I have implemented it and committed it. 
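On the Python side, the newly committed ufunc already makes Robert's dense-grid idea easy to try. A small sketch, assuming it is exposed as np.nextafter (the helper name here is illustrative):

import numpy as np

def ulp_grid(center, n=50):
    # Walk n representable doubles to either side of `center` with the
    # nextafter ufunc, giving the densest possible grid around it.
    hi = lo = np.float64(center)
    up, down = [], []
    for _ in range(n):
        hi = np.nextafter(hi, np.inf)
        lo = np.nextafter(lo, -np.inf)
        up.append(hi)
        down.append(lo)
    return np.array(down[::-1] + [np.float64(center)] + up)

xs = ulp_grid(np.pi / 2)      # probe sin() where the result is near 1
errs = np.sin(xs) - 1.0

Evaluating a function over such a grid exposes its error behaviour at the finest level the floating point format allows, which is the kind of thing the new ULP-based assertions are meant to catch.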
One issue is that it will cause failures on platforms without nextafterl and where long double != double, but I don't think we have a lot of those, if any, David From seb.haase at gmail.com Fri Oct 30 07:04:36 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 30 Oct 2009 12:04:36 +0100 Subject: [Numpy-discussion] type 'numpy.int64' unhashable Message-ID: Hi, I get this error: set(chainsA[0,:,0]) TypeError: unhashable type: 'numpy.ndarray' >>> list(chainsA[0,:,0]) [2636, 2590, 2619, 2590] >>> list(chainsA[0,:,0])[0] 2636 >>> type(_) I understand where this error comes from, however what I was trying to do seems to "intuitive" that I would like to ask for suggestions: "What should I do if the "number" 2636 becomes unhashable ?" Thanks, Sebastian Haase From cournape at gmail.com Fri Oct 30 07:21:16 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 30 Oct 2009 20:21:16 +0900 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: References: Message-ID: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: > I understand where this error comes from, however what I was trying to > do seems to "intuitive" that I would like to ask for suggestions: > "What should I do if the "number" 2636 becomes unhashable ?" In your example, that's the array which is unhashable, the numbers itself should be hashable. Arrays are mutable, so I don't think you can easily make them hashable. You could transform everything into tuple of tuple of... if you need to use set, though. David From gael.varoquaux at normalesup.org Fri Oct 30 07:23:52 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 30 Oct 2009 12:23:52 +0100 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> Message-ID: <20091030112352.GD16315@phare.normalesup.org> On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: > On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: > > I understand where this error comes from, however what I was trying to > > do seems to "intuitive" that I would like to ask for suggestions: > > "What should I do if the "number" 2636 becomes unhashable ?" > In your example, that's the array which is unhashable, the numbers > itself should be hashable. Arrays are mutable, so I don't think you > can easily make them hashable. You could transform everything into > tuple of tuple of... if you need to use set, though. Use md5's of their .data attribute. This works quite well (you might want to hash a pickled string of the dtype in addition). 
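A minimal sketch of that md5 approach, with the contiguity caveat James raises just below handled by copying first; the helper name and the inclusion of shape in the digest are additions of mine, not from the thread:

import hashlib
import numpy as np

def array_key(a):
    # Hashable key built from the raw bytes plus dtype and shape.
    # ascontiguousarray() copies non-contiguous views so that equal
    # data always hashes the same way.
    a = np.ascontiguousarray(a)
    m = hashlib.md5()
    m.update(a.tostring())
    m.update(str(a.dtype).encode())
    m.update(str(a.shape).encode())
    return m.hexdigest()

cache = {}
x = np.arange(12).reshape(3, 4)[:, ::2]   # a non-contiguous view
cache[array_key(x)] = 'some cached result'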
Ga?l From bergstrj at iro.umontreal.ca Fri Oct 30 09:11:35 2009 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Fri, 30 Oct 2009 09:11:35 -0400 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <20091030112352.GD16315@phare.normalesup.org> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> <20091030112352.GD16315@phare.normalesup.org> Message-ID: <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> On Fri, Oct 30, 2009 at 7:23 AM, Gael Varoquaux wrote: > On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: >> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: > >> > I understand where this error comes from, however what I was trying to >> > do seems to "intuitive" that I would like to ask for suggestions: >> > "What should I do if the "number" 2636 becomes unhashable ?" > >> In your example, that's the array which is unhashable, the numbers >> itself should be hashable. Arrays are mutable, so I don't think you >> can easily make them hashable. You could transform everything into >> tuple of tuple of... if you need to use set, though. > > Use md5's of their .data attribute. This works quite well (you might want > to hash a pickled string of the dtype in addition). > > Ga?l Careful... if your data is not contiguous in memory then you could be adding lots of random noise to your hash key by doing this. This could cause equal ndarrays to hash to different values -- not good. Make sure memory is contiguous before hashing the .data. Flatten() does this i think, as does copy(), array(), and many others. James -- http://www-etud.iro.umontreal.ca/~bergstrj From mail at stevesimmons.com Fri Oct 30 09:18:05 2009 From: mail at stevesimmons.com (Stephen Simmons) Date: Fri, 30 Oct 2009 14:18:05 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays Message-ID: <4AEAE78D.6030309@stevesimmons.com> Hi, Is anyone working on alternative storage options for numpy arrays, and specifically recarrays? My main application involves processing series of large recarrays (say 1000 recarrays, each with 5M rows having 50 fields). Existing options meet some but not all of my requirements. Requirements -------------- The basic requirements are: Mandatory - fast - suitable for very large arrays (larger than can fit in memory) - compressed (to reduce disk space, read data more quickly) - seekable (can read subset of data without decompressing everything) - can append new data to an existing file - able to extract individual fields from a recarray (for when indexing or processing needs just a few fields) Nice to have - files can be split without decompressing and recompressing (e.g. distribute processing over a grid) - encryption, ideally field-level, with encryption occurring after compression - can store multiple arrays in one physical file (convenience) - portable/stardard/well documented Existing options ----------------- Over the last few years I've tried most of numpy's options for saving arrays to disk, including pickles, .npy, .npz, memmap-ed files and HDF (Pytables). 
None of these is perfect, although Pytables comes close: - .npy - not compressed, need to read whole array into memory - .npz - compressed but ZLIB compression is too slow - memmap - not compressed - Pytables (HDF using chunked storage for recarrays with LZO compression and shuffle filter) - can't extract individual field from a recarray - multiple dependencies (HDF, PyTables+LZO, Pyh5+LZF) - HDF is standard but LZO implementation is specific to Pytables (similarly LZF is specific to Pyh5) Are there any other options? Thoughts about a new format -------------------------------- It seems that numpy could benefit from a new storage format. My first thoughts involve: - Use chunked format - split big arrays into pages of consecutive rows, compressed separately - Get good compression ratios by shuffling data before compressing (byte 1 of all rows, then byte 2 of all rows, ...) - Get efficient access to individual fields in recarrays by compressing each recarray field's data separately (shuffling has nice side-effect of separating recarray fields' data) - Make it fast to compress and decompress by using LZO - Store pages of rows (and compressd field data within a page) using a numpy variation of IFF chunked format (e.g. used by the DjVu scanned document format version 3). For example, FORM chunk for whole file, DTYP chunk for dtype info, DIRM chunk for directory to pages holding rows, NPAG chunk for a page - The IFF structure of named chunk types allows format to be extended (other compressors than LZO, encryption, links to remote data chunks, etc) I'd appreciate any comments or suggestions before I start coding. References ----------- DjVu format - http://djvu.org/resources/ DjVu v3 format - http://djvu.org/docs/DjVu3Spec.djvu Stephen P.S. Maybe this will be too much work, and I'd be better off sticking with Pytables..... From dagss at student.matnat.uio.no Fri Oct 30 09:48:54 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 30 Oct 2009 14:48:54 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <2384af0be6bd9de94f7f140ea5a6aca3.squirrel@webmail.uio.no> Dag Sverre Seljebotn: > Hi, > > Is anyone working on alternative storage options for numpy arrays, and > specifically recarrays? My main application involves processing series > of large recarrays (say 1000 recarrays, each with 5M rows having 50 > fields). Existing options meet some but not all of my requirements. > > Requirements > -------------- > The basic requirements are: > > Mandatory > - fast > - suitable for very large arrays (larger than can fit in memory) > - compressed (to reduce disk space, read data more quickly) > - seekable (can read subset of data without decompressing everything) > - can append new data to an existing file > - able to extract individual fields from a recarray (for when indexing > or processing needs just a few fields) > Nice to have > - files can be split without decompressing and recompressing (e.g. > distribute processing over a grid) > - encryption, ideally field-level, with encryption occurring after > compression > - can store multiple arrays in one physical file (convenience) > - portable/stardard/well documented > > Existing options > ----------------- > Over the last few years I've tried most of numpy's options for saving > arrays to disk, including pickles, .npy, .npz, memmap-ed files and HDF > (Pytables). 
> > None of these is perfect, although Pytables comes close: > - .npy - not compressed, need to read whole array into memory > - .npz - compressed but ZLIB compression is too slow > - memmap - not compressed > - Pytables (HDF using chunked storage for recarrays with LZO > compression and shuffle filter) > - can't extract individual field from a recarray I'm just learning PyTables so I'm curious about this... if I use a normal Table, it will be presented as a NumPy record array when I access it, and I can access individual fields. What are the disadvantages to that? > - multiple dependencies (HDF, PyTables+LZO, Pyh5+LZF) (I think this is a pro, not a con: It means that there's a lot of already bugfixed code being used. Any codebase is only as strong as the number of eyes on it.) Dag Sverre From dagss at student.matnat.uio.no Fri Oct 30 10:08:20 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 30 Oct 2009 15:08:20 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <834a4c20c8c7923cc132386169bbdd2a.squirrel@webmail.uio.no> Stephen Simmons wrote: > P.S. Maybe this will be too much work, and I'd be better off sticking > with Pytables..... I can't judge that, but I want to share some thoughts (rant?): - Are you ready to not only write the code, but maintain it over years to come, and work through nasty bugs, and think things through when people ask for parallellism or obscure filesystem locking functionality or whatnot? - Are you ready to finish even the last, boring "10%". Since there are existing options in the same area you can't expect a growing userbase to help you with the last "10%" (unlike projects in unexplored areas). - When you are done, are you sure that what you finally have will really be leaner and easier to work with than the existing options (like PyTables?). If not, odds are the result will in the end only be used by yourself. Simply writing the prototype is the easy part of the job! Perhaps needless to say, my hunch would be to try to work with PyTables to add what you miss there. There's a harder learning curve than writing something from scratch, but not harder than what others will have with something you write from scratch. The advantage of hdf5 is that there's lot of existing tools for inspecting, processing and sharing the data independent of NumPy (well, up to propriotary compression; but that's hardly worse than the entire format being propriotary). Dag Sverre From zachary.pincus at yale.edu Fri Oct 30 10:26:21 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 30 Oct 2009 10:26:21 -0400 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <834a4c20c8c7923cc132386169bbdd2a.squirrel@webmail.uio.no> References: <4AEAE78D.6030309@stevesimmons.com> <834a4c20c8c7923cc132386169bbdd2a.squirrel@webmail.uio.no> Message-ID: Unless I read your request or the documentation wrong, h5py already supports pulling specific fields out of "compound data types": http://h5py.alfven.org/docs-1.1/guide/hl.html#id3 > For compound data, you can specify multiple field names alongside > the numeric slices: > >>> dset["FieldA"] > >>> dset[0,:,4:5, "FieldA", "FieldB"] > >>> dset[0, ..., "FieldC"] Is this latter style of access what you were asking for? (Or is the problem that it's not fast enough in hdf5, even with the shuffle filter, etc?) 
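For concreteness, a sketch of that field-selection pattern on a compound dataset, assuming h5py with its LZF filter is available (file and field names here are made up):

import numpy as np
import h5py

rows = np.zeros(1000000, dtype=[('id', 'i8'), ('x', 'f4'), ('y', 'f4')])

f = h5py.File('columns.h5', 'w')
dset = f.create_dataset('table', data=rows,
                        chunks=True, compression='lzf', shuffle=True)
x_only = dset['x']     # pull back a single field
f.close()

Note, though, that HDF5 still decompresses whole chunks of complete rows underneath, so this answers the convenience part of the question but not necessarily the I/O-cost part.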
So then the issue is that there's a dependency on hdf5 and h5py? (or if you want to access LZF-compressed files without h5py, a dependency on hdf5 and the C LZF compressor?). This is pretty lightweight, especially if you're proposing writing new code which itself would be a dependency. So your new code couldn't depend on *anything* else if you wanted it to be a fewer-dependencies option than hdf5+h5py, right? Zach From faltet at pytables.org Fri Oct 30 11:17:08 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 30 Oct 2009 16:17:08 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <200910301617.08823.faltet@pytables.org> A Friday 30 October 2009 14:18:05 Stephen Simmons escrigu?: > - Pytables (HDF using chunked storage for recarrays with LZO > compression and shuffle filter) > - can't extract individual field from a recarray Er... Have you tried the ``cols`` accessor? http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr Cheers, -- Francesc Alted From robert.kern at gmail.com Fri Oct 30 12:09:26 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 30 Oct 2009 11:09:26 -0500 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <3d375d730910300909h4e3beb8v4fe582a2cccd27ff@mail.gmail.com> On Fri, Oct 30, 2009 at 08:18, Stephen Simmons wrote: > Thoughts about a new format > -------------------------------- > It seems that numpy could benefit from a new storage format. While you may indeed need a new format, I'm not sure that numpy does. Lord knows I've gotten enough flak for inventing yet another binary format with .npy. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mail at stevesimmons.com Fri Oct 30 12:19:42 2009 From: mail at stevesimmons.com (Stephen Simmons) Date: Fri, 30 Oct 2009 17:19:42 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <200910301617.08823.faltet@pytables.org> References: <4AEAE78D.6030309@stevesimmons.com> <200910301617.08823.faltet@pytables.org> Message-ID: <4AEB121E.8080607@stevesimmons.com> I should clarify what I meant...... Suppose I have a recarray with 50 fields and want to read just one of those fields. PyTables/HDF will read in the compressed data for chunks of complete rows, decompress the full 50 fields, and then give me back the data for just one field. I'm after a solution where asking for a single field reads in the bytes for just that field from disk and decompresses it. This is similar to the difference between databases storing their data as rows or columns. See for example Mike Stonebraker's C-store column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf). Stephen Francesc Alted wrote: > A Friday 30 October 2009 14:18:05 Stephen Simmons escrigu?: > >> - Pytables (HDF using chunked storage for recarrays with LZO >> compression and shuffle filter) >> - can't extract individual field from a recarray >> > > Er... Have you tried the ``cols`` accessor? 
> > http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr > > Cheers, > > From peridot.faceted at gmail.com Fri Oct 30 12:35:10 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 30 Oct 2009 12:35:10 -0400 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEB121E.8080607@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> <200910301617.08823.faltet@pytables.org> <4AEB121E.8080607@stevesimmons.com> Message-ID: 2009/10/30 Stephen Simmons : > I should clarify what I meant...... > > Suppose I have a recarray with 50 fields and want to read just one of > those fields. PyTables/HDF will read in the compressed data for chunks > of complete rows, decompress the full 50 fields, and then give me back > the data for just one field. > > I'm after a solution where asking for a single field reads in the bytes > for just that field from disk and decompresses it. > > This is similar to the difference between databases storing their data > as rows or columns. See for example Mike Stonebraker's C-store > column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf). Is there any reason not to simply store the data as a collection of separate arrays, one per column? It shouldn't be too hard to write a wrapper to give this nicer syntax, while implementing it under the hood with HDF5... Anne > Stephen > > > > Francesc Alted wrote: >> A Friday 30 October 2009 14:18:05 Stephen Simmons escrigu?: >> >>> ?- Pytables (HDF using chunked storage for recarrays with LZO >>> compression and shuffle filter) >>> ? ? - can't extract individual field from a recarray >>> >> >> Er... Have you tried the ``cols`` accessor? >> >> http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr >> >> Cheers, >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Fri Oct 30 12:44:11 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 30 Oct 2009 11:44:11 -0500 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> <20091030112352.GD16315@phare.normalesup.org> <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> Message-ID: <3d375d730910300944y3dbe2aefwd14c582f5aa0bbcc@mail.gmail.com> On Fri, Oct 30, 2009 at 08:11, James Bergstra wrote: > On Fri, Oct 30, 2009 at 7:23 AM, Gael Varoquaux > wrote: >> On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: >>> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: >> >>> > I understand where this error comes from, however what I was trying to >>> > do seems to "intuitive" that I would like to ask for suggestions: >>> > "What should I do if the "number" 2636 becomes unhashable ?" >> >>> In your example, that's the array which is unhashable, the numbers >>> itself should be hashable. Arrays are mutable, so I don't think you >>> can easily make them hashable. You could transform everything into >>> tuple of tuple of... if you need to use set, though. >> >> Use md5's of their .data attribute. This works quite well (you might want >> to hash a pickled string of the dtype in addition). >> >> Ga?l > > Careful... if your data is not contiguous in memory then you could be > adding lots of random noise to your hash key by doing this. 
?This > could cause equal ndarrays to hash to different values -- not good. > Make sure memory is contiguous before hashing the .data. ?Flatten() > does this i think, as does copy(), array(), and many others. .data doesn't work for non-contiguous arrays anyways. :-) But all of this is irrelevant to the OP. First, I cannot replicate his problem. In [12]: chainsA = np.arange(10, dtype=np.int64) In [13]: set(chainsA) Out[13]: set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) Second, he seems to be interested in scalar objects, not arrays. The scalar objects should all be hashable and comparable out-of-box and ready to be used in sets and as dict keys. We will need a complete, self-contained example that demonstrates the problem to get any further with this. Third, even if he wanted to use arrays as set elements, he couldn't because such objects not only need to have __hash__ defined, they also need __eq__ to return a bool. We return boolean arrays that cannot be used as a truth value. Fourth, even if arrays could be compared, you couldn't replace their __hash__ method or tell set to use a different function in place of the __hash__ method. Fifth, even if you could tell set to use a different hash function, you wouldn't use cryptographic hashes. You would just hash(buffer(arr)) for contiguous arrays and hash(arr.tostring()) for the rest. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jorgen.stenarson at bostream.nu Fri Oct 30 13:22:56 2009 From: jorgen.stenarson at bostream.nu (=?ISO-8859-1?Q?J=F6rgen_Stenarson?=) Date: Fri, 30 Oct 2009 18:22:56 +0100 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: <4AEB20F0.8070201@bostream.nu> David Cournapeau skrev: > The missing piece in complex support in npymath is mostly tests: I have > tests for all the special cases (all special cases specified in C99 > standard are tested), but no test for the actual 'normal' values. If you > (or someone else) could add a couple of tests, that would be great. > In ticket #1271 I reported on some edgecase errors for pow with negative exponents which would be great to include in the test suite as well. /J?rgen From reckoner at gmail.com Fri Oct 30 14:13:28 2009 From: reckoner at gmail.com (Reckoner) Date: Fri, 30 Oct 2009 11:13:28 -0700 Subject: [Numpy-discussion] persistent ImportError: No module named multiarray when moving cPickle files between machines Message-ID: Hi, % python -c 'import numpy.core.multiarray' works just fine, but when I try to load a file that I have transferred from another machine running Windows to one running Linux, I get: % python -c 'import cPickle;a=cPickle.load(open("matrices.pkl"))' Traceback (most recent call last): File "", line 1, in ImportError: No module named multiarray otherwise, cPickle works normally when transferring files that *do* not contain numpy arrays. I am using version 1.2 on both machines. It's not so easy for me to change versions, by the way, since this is the version that my working group has decided on to standardize on for this effort. Any help appreciated. 
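Before changing numpy versions, it may be worth looking at what the failing pickle actually asks the unpickler to import. A small diagnostic using only the standard library (output will vary with how the file was written):

import pickletools

data = open('matrices.pkl', 'rb').read()
# GLOBAL records are the module/attribute pairs the unpickler imports,
# e.g. something like 'numpy.core.multiarray _reconstruct'.
globals_used = [arg for opcode, arg, pos in pickletools.genops(data)
                if opcode.name == 'GLOBAL']
print(globals_used)

If the list names a module that does not import cleanly on the Linux box, that points at the real culprit rather than at the array data itself.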
From robert.kern at gmail.com Fri Oct 30 15:09:48 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 30 Oct 2009 14:09:48 -0500 Subject: [Numpy-discussion] persistent ImportError: No module named multiarray when moving cPickle files between machines In-Reply-To: References: Message-ID: <3d375d730910301209y5a59c472v86d62c7fb517d77b@mail.gmail.com> On Fri, Oct 30, 2009 at 13:13, Reckoner wrote: > Hi, > > % python -c 'import numpy.core.multiarray' > > works just fine, but when I try to load a file that I have transferred > from another machine running Windows to one running Linux, I get: > > % ?python -c 'import cPickle;a=cPickle.load(open("matrices.pkl"))' > > Traceback (most recent call last): > ?File "", line 1, in > ImportError: No module named multiarray > > otherwise, cPickle works normally when transferring files that *do* > not contain numpy arrays. > > I am using version 1.2 on both machines. It's not so easy for me to > change versions, by the way, since this is the version that my working > group has decided on to standardize on for this effort. You can import numpy.core.multiarray on both machines? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pav at iki.fi Fri Oct 30 17:05:16 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 30 Oct 2009 23:05:16 +0200 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: <4AEAB878.4030308@ar.media.kyoto-u.ac.jp> References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> <4AEAB878.4030308@ar.media.kyoto-u.ac.jp> Message-ID: <1256936716.6755.14.camel@idol> pe, 2009-10-30 kello 18:57 +0900, David Cournapeau kirjoitti: [clip: struct return values] > Is this a problem in practice ? If two compilers differ in this, > wouldn't they have incompatible ABI ? Yep, it would be an incompatible ABI. I don't really know how common this in practice -- but there was a comment warning about this in the old ufunc sources, so I wanted to be wary... I don't think there's a significant downside in having thin wrappers around the pointer functions. Googling a bit reveals at least some issues that have cropped up in gcc: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36834 (MSVC vs. gcc) http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9506 (bug on freebsd) I'd imagine the situation vs. compilers is here a bit similar to C++ ABIs and sounds like it's a less tested corner of the calling conventions. No idea whether this matters in practice, but at least the above MSVC vs. gcc issue sounds like it might bite. 
Pauli From seb.haase at gmail.com Fri Oct 30 17:08:38 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 30 Oct 2009 22:08:38 +0100 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <3d375d730910300944y3dbe2aefwd14c582f5aa0bbcc@mail.gmail.com> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> <20091030112352.GD16315@phare.normalesup.org> <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> <3d375d730910300944y3dbe2aefwd14c582f5aa0bbcc@mail.gmail.com> Message-ID: On Fri, Oct 30, 2009 at 5:44 PM, Robert Kern wrote: > On Fri, Oct 30, 2009 at 08:11, James Bergstra wrote: >> On Fri, Oct 30, 2009 at 7:23 AM, Gael Varoquaux >> wrote: >>> On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: >>>> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: >>> >>>> > I understand where this error comes from, however what I was trying to >>>> > do seems to "intuitive" that I would like to ask for suggestions: >>>> > "What should I do if the "number" 2636 becomes unhashable ?" >>> >>>> In your example, that's the array which is unhashable, the numbers >>>> itself should be hashable. Arrays are mutable, so I don't think you >>>> can easily make them hashable. You could transform everything into >>>> tuple of tuple of... if you need to use set, though. >>> >>> Use md5's of their .data attribute. This works quite well (you might want >>> to hash a pickled string of the dtype in addition). >>> >>> Ga?l >> >> Careful... if your data is not contiguous in memory then you could be >> adding lots of random noise to your hash key by doing this. ?This >> could cause equal ndarrays to hash to different values -- not good. >> Make sure memory is contiguous before hashing the .data. ?Flatten() >> does this i think, as does copy(), array(), and many others. > > .data doesn't work for non-contiguous arrays anyways. :-) > > But all of this is irrelevant to the OP. First, I cannot replicate his problem. > > In [12]: chainsA = np.arange(10, dtype=np.int64) > > In [13]: set(chainsA) > Out[13]: set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > > Second, he seems to be interested in scalar objects, not arrays. The > scalar objects should all be hashable and comparable out-of-box and > ready to be used in sets and as dict keys. We will need a complete, > self-contained example that demonstrates the problem to get any > further with this. > > Third, even if he wanted to use arrays as set elements, he couldn't > because such objects not only need to have __hash__ defined, they also > need __eq__ to return a bool. We return boolean arrays that cannot be > used as a truth value. > > Fourth, even if arrays could be compared, you couldn't replace their > __hash__ method or tell set to use a different function in place of > the __hash__ method. > > Fifth, even if you could tell set to use a different hash function, > you wouldn't use cryptographic hashes. You would just > hash(buffer(arr)) for contiguous arrays and hash(arr.tostring()) for > the rest. > > -- > Robert Kern > Thanks to everyone for replying. Nice detective work, Robert - indeed it seems to work with "real" ndarrays -- I have to do some more homework to get my problem into a shape so that I could demonstrate it in a "small, self contained form". 
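For reference, a small self-contained illustration of the two cases Robert and David describe, with made-up data standing in for chainsA: scalars pulled out of an int64 array are hashable, whereas whole rows need converting to tuples first.

import numpy as np

chainsA = np.random.randint(0, 3000, size=(3, 4, 2)).astype(np.int64)

unique_vals = set(chainsA[0, :, 0])                # int64 scalars: works
unique_rows = set(tuple(r) for r in chainsA[0])    # rows as tuples, per David's suggestion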
Thanks again, Sebastian From reckoner at gmail.com Fri Oct 30 21:48:23 2009 From: reckoner at gmail.com (Reckoner) Date: Fri, 30 Oct 2009 18:48:23 -0700 Subject: [Numpy-discussion] persistent ImportError: No module named multiarray when moving cPickle files between machines In-Reply-To: References: Message-ID: > Robert Kern wrote: > You can import numpy.core.multiarray on both machines? Yes. For each machine separately, you can cPickle files with numpy arrays without problems loading/dumping. The problem comes from transferring the win32 cPickle'd files to Linux 64 bit and then trying to load them. Transferring cPickle'd files that do *not* have numpy arrays work as expected. In other words, cPICKLE'd lists transfer fine back and forth between the two machines. In fact, we currently get around this problem by converting the numpy arrays to lists, transferring them, and then re-numpy-ing them on the respective hosts thanks. On Fri, Oct 30, 2009 at 11:13 AM, Reckoner wrote: > Hi, > > % python -c 'import numpy.core.multiarray' > > works just fine, but when I try to load a file that I have transferred > from another machine running Windows to one running Linux, I get: > > % ?python -c 'import cPickle;a=cPickle.load(open("matrices.pkl"))' > > Traceback (most recent call last): > ?File "", line 1, in > ImportError: No module named multiarray > > otherwise, cPickle works normally when transferring files that *do* > not contain numpy arrays. > > I am using version 1.2 on both machines. It's not so easy for me to > change versions, by the way, since this is the version that my working > group has decided on to standardize on for this effort. > > > Any help appreciated. > From sccolbert at gmail.com Sat Oct 31 08:22:47 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sat, 31 Oct 2009 13:22:47 +0100 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? Message-ID: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> For example say we have an original array a=np.random.random((512, 512, 3)) and we take a slice of that array b=a[:100, :100, :] now, b is discontiguous, but all if its memory is owned by a. Will there ever be a situation where a discontiguous array owns its own data? Or more generally, will discontiguous data alway have a contiguous parent? As far as i understand the numpy strided model, that could only be supported if len(strides) = ndim+1, I dont think numpy supports that. Don't get me wrong, I'm not making a feature request, just making sure I fully understand the array model so I can avoid trampling on memory. Cheers! Chris From cournape at gmail.com Sat Oct 31 08:32:17 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 31 Oct 2009 21:32:17 +0900 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> Message-ID: <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> On Sat, Oct 31, 2009 at 9:22 PM, Chris Colbert wrote: > > Will there ever be a situation where a discontiguous array owns its > own data? Or more generally, will discontiguous data alway have a > contiguous parent? Yes to Q1 and No to Q2. Discontiguous arrays are very easy to create: for example, if you say np.empty((10, 50), order="F"), you have a discontiguous array. 
I use this quite often when I need to interoperate with C or Fortran libraries - interoperation with other libraries/formats is another common source of discontiguous arrays, compared to memory views. David From sccolbert at gmail.com Sat Oct 31 08:45:19 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sat, 31 Oct 2009 13:45:19 +0100 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> Message-ID: <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> Thanks for the response david. Lemme rephrase the question a little bit. It terms of actually memory space, will a numpy array ever point to a chunk of memory that is not a continually running series of memory addresses and also not a child of a continuous block of addresses. Graphically can this every occur in hardware memory: |--- a portion of array A ---|--- python object foo ---|--- The rest of array A ----| The reason I ask is because I am passing numpy arrays into another library which uses a strided memory model, but not FULLY strided, and I need to figure out what checks I need to put in place to ensure that it doesnt trample on memory. In the best case senario, it would just trample on the parent array, in the worst case senario it would segfault. Cheers, Chris On Sat, Oct 31, 2009 at 1:32 PM, David Cournapeau wrote: > On Sat, Oct 31, 2009 at 9:22 PM, Chris Colbert wrote: > >> >> Will there ever be a situation where a discontiguous array owns its >> own data? Or more generally, will discontiguous data alway have a >> contiguous parent? > > Yes to Q1 and No to Q2. > > Discontiguous arrays are very easy to create: for example, if you say > np.empty((10, 50), order="F"), you have a discontiguous array. I use > this quite often when I need to interoperate with C or Fortran > libraries - interoperation with other libraries/formats is another > common source of discontiguous arrays, compared to memory views. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Sat Oct 31 08:58:53 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 31 Oct 2009 21:58:53 +0900 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> Message-ID: <5b8d13220910310558t47d77f18me17beeeb9a571ba1@mail.gmail.com> On Sat, Oct 31, 2009 at 9:45 PM, Chris Colbert wrote: > Graphically can this every occur in hardware memory: > > |--- a portion of array A ---|--- python object foo ---|--- The rest > of array A ----| No, this can never happen in the current numpy memory model, the allocated block has to be contiguous, and you can get to any item of the array from the data pointer (address of the first item) by N * item_size. That's a fundamental feature to enable fast access (you only need to jump once). 
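As for the checks before handing memory to a strided library, a sketch of the usual guard, assuming the library expects one C-contiguous block (the helper name is illustrative):

import numpy as np

def as_library_buffer(arr):
    # External code gets exactly one contiguous block: slices,
    # Fortran-order arrays and other discontiguous views are copied,
    # plain C-order arrays pass through untouched.
    if not arr.flags['C_CONTIGUOUS']:
        arr = np.ascontiguousarray(arr)
    return arr

a = np.random.random((512, 512, 3))
b = a[:100, :100, :]             # the discontiguous view from the example
buf = as_library_buffer(b)       # safe to pass on; writes land in the copy

The copy also means the library can no longer trample the parent array a, at the cost that writes do not propagate back unless they are copied in explicitly.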
David From sccolbert at gmail.com Sat Oct 31 09:02:14 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sat, 31 Oct 2009 14:02:14 +0100 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <5b8d13220910310558t47d77f18me17beeeb9a571ba1@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> <5b8d13220910310558t47d77f18me17beeeb9a571ba1@mail.gmail.com> Message-ID: <7f014ea60910310602y4b0ea611q43e47be0f7011a40@mail.gmail.com> Great! Thanks for the help David! On Sat, Oct 31, 2009 at 1:58 PM, David Cournapeau wrote: > On Sat, Oct 31, 2009 at 9:45 PM, Chris Colbert wrote: > >> Graphically can this every occur in hardware memory: >> >> |--- a portion of array A ---|--- python object foo ---|--- The rest >> of array A ----| > > No, this can never happen in the current numpy memory model, the > allocated block has to be contiguous, and you can get to any item of > the array from the data pointer (address of the first item) by N * > item_size. That's a fundamental feature to enable fast access (you > only need to jump once). > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Sat Oct 31 12:38:34 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 31 Oct 2009 12:38:34 -0400 Subject: [Numpy-discussion] Recarray comparison and byte order Message-ID: <1e2af89e0910310938n7aec3ca0w8d5b0f8e6c7b0933@mail.gmail.com> Hi, I was surprised by this - is it a bug or a feature or me misunderstanding something? a = np.zeros((1,), dtype=[('f1', 'u2')]) b = a.copy() b == a (array([True], dtype=bool)) # as expected c = a.byteswap().newbyteorder() c == a (False) # to me, unexpected, note bool rather than array Thanks for any clarification... Matthew From geometrian at gmail.com Sat Oct 31 14:46:50 2009 From: geometrian at gmail.com (Ian Mallett) Date: Sat, 31 Oct 2009 11:46:50 -0700 Subject: [Numpy-discussion] Recarray comparison and byte order In-Reply-To: <1e2af89e0910310938n7aec3ca0w8d5b0f8e6c7b0933@mail.gmail.com> References: <1e2af89e0910310938n7aec3ca0w8d5b0f8e6c7b0933@mail.gmail.com> Message-ID: On Sat, Oct 31, 2009 at 9:38 AM, Matthew Brett wrote: > c = a.byteswap().newbyteorder() > c == a > In the last two lines, a variable "c" is assigned to a modified "a". The next line tests (==) to see if "c" is the same as (==) the unmodified "a". It isn't, because "c" is the modified "a". Hence, "False". Do you mean: c = a instead of: c == a ...? HTH, Ian -------------- next part -------------- An HTML attachment was scrubbed... URL:
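For reference on the recarray question: the last two lines in Matthew's snippet are comparisons, not assignments. A self-contained version of the test, with one extra comparison on the extracted field, which should fall back to an ordinary elementwise result:

import numpy as np

a = np.zeros((1,), dtype=[('f1', 'u2')])
b = a.copy()
c = a.byteswap().newbyteorder()    # same logical values, opposite byte order

print(a == b)              # elementwise boolean array, as expected
print(a == c)              # the bare False Matthew reports
print(a['f1'] == c['f1'])  # comparing the plain u2 field instead

Whether the structured comparison should honour the byte order (and return an array rather than a bare bool) is exactly the open question here.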