From gael.varoquaux at normalesup.org Mon Dec 1 03:12:20 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 1 Dec 2008 09:12:20 +0100 Subject: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing In-Reply-To: <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> References: <492D8FD3.8050601@enthought.com> <492DC9B0.1030300@gmail.com> <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> Message-ID: <20081201081220.GC18450@phare.normalesup.org> On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote: > On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale wrote: > > I tried installing 4.0.300x on a machine running 64-bit windows vista home > > edition and ran into problems with PyQt and some related packages. So I > > uninstalled all the python-related software, EPD took over 30 minutes to > > uninstall, and tried to install EPD 4.1 beta. > My guess is that EPD is only 32 bits installer, so that you run it on > WOW (Windows in Windows) on windows 64, which is kind of slow (but > usable for most tasks). On top of that, Vista is not supported with EPD. I had a chat with the EPD guys about that, and they say it does work with Vista... most of the time. They don't really understand the failures, and haven't had time to investigate much, because so far professionals and labs are simply avoiding Vista. Hopefully someone from the EPD team will give a more accurate answer soon. Ga?l From timmichelsen at gmx-topmail.de Mon Dec 1 05:22:10 2008 From: timmichelsen at gmx-topmail.de (Timmie) Date: Mon, 1 Dec 2008 10:22:10 +0000 (UTC) Subject: [Numpy-discussion] optimising single value functions for array calculations Message-ID: Hello, I am developing a module which bases its calculations on another specialised module. My module uses numpy arrays a lot. The problem is that the other module I am building upon, does not work with (whole) arrays but with single values. Therefore, I am currently forces to loop over the array: ### a = numpy.arange(100) b = numpy.arange(100,200) for i in range(0,a.size): a[i] = myfunc(a[i])* b[i] ### The results come out well. But the problem is that this way of calculation is very ineffiecent and takes time. May anyone give me a hint on how I can improve my code without having to modify the package I am building upon. I do not want to change it a lot because I would always have to run behind the chnages in the other package. To summarise: How to I make a calculation function array-aware? Thanks in advance, Timmie From emmanuelle.gouillart at normalesup.org Mon Dec 1 05:28:46 2008 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Mon, 1 Dec 2008 11:28:46 +0100 (CET) Subject: [Numpy-discussion] optimising single value functions for array calculations In-Reply-To: References: Message-ID: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> Hello Timmie, numpy.vectorize(myfunc) should do what you want. Cheers, Emmanuelle > Hello, > I am developing a module which bases its calculations > on another specialised module. > My module uses numpy arrays a lot. > The problem is that the other module I am building > upon, does not work with (whole) arrays but with > single values. > Therefore, I am currently forces to loop over the > array: > > ### > a = numpy.arange(100) > b = numpy.arange(100,200) > for i in range(0,a.size): > a[i] = myfunc(a[i])* b[i] > > ### > > The results come out well. But the problem is that this > way of calculation is very ineffiecent and takes time. 
> > May anyone give me a hint on how I can improve my > code without having to modify the package I am > building upon. I do not want to change it a lot because > I would always have to run behind the chnages in the > other package. > > To summarise: > How to I make a calculation function array-aware? > > Thanks in advance, > Timmie > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From matthieu.brucher at gmail.com Mon Dec 1 05:33:56 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 1 Dec 2008 11:33:56 +0100 Subject: [Numpy-discussion] optimising single value functions for array calculations In-Reply-To: References: Message-ID: 2008/12/1 Timmie : > Hello, > I am developing a module which bases its calculations > on another specialised module. > My module uses numpy arrays a lot. > The problem is that the other module I am building > upon, does not work with (whole) arrays but with > single values. > Therefore, I am currently forces to loop over the > array: > > ### > a = numpy.arange(100) > b = numpy.arange(100,200) > for i in range(0,a.size): > a[i] = myfunc(a[i])* b[i] > > ### Hi, Safe from using numpy functions inside myfunc(), numpy has no way of optimizing your computation. vectorize() will help you to have a clean interface, but it will not enhance speed. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From nadavh at visionsense.com Mon Dec 1 05:37:25 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Mon, 1 Dec 2008 12:37:25 +0200 Subject: [Numpy-discussion] optimising single value functions for array calculations References: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> Message-ID: <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> I does not solve the slowness problem. I think I read on the list about an experimental code for fast vectorization. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Emmanuelle Gouillart ????: ? 01-?????-08 12:28 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] optimising single value functions for array calculations Hello Timmie, numpy.vectorize(myfunc) should do what you want. Cheers, Emmanuelle > Hello, > I am developing a module which bases its calculations > on another specialised module. > My module uses numpy arrays a lot. > The problem is that the other module I am building > upon, does not work with (whole) arrays but with > single values. > Therefore, I am currently forces to loop over the > array: > > ### > a = numpy.arange(100) > b = numpy.arange(100,200) > for i in range(0,a.size): > a[i] = myfunc(a[i])* b[i] > > ### > > The results come out well. But the problem is that this > way of calculation is very ineffiecent and takes time. > > May anyone give me a hint on how I can improve my > code without having to modify the package I am > building upon. I do not want to change it a lot because > I would always have to run behind the chnages in the > other package. > > To summarise: > How to I make a calculation function array-aware? 
> > Thanks in advance, > Timmie > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3856 bytes Desc: not available URL: From bakker at itc.nl Mon Dec 1 06:31:59 2008 From: bakker at itc.nl (Wim Bakker) Date: Mon, 1 Dec 2008 11:31:59 +0000 (UTC) Subject: [Numpy-discussion] memmap & dtype issue Message-ID: For a long time now, numpy's memmap has me puzzled by its behavior. When I use memmap straightforward on a file it seems to work fine, but whenever I try to do a memmap using a dtype it seems to gobble up the whole file into memory. This, of course, makes the use of memmap futile. I would expect that the result of such an operation would give me a true memmap and that the data would be converted to dtype on the fly. I've seen this behavior in version version 1.04, 1.1.1 and still in 1.2.1. I'm working on Windows haven't tried it on Linux. Am I doing something wrong? Are my expectations wrong? Or is this an issue somewhere deeper in numpy? I looked at the memmap.py and it seems to me that most of the work is delegated to numpy.ndarray.__new__. Something wrong there maybe? Can somebody help please? Thanks! Regards, Wim Bakker From stefan at sun.ac.za Mon Dec 1 07:14:47 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 14:14:47 +0200 Subject: [Numpy-discussion] optimising single value functions for array calculations In-Reply-To: <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> References: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> Message-ID: <9457e7c80812010414u2fb5f93as3e5536eb1a53fa7d@mail.gmail.com> 2008/12/1 Nadav Horesh : > I does not solve the slowness problem. I think I read on the list about an > experimental code for fast vectorization. The choices are basically weave, fast_vectorize (http://projects.scipy.org/scipy/scipy/ticket/727), ctypes, cython or f2py. Any I left out? Ilan's fast_vectorize should have been included in SciPy a while ago already. Volunteers for patch review? Cheers St?fan From timmichelsen at gmx-topmail.de Mon Dec 1 08:38:23 2008 From: timmichelsen at gmx-topmail.de (Timmie) Date: Mon, 1 Dec 2008 13:38:23 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?optimising_single_value_functions_fo?= =?utf-8?q?r_array=09calculations?= References: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> Message-ID: Hi, > thanks for all your answers. I will certainly test it. > numpy.vectorize(myfunc) should do what you want. Just to add a better example based on a recent discussion here on this list [1]: myfunc(x): res = math.sin(x) return res a = numpy.arange(1,20) => myfunc(a) will not work. => myfunc need to have a possibility to pass single values to math.sin either through interation (see my inital email) or through other options. (I know that numpy has a array aware sinus but wanted to use it as an example here.) My orriginal problem evolves from here timeseries computing [2]. Well, I will test and report back further. 
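For reference, here is the minimal sketch I intend to try first (assuming, as in the toy example
above, that myfunc only does scalar math; the vectorize call is just the convenience wrapper
suggested earlier in the thread, not a speed-up):

###
import math
import numpy

def myfunc(x):
    # scalar-only function, standing in for the external package
    return math.sin(x)

a = numpy.arange(1, 20)
b = numpy.arange(101, 120)

# numpy.vectorize makes the scalar function accept whole arrays;
# as Matthieu noted, it still loops in Python, so it cleans up the
# interface without making the computation faster
vmyfunc = numpy.vectorize(myfunc)
result = vmyfunc(a) * b
###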
Thanks again and until soon, Timmie [1]: http://thread.gmane.org/gmane.comp.python.numeric.general/26417/focus=26418 [2]: http://thread.gmane.org/gmane.comp.python.scientific.user/18253 From oliphant at enthought.com Mon Dec 1 09:30:02 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 01 Dec 2008 08:30:02 -0600 Subject: [Numpy-discussion] memmap & dtype issue In-Reply-To: References: Message-ID: <4933F4EA.9040709@enthought.com> Wim Bakker wrote: > For a long time now, numpy's memmap has me puzzled by its behavior. When I use > memmap straightforward on a file it seems to work fine, but whenever I try to > do a memmap using a dtype it seems to gobble up the whole file into memory. > I don't understand your question. From my experience, the memmap is working fine. Please post and example that illustrates your point. > This, of course, makes the use of memmap futile. I would expect that the > result of such an operation would give me a true memmap and that the data > would be converted to dtype on the fly. > There is no conversion on the fly when you use memmap. You construct an array of the same data-type as is in the file and then manipulate portions of it as needed. > Am I doing something wrong? Are my expectations wrong? My guess is that your expectations are not accurate, but example code would help sort it out. Best regards, -Travis From dsdale24 at gmail.com Mon Dec 1 10:30:40 2008 From: dsdale24 at gmail.com (Darren Dale) Date: Mon, 1 Dec 2008 10:30:40 -0500 Subject: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing In-Reply-To: <20081201081220.GC18450@phare.normalesup.org> References: <492D8FD3.8050601@enthought.com> <492DC9B0.1030300@gmail.com> <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> <20081201081220.GC18450@phare.normalesup.org> Message-ID: On Mon, Dec 1, 2008 at 3:12 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote: > > On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale wrote: > > > I tried installing 4.0.300x on a machine running 64-bit windows vista > home > > > edition and ran into problems with PyQt and some related packages. So I > > > uninstalled all the python-related software, EPD took over 30 minutes > to > > > uninstall, and tried to install EPD 4.1 beta. > > > My guess is that EPD is only 32 bits installer, so that you run it on > > WOW (Windows in Windows) on windows 64, which is kind of slow (but > > usable for most tasks). > > On top of that, Vista is not supported with EPD. I had a chat with the > EPD guys about that, and they say it does work with Vista... most of the > time. They don't really understand the failures, and haven't had time to > investigate much, because so far professionals and labs are simply > avoiding Vista. Hopefully someone from the EPD team will give a more > accurate answer > soon. Thanks Gael and David. I would avoid windows altogether if I could. When I bought a new laptop I had the option to pay extra to downgrade to XP pro, I should have done some more research before I settled for Vista. In the meantime I'll borrow an XP machine when I need to build python package installers for windows. Hopefully a solution can be found at some point for python and Vista. Losing compatibility on such a major platform will become increasingly problematic. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pgmdevlist at gmail.com Mon Dec 1 12:49:01 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 12:49:01 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... Message-ID: All, Please find attached to this message another implementation of np.loadtxt, which focuses on missing values. It's basically a combination of John Hunter's et al mlab.csv2rec, Ryan May's patches and pieces of code I'd been working on over the last few weeks. Besides some helper classes (StringConverter to convert a string into something else, NameValidator to check names..._), you'll find 3 functions: * `genloadtxt` is the base function that makes all the work. It outputs 2 arrays, one for the data (missing values being substituted by the appropriate default) and one for the mask. It would go in np.lib.io * `loadtxt` would replace the current np.loadtxt. It outputs a ndarray, where missing data being filled. It would also go in np.lib.io * `mloadtxt` would go into np.ma.io (to be created) and renamed `loadtxt`. Right now, I needed a different name to avoid conflicts. It combines the outputs of `genloadtxt` into a single masked array. You'll also several series of tests, that you can use as examples. Please give it a try and send me some feedback (bugs, wishes, suggestions). I'd like it to make the 1.3.0 release (I need some of the functionalities to improve the corresponding function in scikits.timeseries, currently fubar...) P. From pgmdevlist at gmail.com Mon Dec 1 13:21:32 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 13:21:32 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <7267494B-A9AB-4649-B13D-BB00508954C9@gmail.com> And now for the tests: -------------- next part -------------- A non-text attachment was scrubbed... Name: genload_proposal_tests.py Type: text/x-python-script Size: 15192 bytes Desc: not available URL: From stefan at sun.ac.za Mon Dec 1 13:22:15 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 20:22:15 +0200 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <9457e7c80812011022t26bae211lfd317a1d314b7e3e@mail.gmail.com> 2008/12/1 Pierre GM : > Please find attached to this message another implementation of Struggling to comply! Cheers St?fan From pgmdevlist at gmail.com Mon Dec 1 13:21:08 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 13:21:08 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message. -------------- next part -------------- A non-text attachment was scrubbed... Name: genload_proposal.py Type: text/x-python-script Size: 27313 bytes Desc: not available URL: -------------- next part -------------- From jdh2358 at gmail.com Mon Dec 1 13:54:27 2008 From: jdh2358 at gmail.com (John Hunter) Date: Mon, 1 Dec 2008 12:54:27 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: <88e473830812011054o5b9c184aib1f41fec0faff6b7@mail.gmail.com> On Mon, Dec 1, 2008 at 12:21 PM, Pierre GM wrote: > Well, looks like the attachment is too big, so here's the implementation. 
> The tests will come in another message.\ It looks like I am doing something wrong -- trying to parse a CSV file with dates formatted like '2008-10-14', with:: import datetime, sys import dateutil.parser StringConverter.upgrade_mapper(dateutil.parser.parse, default=datetime.date(1900,1,1)) r = loadtxt(sys.argv[1], delimiter=',', names=True) print r.dtype I get the following:: Traceback (most recent call last): File "genload_proposal.py", line 734, in ? r = loadtxt(sys.argv[1], delimiter=',', names=True) File "genload_proposal.py", line 711, in loadtxt (output, _) = genloadtxt(fname, **kwargs) File "genload_proposal.py", line 646, in genloadtxt rows[i] = tuple([conv(val) for (conv, val) in zip(converters, vals)]) File "genload_proposal.py", line 385, in __call__ raise ValueError("Cannot convert string '%s'" % value) ValueError: Cannot convert string '2008-10-14' In debug mode, I see the following where the error occurs ipdb> vals ('2008-10-14', '116.26', '116.40', '103.14', '104.08', '70749800', '104.08') ipdb> converters [<__main__.StringConverter instance at 0xa35fa6c>, <__main__.StringConverter instance at 0xa35ff2c>, <__main__.StringConverter instance at 0xa35ff8c>, <__main__.StringConverter instance at 0xa35ffec>, <__main__.StringConverter instance at 0xa15406c>, <__main__.StringConverter instance at 0xa1540cc>, <__main__.StringConverter instance at 0xa15412c>] It looks like my registry of a custom converter isn't working. Here is what the _mapper looks like:: In [23]: StringConverter._mapper Out[23]: [(, , None), (, , -1), (, , -NaN), (, , (-NaN+0j)), (, , datetime.date(1900, 1, 1)), (, , '???')] From pgmdevlist at gmail.com Mon Dec 1 14:14:19 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 14:14:19 -0500 Subject: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation... References: <7B106C20-EB72-4126-956C-02866D204A3E@gmail.com> Message-ID: <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> (Sorry about that, I pressed "Reply" instead of "Reply all". Not my day for emails...) > On Dec 1, 2008, at 1:54 PM, John Hunter wrote: >> >> It looks like I am doing something wrong -- trying to parse a CSV >> file >> with dates formatted like '2008-10-14', with:: >> >> import datetime, sys >> import dateutil.parser >> StringConverter.upgrade_mapper(dateutil.parser.parse, >> default=datetime.date(1900,1,1)) >> r = loadtxt(sys.argv[1], delimiter=',', names=True) > > John, > The problem you have is that the default dtype is 'float' (for > backwards compatibility w/ the original np.loadtxt). What you want > is to automatically change the dtype according to the content of > your file: you should use dtype=None > > r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None) > > As you'll want a recarray, we could make a np.records.loadtxt > function where dtype=None would be the default... From jdh2358 at gmail.com Mon Dec 1 14:26:40 2008 From: jdh2358 at gmail.com (John Hunter) Date: Mon, 1 Dec 2008 13:26:40 -0600 Subject: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation... In-Reply-To: <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> References: <7B106C20-EB72-4126-956C-02866D204A3E@gmail.com> <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> Message-ID: <88e473830812011126i2ec9ad37t80b9c9712c49f19e@mail.gmail.com> On Mon, Dec 1, 2008 at 1:14 PM, Pierre GM wrote: >> The problem you have is that the default dtype is 'float' (for >> backwards compatibility w/ the original np.loadtxt). 
What you want >> is to automatically change the dtype according to the content of >> your file: you should use dtype=None >> >> r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None) >> >> As you'll want a recarray, we could make a np.records.loadtxt >> function where dtype=None would be the default... > As you'll want a recarray, we could make a np.records.loadtxt function where > dtype=None would be the default... OK, that worked great. I do think some a default impl in np.rec which returned a recarray would be nice. It might also be nice to have a method like np.rec.fromcsv which defaults to a delimiter=',', names=True and dtype=None. Since csv is one of the most common data interchange format in the world, it would be nice to have some obvious function that works with it with little or no customization required. Fernando and I have taught a scientific computing course on a number of occasions, and on the last round we taught to undergrads. Most of these students have little or no programming, for many the concept of an array is something they struggle with, dtypes are a difficult concept, but we found that they responded very well to our csv2rec example, because with no syntactic cruft they were able to load a file and do some stats on the columns, and I would like to see that ease of use preserved. JDH From pgmdevlist at gmail.com Mon Dec 1 14:42:27 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 14:42:27 -0500 Subject: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation... In-Reply-To: <88e473830812011126i2ec9ad37t80b9c9712c49f19e@mail.gmail.com> References: <7B106C20-EB72-4126-956C-02866D204A3E@gmail.com> <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> <88e473830812011126i2ec9ad37t80b9c9712c49f19e@mail.gmail.com> Message-ID: <70C80B0B-96F2-4667-BFF9-7D7FCB3958D6@gmail.com> On Dec 1, 2008, at 2:26 PM, John Hunter wrote > > OK, that worked great. I do think some a default impl in np.rec which > returned a recarray would be nice. It might also be nice to have a > method like np.rec.fromcsv which defaults to a delimiter=',', > names=True and dtype=None. Since csv is one of the most common data > interchange format in the world, it would be nice to have some > obvious function that works with it with little or no customization > required. Quite agreed. Personally, I'd ditch the default dtype=float in favor of dtype=None, but compatibility is an issue. However, if we all agree on genloadtxt, we can use tailored-made version in different modules, like you suggest. There's an extra issue for which we have an solution I'm not completely satisfied with: names=True. It might be simpler for basic user not to set names=True, and have the first header recognized as names or not if needed (by processing the first line after the others, and using it as header if it's found to be a list of names, or inserting it back at the beginning otherwise)... From ndbecker2 at gmail.com Mon Dec 1 14:43:11 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 01 Dec 2008 14:43:11 -0500 Subject: [Numpy-discussion] fromiter typo? Message-ID: Says it takes a default dtype arg, but doesn't act like it's an optional arg: fromiter (iterator or generator, dtype=None) Construct an array from an iterator or a generator. Only handles 1-dimensional cases. By default the data-type is determined from the objects returned from the iterator. 
---> 20 z = fromiter (y) TypeError: function takes at least 2 arguments (1 given) From pav at iki.fi Mon Dec 1 14:56:51 2008 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 1 Dec 2008 19:56:51 +0000 (UTC) Subject: [Numpy-discussion] fromiter typo? References: Message-ID: Mon, 01 Dec 2008 14:43:11 -0500, Neal Becker wrote: > Says it takes a default dtype arg, but doesn't act like it's an optional > arg: > > fromiter (iterator or generator, dtype=None) Construct an array from an > iterator or a generator. Only handles 1-dimensional cases. By default > the data-type is determined from the objects returned from the iterator. > > ---> 20 z = fromiter (y) > > TypeError: function takes at least 2 arguments (1 given) The docstring is correct in 1.2.1 and in the documentation; I suppose you have an older version of Numpy. -- Pauli Virtanen From stefan at sun.ac.za Mon Dec 1 15:47:06 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 22:47:06 +0200 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> Hi Pierre 2008/12/1 Pierre GM : > * `genloadtxt` is the base function that makes all the work. It > outputs 2 arrays, one for the data (missing values being substituted > by the appropriate default) and one for the mask. It would go in > np.lib.io I see the code length increased from 200 lines to 800. This made me wonder about the execution time: initial benchmarks suggest a 3x slow-down. Could this be a problem for loading large text files? If so, should we consider keeping both versions around, or by default bypassing all the extra hooks? Regards St?fan From rmay31 at gmail.com Mon Dec 1 16:23:18 2008 From: rmay31 at gmail.com (Ryan May) Date: Mon, 01 Dec 2008 15:23:18 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> Message-ID: <493455C6.7010701@gmail.com> St?fan van der Walt wrote: > Hi Pierre > > 2008/12/1 Pierre GM : >> * `genloadtxt` is the base function that makes all the work. It >> outputs 2 arrays, one for the data (missing values being substituted >> by the appropriate default) and one for the mask. It would go in >> np.lib.io > > I see the code length increased from 200 lines to 800. This made me > wonder about the execution time: initial benchmarks suggest a 3x > slow-down. Could this be a problem for loading large text files? If > so, should we consider keeping both versions around, or by default > bypassing all the extra hooks? I've wondered about this being an issue. On one hand, you hate to make existing code noticeably slower. On the other hand, if speed is important to you, why are you using ascii I/O? I personally am not entirely against having two versions of loadtxt-like functions. However, the idea seems a little odd, seeing as how loadtxt was already supposed to be the "swiss army knife" of text reading. I'm seeing a similar slowdown with Pierre's version of the code. The version of loadtxt that I cobbled together with the StringConverter class (and no missing value support) shows about a 50% slowdown, so clearly there's a performance penalty for trying to make a generic function that can be all things to all people. On the other hand, this approach reduces code duplication. I'm not really opinionated on what the right approach is here. 
My only opinion is that this functionality *really* needs to be in numpy in some fashion. For my own use case, with the old version, I could read a text file and by hand separate out columns and mask values. Now, I open a file and get a structured array with an automatically detected dtype (names and types!) plus masked values. My $0.02. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From stefan at sun.ac.za Mon Dec 1 16:47:00 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 23:47:00 +0200 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <493455C6.7010701@gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> Message-ID: <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> 2008/12/1 Ryan May : > I've wondered about this being an issue. On one hand, you hate to make > existing code noticeably slower. On the other hand, if speed is > important to you, why are you using ascii I/O? More "I" than "O"! But I think numpy.fromfile, once fixed up, could fill this niche nicely. > I personally am not entirely against having two versions of loadtxt-like > functions. However, the idea seems a little odd, seeing as how loadtxt > was already supposed to be the "swiss army knife" of text reading. I haven't investigated the code in too much detail, but wouldn't it be possible to implement the current set of functionality in a base-class, which is then specialised to add the rest? That way, one could always instantiate TextReader yourself for some added speed. > I'm not really opinionated on what the right approach is here. My only > opinion is that this functionality *really* needs to be in numpy in some > fashion. For my own use case, with the old version, I could read a text > file and by hand separate out columns and mask values. Now, I open a > file and get a structured array with an automatically detected dtype > (names and types!) plus masked values. That's neat! Cheers St?fan From pgmdevlist at gmail.com Mon Dec 1 17:55:43 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 17:55:43 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: I agree, genloadtxt is a bit blotted, and it's not a surprise it's slower than the initial one. I think that in order to be fair, comparisons must be performed with matplotlib.mlab.csv2rec, that implements as well the autodetection of the dtype. I'm quite in favor of keeping a lite version around. On Dec 1, 2008, at 4:47 PM, St?fan van der Walt wrote: >> > I haven't investigated the code in too much detail, but wouldn't it be > possible to implement the current set of functionality in a > base-class, which is then specialised to add the rest? That way, one > could always instantiate TextReader yourself for some added speed. Well, one of the issues is that we need to keep the function compatible w/ urllib.urlretrieve (Ryan, am I right?), which means not being able to go back to the beginning of a file (no call to .seek). Another issue comes from the possibility to define the dtype automatically: you need to keep track of the converters, then have to do a second loop on the data. 
Those converters are likely the bottleneck, as you need to check whether each value can be interpreted as missing or not and respond appropriately. I thought about creating a base class, with a specific subclass taking care of the missing values. I found out it would have duplicated a lot of code In any case, I think that's secondary: we can always optimize pieces of the code afterwards. I'd like more feedback on corner cases and usage... From efiring at hawaii.edu Mon Dec 1 18:09:59 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 01 Dec 2008 13:09:59 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? Message-ID: <49346EC7.6020109@hawaii.edu> Pierre, ma.masked_all does not seem to work with fancy dtypes and more then one dimension: In [1]:import numpy as np In [2]:dt = np.dtype({'names': ['a', 'b'], 'formats': ['f', 'f']}) In [3]:x = np.ma.masked_all((2,), dtype=dt) In [4]:x Out[4]: masked_array(data = [(--, --) (--, --)], mask = [(True, True) (True, True)], fill_value=(1.0000000200408773e+20, 1.0000000200408773e+20)) In [5]:x = np.ma.masked_all((2,2), dtype=dt) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/efiring/ in () /usr/local/lib/python2.5/site-packages/numpy/ma/extras.pyc in masked_all(shape, dtype) 78 """ 79 a = masked_array(np.empty(shape, dtype), ---> 80 mask=np.ones(shape, bool)) 81 return a 82 /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __new__(cls, data, mask, dtype, copy, subok, ndmin, fill_value, keep_mask, hard_mask, flag, shrink, **options) 1304 except TypeError: 1305 mask = np.array([tuple([m]*len(mdtype)) for m in mask], -> 1306 dtype=mdtype) 1307 # Make sure the mask and the data have the same shape 1308 if mask.shape != _data.shape: TypeError: expected a readable buffer object ----------------- Eric From Chris.Barker at noaa.gov Mon Dec 1 18:19:24 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 01 Dec 2008 15:19:24 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: <493470FC.6020004@noaa.gov> St?fan van der Walt wrote: >> important to you, why are you using ascii I/O? ascii I/O is slow, so that's a reason in itself to want it not to be slower! > More "I" than "O"! But I think numpy.fromfile, once fixed up, could > fill this niche nicely. I agree -- for the simple cases, fromfile() could work very well -- perhaps it could even be used to speed up some special cases of loadtxt. But is anyone working on fromfile()? By the way, I think overloading fromfile() for text files is a bit misleading for users -- I propose we have a fromtextfile() or something instead. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Mon Dec 1 18:21:07 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 01 Dec 2008 15:21:07 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... 
In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: <49347163.8070207@noaa.gov> Pierre GM wrote: > Another issue comes from the possibility to define the dtype > automatically: Does all that get bypassed if the dtype(s) is specified? Is it still slow in that case? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Dec 1 18:28:38 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 18:28:38 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <49347163.8070207@noaa.gov> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> Message-ID: <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> On Dec 1, 2008, at 6:21 PM, Christopher Barker wrote: > Pierre GM wrote: >> Another issue comes from the possibility to define the dtype >> automatically: > > Does all that get bypassed if the dtype(s) is specified? Is it still > slow in that case? Good question. Having a dtype != None does skip a secondary loop. Once again, I;m sure there's plenty of room for optimization (eg, different loops whether the dtype is defined or not, whether missing values have to be taken into account or not, etc...). I just want to make sure that we're not missing any functionality and/or corner cases and that the usage is intuitive enough before spending some time optimizing... From f.yw at hotmail.com Mon Dec 1 20:38:11 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 1 Dec 2008 18:38:11 -0700 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> Message-ID: Hi, I need to convolve a 1d filter with 8 coefficients with a 2d array of the shape (6,7). I can use convolve to perform the operation for each row. This will involve a for loop with a counter 6. I wonder there is an fast way to do this in numpy without using for loop. Does anyone know how to do it? Thanks Frank _________________________________________________________________ Access your email online and on the go with Windows Live Hotmail. http://windowslive.com/Explore/Hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_access_112008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From h5py at alfven.org Mon Dec 1 20:53:46 2008 From: h5py at alfven.org (Andrew Collette) Date: Mon, 01 Dec 2008 17:53:46 -0800 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 Message-ID: <1228182826.24243.1.camel@tachyon-laptop> ===================================== Announcing HDF5 for Python (h5py) 1.0 ===================================== What is h5py? ------------- HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a versatile, mature scientific software library designed for the fast, flexible storage of enormous amounts of data. 
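A minimal sketch of the kind of usage described in the next few paragraphs (the file, group
and dataset names here are purely illustrative, not taken from any shipped example):

import numpy as np
import h5py

f = h5py.File('example.h5', 'w')                  # create a new HDF5 file
dset = f.create_dataset('mydata', data=np.arange(100))
grp = f.create_group('subgroup')                  # groups act like containers
grp.create_dataset('more', data=np.ones((10, 10)))

print f['mydata'][:10]                            # datasets support slicing
print f['subgroup/more'].shape, f['subgroup/more'].dtype
f.close()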
>From a Python programmer's perspective, HDF5 provides a robust way to store data, organized by name in a tree-like fashion. You can create datasets (arrays on disk) hundreds of gigabytes in size, and perform random-access I/O on desired sections. Datasets are organized in a filesystem-like hierarchy using containers called "groups", and accesed using the tradional POSIX /path/to/resource syntax. This is the fourth major release of h5py, and represents the end of the "unstable" (0.X.X) design phase. Why should I use it? -------------------- H5py provides a simple, robust read/write interface to HDF5 data from Python. Existing Python and NumPy concepts are used for the interface; for example, datasets on disk are represented by a proxy class that supports slicing, and has dtype and shape attributes. HDF5 groups are are presented using a dictionary metaphor, indexed by name. A major design goal of h5py is interoperability; you can read your existing data in HDF5 format, and create new files that any HDF5- aware program can understand. No Python-specific extensions are used; you're free to implement whatever file structure your application desires. Almost all HDF5 features are available from Python, including things like compound datatypes (as used with NumPy recarray types), HDF5 attributes, hyperslab and point-based I/O, and more recent features in HDF 1.8 like resizable datasets and recursive iteration over entire files. The foundation of h5py is a near-complete wrapping of the HDF5 C API. HDF5 identifiers are first-class objects which participate in Python reference counting, and expose the C API via methods. This low-level interface is also made available to Python programmers, and is exhaustively documented. See the Quick-Start Guide for a longer introduction with code examples: http://h5py.alfven.org/docs/guide/quick.html Where to get it --------------- * Main website, documentation: http://h5py.alfven.org * Downloads, bug tracker: http://h5py.googlecode.com * The HDF group website also contains a good introduction: http://www.hdfgroup.org/HDF5/doc/H5.intro.html Requires -------- * UNIX-like platform (Linux or Mac OS-X); Windows version is in progress. * Python 2.5 or 2.6 * NumPy 1.0.3 or later (1.1.0 or later recommended) * HDF5 1.6.5 or later, including 1.8. Some features only available when compiled against HDF5 1.8. * Optionally, Cython (see cython.org) if you want to use custom install options. You'll need version 0.9.8.1.1 or later. About this version ------------------ Version 1.0 follows version 0.3.1 as the latest public release. The major design phase (which began in May of 2008) is now over; the design of the high-level API will be supported as-is for the rest of the 1.X series, with minor enhancements. This is the first version to support Python 2.6, and the first to use Cython for the low-level interface. The license remains 3-clause BSD. ** This project is NOT affiliated with The HDF Group. ** Thanks ------ Thanks to D. Dale, E. Lawrence and other for their continued support and comments. Also thanks to the PyTables project, for inspiration and generously providing their code to the community, and to everyone at the HDF Group for creating such a useful piece of software. From pgmdevlist at gmail.com Mon Dec 1 21:42:53 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 21:42:53 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? 
In-Reply-To: <49346EC7.6020109@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> Message-ID: <2B254DF1-C3BA-4580-9D0C-9D65660D288E@gmail.com> On Dec 1, 2008, at 6:09 PM, Eric Firing wrote: > Pierre, > > ma.masked_all does not seem to work with fancy dtypes and more then > one dimension: Eric, Should be fixed in SVN (r6130). There were indeed problems with nested dtypes. Tricky beasts they are. Thanks for reporting! From josef.pktd at gmail.com Mon Dec 1 21:53:01 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 1 Dec 2008 21:53:01 -0500 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <1228182826.24243.1.camel@tachyon-laptop> References: <1228182826.24243.1.camel@tachyon-laptop> Message-ID: <1cd32cbb0812011853r89e582dl2d65ac953c3983dc@mail.gmail.com> >Requires >-------- > >* UNIX-like platform (Linux or Mac OS-X); >Windows version is in progress I installed version 0.3.0 back in August on WindowsXP, and as far as I remember there were no problems at all with the install, and all tests pass. I thought the interface was really easy to use. But after trying it out I realized that my matlab is too old to understand the generated hdf5 files in an easy-to-use way, and I had to go back to csv-files. Josef From stefan at sun.ac.za Tue Dec 2 00:42:27 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 2 Dec 2008 07:42:27 +0200 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> Message-ID: <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> Hi Frank 2008/12/2 frank wang : > I need to convolve a 1d filter with 8 coefficients with a 2d array of the > shape (6,7). I can use convolve to perform the operation for each row. This > will involve a for loop with a counter 6. I wonder there is > an fast way to do this in numpy without using for loop. Does anyone know how > to do it? Since 6x7 is quite small, so you can afford this trick: a) Pad the 6,7 array to 6,14. b) Flatten the array c) Perform convolution d) Unflatten array e) Take out valid values Cheers St?fan From f.yw at hotmail.com Tue Dec 2 01:14:09 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 1 Dec 2008 23:14:09 -0700 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> Message-ID: This is what I thought to do. However, I am not sure whether this is a fast way to do it and also I want to find a more generous way to do it. I thought there may be a more elegant way to do it. Thanks Frank> Date: Tue, 2 Dec 2008 07:42:27 +0200> From: stefan at sun.ac.za> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] fast way to convolve a 2d array with 1d filter> > Hi Frank> > > 2008/12/2 frank wang :> > I need to convolve a 1d filter with 8 coefficients with a 2d array of the> > shape (6,7). I can use convolve to perform the operation for each row. 
This> > will involve a for loop with a counter 6. I wonder there is> > an fast way to do this in numpy without using for loop. Does anyone know how> > to do it?> > Since 6x7 is quite small, so you can afford this trick:> > a) Pad the 6,7 array to 6,14.> b) Flatten the array> c) Perform convolution> d) Unflatten array> e) Take out valid values> > Cheers> St?fan> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ Get more done, have more fun, and stay more connected with Windows Mobile?. http://clk.atdmt.com/MRT/go/119642556/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Tue Dec 2 01:59:01 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 01 Dec 2008 20:59:01 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <49346EC7.6020109@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> Message-ID: <4934DCB5.9000602@hawaii.edu> Pierre, Your change fixed masked_all for the example I gave, but I think it introduced a new failure in zeros: dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), ' in () /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __call__(self, a, *args, **params) 4533 # 4534 def __call__(self, a, *args, **params): -> 4535 return self._func.__call__(a, *args, **params).view(MaskedArray) 4536 4537 arange = _convert2ma('arange') /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __array_finalize__(self, obj) 1548 odtype = obj.dtype 1549 if odtype.names: -> 1550 _mask = getattr(obj, '_mask', make_mask_none(obj.shape, odtype)) 1551 else: 1552 _mask = getattr(obj, '_mask', nomask) /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in make_mask_none(newshape, dtype) 921 result = np.zeros(newshape, dtype=MaskType) 922 else: --> 923 result = np.zeros(newshape, dtype=make_mask_descr(dtype)) 924 return result 925 /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in make_mask_descr(ndtype) 819 if not isinstance(ndtype, np.dtype): 820 ndtype = np.dtype(ndtype) --> 821 return np.dtype(_make_descr(ndtype)) 822 823 def get_mask(a): /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in _make_descr(datatype) 806 descr = [] 807 for name in names: --> 808 (ndtype, _) = datatype.fields[name] 809 descr.append((name, _make_descr(ndtype))) 810 return descr ValueError: too many values to unpack From charlesr.harris at gmail.com Tue Dec 2 02:05:04 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 2 Dec 2008 00:05:04 -0700 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> Message-ID: On Mon, Dec 1, 2008 at 11:14 PM, frank wang wrote: > This is what I thought to do. However, I am not sure whether this is a > fast way to do it and also I want to find a more generous way to do it. I > thought there may be a more elegant way to do it. > > Thanks > > Frank > Well, for just the one matrix not much will speed it up. 
If you have lots of matrices and the coefficients are fixed, then you can set up a "convolution" matrix whose columns are the coefficients shifted appropriately. Then just do a matrix multiply. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Dec 2 03:16:02 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 03:16:02 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <4934DCB5.9000602@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> Message-ID: Eric, That's quite a handful you have with this dtype... So yes, the fix I gave works with nested dtypes and flexible dtypes with a simple name (string, not tuple). I'm a bit surprised with numpy, here. Consider: >>> dt.names ('P', 'D', 'T', 'w', 'S', 'sigtheta', 'theta') So we lose the tuple and get a single string instead, corresponding to the right-hand element of the name.. But this single string is one of the keys of dt.fields, whereas the tuple is not. Puzzling. I'm sure there must be some reference in the numpy book, but I can't look for it now. Anyway: Prior to version 6127, make_mask_descr was substituting the 2nd element of each tuple of a dtype.descr by a bool. Which failed for nested dtypes. Now, we check the field corresponding to a name, which fails in our particular case. I'll be working on it... On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), ' Depth [salt water, m]', 'D'), ' C]', 'T'), ' Salinity [PSU]', 'S'), ' 'sigtheta'), ' 'theta'), ' > np.ma.zeros((2,2), dt) From efiring at hawaii.edu Tue Dec 2 04:26:35 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 01 Dec 2008 23:26:35 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> Message-ID: <4934FF4B.2040801@hawaii.edu> Pierre GM wrote: > Eric, > That's quite a handful you have with this dtype... Here is a simplified example of how I made it: dt = np.dtype({'names': ['a','b'], 'formats': ['f', 'f'], 'titles': ['aaa', 'bbb']}) From page 132 in the numpy book: The fields dictionary is indexed by keys that are the names of the fields. Each entry in the dictionary is a tuple fully describing the field: (dtype, offset[,title]). If present, the optional title can actually be any object (if it is string or unicode then it will also be a key in the fields dictionary, otherwise it?s meta-data). -------- I put the titles in as a sort of additional documentation, and thinking that they might be useful for labeling plots; but it is rather hard to get the titles back out, since they are not directly accessible as an attribute, like names. Probably I should just omit them. Eric > So yes, the fix I gave works with nested dtypes and flexible dtypes > with a simple name (string, not tuple). I'm a bit surprised with > numpy, here. > Consider: > > >>> dt.names > ('P', 'D', 'T', 'w', 'S', 'sigtheta', 'theta') > > So we lose the tuple and get a single string instead, corresponding to > the right-hand element of the name.. > But this single string is one of the keys of dt.fields, whereas the > tuple is not. Puzzling. I'm sure there must be some reference in the > numpy book, but I can't look for it now. > > Anyway: > Prior to version 6127, make_mask_descr was substituting the 2nd > element of each tuple of a dtype.descr by a bool. Which failed for > nested dtypes. 
Now, we check the field corresponding to a name, which > fails in our particular case. > > > I'll be working on it... > > > > On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > >> dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), '> Depth [salt water, m]', 'D'), '> C]', 'T'), '> Salinity [PSU]', 'S'), '> 'sigtheta'), '> 'theta'), '> >> np.ma.zeros((2,2), dt) > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Tue Dec 2 04:42:21 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 04:42:21 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <4934FF4B.2040801@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> <4934FF4B.2040801@hawaii.edu> Message-ID: <517BA3D7-C785-4EEC-81BE-3C356B8C1145@gmail.com> On Dec 2, 2008, at 4:26 AM, Eric Firing wrote: > From page 132 in the numpy book: > > The fields dictionary is indexed by keys that are the names of the > fields. Each entry in the dictionary is a tuple fully describing the > field: (dtype, offset[,title]). If present, the optional title can > actually be any object (if it is string or unicode then it will also > be > a key in the fields dictionary, otherwise it?s meta-data). I should read it more often... > > I put the titles in as a sort of additional documentation, and > thinking > that they might be useful for labeling plots; That's actually quite a good idea... > but it is rather hard to > get the titles back out, since they are not directly accessible as an > attribute, like names. Probably I should just omit them. We could perhaps try a function: def gettitle(dtype, name): try: field = dtype.fields[name] except (TypeError, KeyError): return None else: if len(field) > 2: return field[-1] return None From Joris.DeRidder at ster.kuleuven.be Tue Dec 2 07:21:49 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Tue, 2 Dec 2008 13:21:49 +0100 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> Message-ID: <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> On 1 Dec 2008, at 21:47 , St?fan van der Walt wrote: > Hi Pierre > > 2008/12/1 Pierre GM : >> * `genloadtxt` is the base function that makes all the work. It >> outputs 2 arrays, one for the data (missing values being substituted >> by the appropriate default) and one for the mask. It would go in >> np.lib.io > > I see the code length increased from 200 lines to 800. This made me > wonder about the execution time: initial benchmarks suggest a 3x > slow-down. Could this be a problem for loading large text files? If > so, should we consider keeping both versions around, or by default > bypassing all the extra hooks? > > Regards > St?fan As a historical note, we used to have scipy.io.read_array which at the time was considered by Travis too slow and too "grandiose" to be put in Numpy. As a consequence, numpy.loadtxt() was created which was simple and fast. Now it looks like we're going back to something grandiose. But perhaps it can be made grandiose *and* reasonably fast ;-). Cheers, Joris P.S. 
As a reference: http://article.gmane.org/gmane.comp.python.numeric.general/5556/ Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From nadavh at visionsense.com Tue Dec 2 07:36:55 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 2 Dec 2008 14:36:55 +0200 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com><493455C6.7010701@gmail.com><9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com><49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C35F@ex3.envision.co.il> You can use 2D convolution routines either in scipy.signal or numpy.numarray.nd_image Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? frank wang ????: ? 02-?????-08 03:38 ??: numpy-discussion at scipy.org ????: [Numpy-discussion] fast way to convolve a 2d array with 1d filter Hi, I need to convolve a 1d filter with 8 coefficients with a 2d array of the shape (6,7). I can use convolve to perform the operation for each row. This will involve a for loop with a counter 6. I wonder there is an fast way to do this in numpy without using for loop. Does anyone know how to do it? Thanks Frank _________________________________________________________________ Access your email online and on the go with Windows Live Hotmail. http://windowslive.com/Explore/Hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_access_112008 -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3494 bytes Desc: not available URL: From aisaac at american.edu Tue Dec 2 08:12:25 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 02 Dec 2008 08:12:25 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> Message-ID: <49353439.8030804@american.edu> On 12/2/2008 7:21 AM Joris De Ridder apparently wrote: > As a historical note, we used to have scipy.io.read_array which at the > time was considered by Travis too slow and too "grandiose" to be put > in Numpy. As a consequence, numpy.loadtxt() was created which was > simple and fast. Now it looks like we're going back to something > grandiose. But perhaps it can be made grandiose *and* reasonably > fast ;-). I hope this consideration remains prominent in this thread. Is the disappearance or read_array the reason for this change? What happened to it? Note that read_array_demo1.py is still in scipy.io despite the loss of read_array. Alan Isaac From aisaac at american.edu Tue Dec 2 08:46:29 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 02 Dec 2008 08:46:29 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <49353439.8030804@american.edu> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> <49353439.8030804@american.edu> Message-ID: <49353C35.8050909@american.edu> On 12/2/2008 8:12 AM Alan G Isaac apparently wrote: > I hope this consideration remains prominent > in this thread. Is the disappearance or > read_array the reason for this change? > What happened to it? Apologies; it is only deprecated, not gone. 
Alan Isaac From Christophe.Chappet at onera.fr Tue Dec 2 09:26:15 2008 From: Christophe.Chappet at onera.fr (Christophe Chappet) Date: Tue, 02 Dec 2008 15:26:15 +0100 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter Message-ID: <49354587.30906@onera.fr> Hi all, I compile the followinq code using "f2py -c --fcompiler=gnu95 --compiler=mingw32" -m hello subroutine AfficheMessage(szText) character szText*100 write (*,*) szText return end Using python console : >>>import hello >>>hello.affichemessage(" Hello") works fine ! I do the same in the program window of IDLE and : - no message is displayed. - the shell restart (or IDLE crah if launched with -n) Same problem with PyScripter IDE. (crash). Any suggestion ? Regards, Christophe -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Tue Dec 2 10:24:16 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 2 Dec 2008 09:24:16 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: On Mon, Dec 1, 2008 at 4:55 PM, Pierre GM wrote: > On Dec 1, 2008, at 4:47 PM, St?fan van der Walt wrote: > >> > > I haven't investigated the code in too much detail, but wouldn't it be > > possible to implement the current set of functionality in a > > base-class, which is then specialised to add the rest? That way, one > > could always instantiate TextReader yourself for some added speed. > > Well, one of the issues is that we need to keep the function > compatible w/ urllib.urlretrieve (Ryan, am I right?), which means not > being able to go back to the beginning of a file (no call to .seek). > Well, the original version of loadtxt() checked for seek but didn't need it (fixed now), which kept me from using a urllib2.urlopen() object. If actually using seek() would speed up the new version of loadtxt(), feel free to use it. I'm more than capable of wrapping the urlopen() object within a StringIO. However, I am unconvinced that removing the 2nd loop and instead redoing the reading from the file will be much (if any) of a speed win. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Dec 2 10:58:46 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 2 Dec 2008 10:58:46 -0500 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter In-Reply-To: <49354587.30906@onera.fr> References: <49354587.30906@onera.fr> Message-ID: <1cd32cbb0812020758v5a56a7ebsf80640d07aaebb71@mail.gmail.com> On Tue, Dec 2, 2008 at 9:26 AM, Christophe Chappet wrote: > Hi all, > I compile the followinq code using "f2py -c --fcompiler=gnu95 > --compiler=mingw32" -m hello > subroutine AfficheMessage(szText) > character szText*100 > write (*,*) szText > return > end > > Using python console : >>>>import hello >>>>hello.affichemessage(" > Hello") > works fine ! > > I do the same in the program window of IDLE and : > - no message is displayed. > - the shell restart (or IDLE crah if launched with -n) > > Same problem with PyScripter IDE. (crash). > > Any suggestion ? 
> Regards, > Christophe > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > Is this a write to standard output "write (*,*) szText" ? Robert Kern mentioned several times that mingw is broken for writing to stdout but I only know about it for stdout in c. I always get a crash when a test compiles a write to stdout in c with mingw on my WindowsXP. But then my impression is that it shouldn't work on the command line either. Since I don't know much about f2py, I'm not sure whether fortran has the same problem as c with mingw. Josef From pgmdevlist at gmail.com Tue Dec 2 12:57:06 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 12:57:06 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <4934DCB5.9000602@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> Message-ID: <6402833F-EFCA-41F1-9794-291ABB78F0AA@gmail.com> On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > Pierre, > > Your change fixed masked_all for the example I gave, but I think it > introduced a new failure in zeros: Eric, Would you mind giving r6131 a try ? It's rather ugly but looks like it works... From efiring at hawaii.edu Tue Dec 2 14:44:36 2008 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 02 Dec 2008 09:44:36 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <6402833F-EFCA-41F1-9794-291ABB78F0AA@gmail.com> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> <6402833F-EFCA-41F1-9794-291ABB78F0AA@gmail.com> Message-ID: <49359024.8060703@hawaii.edu> Pierre GM wrote: > > On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > >> Pierre, >> >> Your change fixed masked_all for the example I gave, but I think it >> introduced a new failure in zeros: > > Eric, > Would you mind giving r6131 a try ? It's rather ugly but looks like it > works... So far, so good. Thanks very much. Eric From zachary.pincus at yale.edu Tue Dec 2 14:47:38 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 2 Dec 2008 14:47:38 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: Hi Pierre, I've tested the new loadtxt briefly. Looks good, except that there's a minor bug when trying to use a specific white-space delimiter (e.g. \t) while still allowing other white-space to be allowed in fields (e.g. spaces). Specifically, on line 115 in LineSplitter, we have: self.delimiter = delimiter.strip() or None so if I pass in, say, '\t' as the delimiter, self.delimiter gets set to None, which then causes the default behavior of any-whitespace-is- delimiter to be used. This makes lines like "Gene Name\tPubMed ID \tStarting Position" get split wrong, even when I explicitly pass in '\t' as the delimiter! Similarly, I believe that some of the tests are formulated wrong: def test_nodelimiter(self): "Test LineSplitter w/o delimiter" strg = " 1 2 3 4 5 # test" test = LineSplitter(' ')(strg) assert_equal(test, ['1', '2', '3', '4', '5']) I think that treating an explicitly-passed-in ' ' delimiter as identical to 'no delimiter' is a bad idea. If I say that ' ' is the delimiter, or '\t' is the delimiter, this should be treated *just* like ',' being the delimiter, where the expected output is: ['1', '2', '3', '4', '', '5'] At least, that's what I would expect. 
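As a quick illustration of the distinction (plain Python string splitting,
not the posted LineSplitter code):

>>> " 1 2 3 4  5 ".split()          # no delimiter: runs of whitespace collapse
['1', '2', '3', '4', '5']
>>> "1\t2\t3\t4\t\t5".split('\t')   # explicit '\t': the empty field is kept
['1', '2', '3', '4', '', '5']
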
Treating contiguous blocks of whitespace as single delimiters is perfectly reasonable when None is provided as the delimiter, but when an explicit delimiter has been provided, it strikes me that the code shouldn't try to further- interpret it... Does anyone else have any opinion here? Zach On Dec 1, 2008, at 1:21 PM, Pierre GM wrote: > Well, looks like the attachment is too big, so here's the > implementation. The tests will come in another message. > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From rmay31 at gmail.com Tue Dec 2 14:56:26 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 02 Dec 2008 13:56:26 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: <493592EA.8050205@gmail.com> Zachary Pincus wrote: > Specifically, on line 115 in LineSplitter, we have: > self.delimiter = delimiter.strip() or None > so if I pass in, say, '\t' as the delimiter, self.delimiter gets set > to None, which then causes the default behavior of any-whitespace-is- > delimiter to be used. This makes lines like "Gene Name\tPubMed ID > \tStarting Position" get split wrong, even when I explicitly pass in > '\t' as the delimiter! > > Similarly, I believe that some of the tests are formulated wrong: > def test_nodelimiter(self): > "Test LineSplitter w/o delimiter" > strg = " 1 2 3 4 5 # test" > test = LineSplitter(' ')(strg) > assert_equal(test, ['1', '2', '3', '4', '5']) > > I think that treating an explicitly-passed-in ' ' delimiter as > identical to 'no delimiter' is a bad idea. If I say that ' ' is the > delimiter, or '\t' is the delimiter, this should be treated *just* > like ',' being the delimiter, where the expected output is: > ['1', '2', '3', '4', '', '5'] > > At least, that's what I would expect. Treating contiguous blocks of > whitespace as single delimiters is perfectly reasonable when None is > provided as the delimiter, but when an explicit delimiter has been > provided, it strikes me that the code shouldn't try to further- > interpret it... > > Does anyone else have any opinion here? I agree. If the user explicity passes something as a delimiter, we should use it and not try to be too smart. +1 Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From robert.kern at gmail.com Tue Dec 2 15:01:01 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 2 Dec 2008 14:01:01 -0600 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter In-Reply-To: <49354587.30906@onera.fr> References: <49354587.30906@onera.fr> Message-ID: <3d375d730812021201s3a0115d3mbefbe89664c6ef8e@mail.gmail.com> On Tue, Dec 2, 2008 at 08:26, Christophe Chappet wrote: > Hi all, > I compile the followinq code using "f2py -c --fcompiler=gnu95 > --compiler=mingw32" -m hello > subroutine AfficheMessage(szText) > character szText*100 > write (*,*) szText > return > end > > Using python console : >>>>import hello >>>>hello.affichemessage(" > Hello") > works fine ! > > I do the same in the program window of IDLE and : > - no message is displayed. > - the shell restart (or IDLE crah if launched with -n) > > Same problem with PyScripter IDE. (crash). What version of gfortran are you using (i.e. exactly which binary did you download)? 
I'm not sure about the crash, but I can say that you will never get the output from a write statement inside the Fortran code to go to the IDLE prompt or PyScripter's window. They are not real terminals and do not capture text going to the process's real STDOUT file pointer. They simply change the sys.stdout object to capture text printed from Python. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmay31 at gmail.com Tue Dec 2 15:12:04 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 02 Dec 2008 14:12:04 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: <49359694.8080605@gmail.com> Pierre GM wrote: > Well, looks like the attachment is too big, so here's the > implementation. The tests will come in another message. A couple of quick nitpicks: 1) On line 186 (in the NameValidator class), you use excludelist.append() to append a list to the end of a list. I think you meant to use excludelist.extend() 2) When validating a list of names, why do you insist on lower casing them? (I'm referring to the call to lower() on line 207). On one hand, this would seem nicer than all upper case, but on the other hand this can cause confusion for someone who sees certain casing of names in the file and expects that data to be laid out the same. Other than those, it's working fine for me here. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From h5py at alfven.org Tue Dec 2 16:30:47 2008 From: h5py at alfven.org (Andrew Collette) Date: Tue, 02 Dec 2008 13:30:47 -0800 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <1cd32cbb0812011853r89e582dl2d65ac953c3983dc@mail.gmail.com> References: <1228182826.24243.1.camel@tachyon-laptop> <1cd32cbb0812011853r89e582dl2d65ac953c3983dc@mail.gmail.com> Message-ID: <1228253447.6348.12.camel@tachyon-laptop> Just FYI, the Windows installer for 1.0 is now posted at h5py.googlecode.com after undergoing some final testing. Thanks for trying 0.3.0... too bad about matlab. Andrew On Mon, 2008-12-01 at 21:53 -0500, josef.pktd at gmail.com wrote: > >Requires > >-------- > > > >* UNIX-like platform (Linux or Mac OS-X); > >Windows version is in progress > > > I installed version 0.3.0 back in August on WindowsXP, and as far as I > remember there were no problems at all with the install, and all tests > pass. > > I thought the interface was really easy to use. > But after trying it out I realized that my matlab is too old to > understand the generated hdf5 files in an easy-to-use way, and I had > to go back to csv-files. > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Tue Dec 2 16:48:26 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 16:48:26 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <49359694.8080605@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> Message-ID: On Dec 2, 2008, at 3:12 PM, Ryan May wrote: > Pierre GM wrote: >> Well, looks like the attachment is too big, so here's the >> implementation. 
The tests will come in another message. > > A couple of quick nitpicks: > > 1) On line 186 (in the NameValidator class), you use > excludelist.append() to append a list to the end of a list. I think > you > meant to use excludelist.extend() Good call. > 2) When validating a list of names, why do you insist on lower casing > them? (I'm referring to the call to lower() on line 207). On one > hand, > this would seem nicer than all upper case, but on the other hand this > can cause confusion for someone who sees certain casing of names in > the > file and expects that data to be laid out the same. I recall a life where names were case-insensitives, so 'dates' and 'Dates' and 'DATES' were the same field. It should be easy enough to get rid of that limitations, or add a parameter for case-sensitivity On Dec 2, 2008, at 2:47 PM, Zachary Pincus wrote: > Specifically, on line 115 in LineSplitter, we have: > self.delimiter = delimiter.strip() or None > so if I pass in, say, '\t' as the delimiter, self.delimiter gets set > to None, which then causes the default behavior of any-whitespace-is- > delimiter to be used. This makes lines like "Gene Name\tPubMed ID > \tStarting Position" get split wrong, even when I explicitly pass in > '\t' as the delimiter! OK, I'll check that. > > I think that treating an explicitly-passed-in ' ' delimiter as > identical to 'no delimiter' is a bad idea. If I say that ' ' is the > delimiter, or '\t' is the delimiter, this should be treated *just* > like ',' being the delimiter, where the expected output is: > ['1', '2', '3', '4', '', '5'] > Valid point. Well, all, stay tuned for yet another "yet another implementation..." > > Other than those, it's working fine for me here. > > Ryan From mail at stevesimmons.com Tue Dec 2 16:53:15 2008 From: mail at stevesimmons.com (Stephen Simmons) Date: Tue, 02 Dec 2008 22:53:15 +0100 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <1228182826.24243.1.camel@tachyon-laptop> References: <1228182826.24243.1.camel@tachyon-laptop> Message-ID: <4935AE4B.6050606@stevesimmons.com> Do you have any plans to add lzo compression support, in addition to gzip? This is a feature I used a lot in PyTables. Andrew Collette wrote: > ===================================== > Announcing HDF5 for Python (h5py) 1.0 > ===================================== > > What is h5py? > ------------- > > HDF5 for Python (h5py) is a general-purpose Python interface to the > Hierarchical Data Format library, version 5. HDF5 is a versatile, > mature scientific software library designed for the fast, flexible > storage of enormous amounts of data. > > > From Chris.Barker at noaa.gov Tue Dec 2 17:36:10 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 02 Dec 2008 14:36:10 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> Message-ID: <4935B85A.6030701@noaa.gov> Pierre GM wrote: >> I think that treating an explicitly-passed-in ' ' delimiter as >> identical to 'no delimiter' is a bad idea. If I say that ' ' is the >> delimiter, or '\t' is the delimiter, this should be treated *just* >> like ',' being the delimiter, where the expected output is: >> ['1', '2', '3', '4', '', '5'] >> > > Valid point. > Well, all, stay tuned for yet another "yet another implementation..." While we're at it, it might be nice to be able to pass in more than one delimiter: ('\t',' '). 
though maybe that only combination that I'd really want would be something and '\n', which I think is being treated specially already. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Dec 2 17:46:15 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 17:46:15 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4935B85A.6030701@noaa.gov> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> Message-ID: <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> Chris, I can try, but in that case, please write me a unittest, so that I have a clear and unambiguous idea of what you expect. ANFSCD, have you tried the missing_values option ? On Dec 2, 2008, at 5:36 PM, Christopher Barker wrote: > Pierre GM wrote: >>> I think that treating an explicitly-passed-in ' ' delimiter as >>> identical to 'no delimiter' is a bad idea. If I say that ' ' is the >>> delimiter, or '\t' is the delimiter, this should be treated *just* >>> like ',' being the delimiter, where the expected output is: >>> ['1', '2', '3', '4', '', '5'] >>> >> >> Valid point. >> Well, all, stay tuned for yet another "yet another implementation..." > > While we're at it, it might be nice to be able to pass in more than > one > delimiter: ('\t',' '). though maybe that only combination that I'd > really want would be something and '\n', which I think is being > treated > specially already. > > -Chris > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From h5py at alfven.org Tue Dec 2 18:12:20 2008 From: h5py at alfven.org (Andrew Collette) Date: Tue, 02 Dec 2008 15:12:20 -0800 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <4935AE4B.6050606@stevesimmons.com> References: <1228182826.24243.1.camel@tachyon-laptop> <4935AE4B.6050606@stevesimmons.com> Message-ID: <1228259541.14190.8.camel@tachyon-laptop> If it's a feature people want, I certainly wouldn't mind looking in to it. I believe PyTables supports bzip2 as well. Adding filters to HDF5 takes a bit of work but is well supported by the library. Andrew On Tue, 2008-12-02 at 22:53 +0100, Stephen Simmons wrote: > Do you have any plans to add lzo compression support, in addition to > gzip? This is a feature I used a lot in PyTables. > > Andrew Collette wrote: > > ===================================== > > Announcing HDF5 for Python (h5py) 1.0 > > ===================================== > > > > What is h5py? > > ------------- > > > > HDF5 for Python (h5py) is a general-purpose Python interface to the > > Hierarchical Data Format library, version 5. HDF5 is a versatile, > > mature scientific software library designed for the fast, flexible > > storage of enormous amounts of data. 
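For what it's worth, here is a minimal sketch of how a compressed dataset is
written through h5py's high-level interface -- gzip is the filter that ships
with HDF5 itself, which is why lzo/bzip2 support would take extra work. The
file name and shape below are made up purely for illustration:

import numpy as np
import h5py

arr = np.arange(1000000, dtype='f8').reshape(1000, 1000)
f = h5py.File('example.hdf5', 'w')
# 'compression' selects the HDF5 filter applied to the dataset's chunks
dset = f.create_dataset('data', data=arr, compression='gzip')
f.close()
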
> > From ggellner at uoguelph.ca Tue Dec 2 22:57:07 2008 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Tue, 2 Dec 2008 22:57:07 -0500 Subject: [Numpy-discussion] PyArray_EMPTY and Cython Message-ID: <20081203035707.GA28913@encolpuis> After some discussion on the Cython lists I thought I would try my hand at writing some Cython accelerators for empty and zeros. This will involve using PyArray_EMPTY, I have a simple prototype I would like to get working, but currently it segfaults. Any tips on what I might be missing? import numpy as np cimport numpy as np cdef extern from "numpy/arrayobject.h": PyArray_EMPTY(int ndims, np.npy_intp* dims, int type, bint fortran) cdef np.ndarray empty(np.npy_intp length): cdef np.ndarray[np.double_t, ndim=1] ret cdef int type = np.NPY_DOUBLE cdef int ndims = 1 cdef np.npy_intp* dims dims = &length print dims[0] print type ret = PyArray_EMPTY(ndims, dims, type, False) return ret def test(): cdef np.ndarray[np.double_t, ndim=1] y = empty(10) return y The code seems to print out the correct dims and type info but segfaults when the PyArray_EMPTY call is made. Thanks, Gabriel From Christophe.Chappet at onera.fr Wed Dec 3 04:10:52 2008 From: Christophe.Chappet at onera.fr (Christophe Chappet) Date: Wed, 03 Dec 2008 10:10:52 +0100 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter Message-ID: <49364D1C.8060609@onera.fr> >What version of gfortran are you using (i.e. exactly which binary did >you download)? GNU Fortran (GCC) 4.4.0 20080603 (experimental) [trunk revision 136333] >Is this a write to standard output "write (*,*) szText" ? yes, it is. I forgot to say that it also works with pydev in Eclipse but I'm looking for a simple interactive python shell that can execute some programs. IPython does the job but is less friendly than IDLE in term of program editing. Anyway, I think I will use it for now. Thanks for your reply. Regards, Christophe On Tue, Dec 2, 2008 at 08:26, Christophe Chappet > wrote: >/ Hi all, />/ I compile the followinq code using "f2py -c --fcompiler=gnu95 />/ --compiler=mingw32" -m hello />/ subroutine AfficheMessage(szText) />/ character szText*100 />/ write (*,*) szText />/ return />/ end />/ />/ Using python console : />>>>/import hello />>>>/hello.affichemessage(" />/ Hello") />/ works fine ! />/ />/ I do the same in the program window of IDLE and : />/ - no message is displayed. />/ - the shell restart (or IDLE crah if launched with -n) />/ />/ Same problem with PyScripter IDE. (crash)./ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barthelemy at crans.org Wed Dec 3 09:19:29 2008 From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9lemy?=) Date: Wed, 3 Dec 2008 15:19:29 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray Message-ID: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> Hello, I'm trying to write a small library of differential geometry, and I have some trouble subclassing ndarray. I'd like an HomogeneousMatrix class that subclasse ndarray and overloads some methods, such as inv(). Here is my first try, the inv() function and the inv_v1() method work as expected, but the inv_v2() and inv_v3() methods do not change the object at all. Can somebody explain me what is happening here ? 
import numpy as np def inv(H): """ inverse of an homogeneous matrix """ R = H[0:3,0:3] p = H[0:3,3:4] return np.vstack( (np.hstack((R.T,-np.dot(R.T,p))), [0,0,0,1])) class HomogeneousMatrix(np.ndarray): def __new__(subtype, data=np.eye(4)): subarr = np.array(data) if htr.ishomogeneousmatrix(subarr): return subarr.view(subtype) else: raise ValueError def inv_v1(self): self[0:4,0:4] = htr.inv(self) def inv_v2(self): data = htr.inv(self) self = HomogeneousMatrix(data) def inv_v3(self): self = htr.inv(self) Thank you ! -- S?bastien From silva at lma.cnrs-mrs.fr Wed Dec 3 10:24:43 2008 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Wed, 03 Dec 2008 16:24:43 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> Message-ID: <1228317884.2947.7.camel@Portable-s2m.cnrs-mrs.fr> Le mercredi 03 d?cembre 2008, S?bastien Barth?lemy a ?crit : > Hello, Hi Sebastien! > I'm trying to write a small library of differential geometry, and I > have some trouble subclassing ndarray. > I'd like an HomogeneousMatrix class that subclasse ndarray and > overloads some methods, such as inv(). > Here is my first try, the inv() function and the inv_v1() method work > as expected, but the inv_v2() and inv_v3() methods do not change the > object at all. Can somebody explain me what is happening here ? > > import numpy as np > def inv(H): > """ > inverse of an homogeneous matrix > """ > R = H[0:3,0:3] > p = H[0:3,3:4] > return np.vstack( (np.hstack((R.T,-np.dot(R.T,p))), [0,0,0,1])) > > class HomogeneousMatrix(np.ndarray): > def __new__(subtype, data=np.eye(4)): > subarr = np.array(data) > if htr.ishomogeneousmatrix(subarr): > return subarr.view(subtype) > else: > raise ValueError > def inv_v1(self): > self[0:4,0:4] = htr.inv(self) > def inv_v2(self): > data = htr.inv(self) > self = HomogeneousMatrix(data) > def inv_v3(self): > self = htr.inv(self) There is something I missed: what is htr? I guess htr.inv is the inv function defined before the class. Another point: it seems weird to me that, in the class' methods inv_v2 and inv_v3, you 'unref' the previous instance of HomogeneousMatrix and link the 'self' label to a new instance... In inv_v1, you just modify the coefficient of the Homogeneous Matrix with the coefficient of htr.inv(self) -- Fabrice Silva LMA UPR CNRS 7051 - ?quipe S2M From bioinformed at gmail.com Wed Dec 3 10:32:19 2008 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 3 Dec 2008 10:32:19 -0500 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> Message-ID: <2e1434c10812030732mac096d2x331ec2e989c42d@mail.gmail.com> On Wed, Dec 3, 2008 at 9:19 AM, S?bastien Barth?lemy wrote: > def inv_v1(self): > self[0:4,0:4] = htr.inv(self) > def inv_v2(self): > data = htr.inv(self) > self = HomogeneousMatrix(data) > def inv_v3(self): > self = htr.inv(self) > self is a reference, so you're just overwriting it with references to new values in v2 and v3. The original object is unchanged. Only v1 changes self. You may want to use "self[:] = ....". -Kevin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barthelemy at crans.org Wed Dec 3 10:43:59 2008 From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9lemy?=) Date: Wed, 3 Dec 2008 16:43:59 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <2e1434c10812030732mac096d2x331ec2e989c42d@mail.gmail.com> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> <2e1434c10812030732mac096d2x331ec2e989c42d@mail.gmail.com> Message-ID: <78f7ab620812030743o707d822p679242ef25828046@mail.gmail.com> 2008/12/3 Kevin Jacobs : > On Wed, Dec 3, 2008 at 9:19 AM, S?bastien Barth?lemy > wrote: >> >> def inv_v1(self): >> self[0:4,0:4] = htr.inv(self) >> def inv_v2(self): >> data = htr.inv(self) >> self = HomogeneousMatrix(data) >> def inv_v3(self): >> self = htr.inv(self) > > self is a reference, so you're just overwriting it with references to new > values in v2 and v3. The original object is unchanged. Only v1 changes > self. You may want to use "self[:] = ....". okay, it seems obvious now. I definitely spent to much time with matlab. Thanks From barthelemy at crans.org Wed Dec 3 10:56:42 2008 From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9lemy?=) Date: Wed, 3 Dec 2008 16:56:42 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <1228317884.2947.7.camel@Portable-s2m.cnrs-mrs.fr> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> <1228317884.2947.7.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <78f7ab620812030756p7a5df60bj22d8ef72a519a6c@mail.gmail.com> 2008/12/3 Fabrice Silva : > Le mercredi 03 d?cembre 2008, S?bastien Barth?lemy a ?crit : >> Hello, > Hi Sebastien! Hello Fabrice > There is something I missed: what is htr? I guess htr.inv is the inv > function defined before the class. yes, I cut-n-pasted the function definition from the htr module and forgot to tell it, sorry Thank you From rmay31 at gmail.com Wed Dec 3 11:41:56 2008 From: rmay31 at gmail.com (Ryan May) Date: Wed, 03 Dec 2008 10:41:56 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> Message-ID: <4936B6D4.2060405@gmail.com> Pierre GM wrote: >> I think that treating an explicitly-passed-in ' ' delimiter as >> identical to 'no delimiter' is a bad idea. If I say that ' ' is the >> delimiter, or '\t' is the delimiter, this should be treated *just* >> like ',' being the delimiter, where the expected output is: >> ['1', '2', '3', '4', '', '5'] >> > > Valid point. > Well, all, stay tuned for yet another "yet another implementation..." > Found a problem. If you read the names from the file and specify usecols, you end up with the first N names read from the file as the fields in your output (where N is the number of entries in usecols), instead of having the names of the columns you asked for. For instance: >>>from StringIO import StringIO >>>from genload_proposal import loadtxt >>>f = StringIO('stid stnm relh tair\nnrmn 121 45 9.1') >>>loadtxt(f, usecols=('stid', 'relh', 'tair'), names=True, dtype=None) array(('nrmn', 45, 9.0999999999999996), dtype=[('stid', '|S4'), ('stnm', ' From pgmdevlist at gmail.com Wed Dec 3 12:08:15 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 12:08:15 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... 
In-Reply-To: <4936B6D4.2060405@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> Message-ID: On Dec 3, 2008, at 11:41 AM, Ryan May wrote: > > Found a problem. If you read the names from the file and specify > usecols, you end up with the first N names read from the file as the > fields in your output (where N is the number of entries in usecols), > instead of having the names of the columns you asked for. > > <..> > > I've attached a version that fixes this by setting a flag internally > if the names are read from the file. If this flag is true, at the > end the names are filtered down to only the ones that are given in > usecols. OK, thx. I'll take that into account and post a new version by the end of the day. > > I also have one other thought. Is there any way we can make this > handle object arrays, or rather, a field containing objects, > specifically datetime objects? Right now, this does not work > because calling view does not work for object arrays. I'm just > looking for a simple way to store date/time in my record array > (currently a string field). It does already: you can upgrade the mapper of StringConverter to support datetime object. Check an earlier post by JDH and my answer. I'll add an example in the test suite. From aisaac at american.edu Wed Dec 3 12:32:01 2008 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 03 Dec 2008 12:32:01 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> Message-ID: <4936C291.5090306@american.edu> If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off and the special handling in order to retain the old load speed? Alan Isaac From Chris.Barker at noaa.gov Wed Dec 3 12:48:16 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Dec 2008 09:48:16 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> Message-ID: <4936C660.5040906@noaa.gov> Pierre GM wrote: > I can try, but in that case, please write me a unittest, so that I > have a clear and unambiguous idea of what you expect. fair enough, though I'm not sure when I'll have time to do it. I do wonder if anyone else thinks it would be useful to have multiple delimiters as an option. I got the idea because with fromfile(), if you specify, say ',' as the delimiter, it won't use '\n', only a comma, so there is no way to quickly read a whole bunch of comma delimited data like: 1,2,3,4 5,6,7,8 .... so I'd like to be able to say to use either ',' or '\n' as the delimiter. However, if I understand loadtxt() correctly, it's handling the new lines separately anyway (to get a 2-d array), so this use case isn't an issue. So how likely is it that someone would have: 1 2 3, 4, 5 6 7 8, 8, 9 and want to read that into a single 2-d array? I'm not sure I've seen it. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Wed Dec 3 12:58:30 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 12:58:30 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C660.5040906@noaa.gov> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> <4936C660.5040906@noaa.gov> Message-ID: <6DBB8759-CEFD-41B7-8D9E-1EF7ACB5C859@gmail.com> On Dec 3, 2008, at 12:48 PM, Christopher Barker wrote: > Pierre GM wrote: >> I can try, but in that case, please write me a unittest, so that I >> have a clear and unambiguous idea of what you expect. > > fair enough, though I'm not sure when I'll have time to do it. Oh, don;t worry, nothing too fancy: give me a couple lines of input data and a line with what you expect. Using Ryan's recent example: >>>f = StringIO('stid stnm relh tair\nnrmn 121 45 9.1') >>> test = loadtxt(f, usecols=('stid', 'relh', 'tair'), names=True, dtype=None) >>> control=array(('nrmn', 45, 9.0999999999999996), dtype=[('stid', '|S4'), ('relh', ' I do wonder if anyone else thinks it would be useful to have multiple > delimiters as an option. I got the idea because with fromfile(), if > you > specify, say ',' as the delimiter, it won't use '\n', only a comma, > so > there is no way to quickly read a whole bunch of comma delimited > data like: > > 1,2,3,4 > 5,6,7,8 > .... > > so I'd like to be able to say to use either ',' or '\n' as the > delimiter. I'm not quite sure I follow you. Do you want to delimiters, one for the field of a record (','), one for the records ("\n") ? > > However, if I understand loadtxt() correctly, it's handling the new > lines separately anyway (to get a 2-d array), so this use case isn't > an > issue. So how likely is it that someone would have: > > 1 2 3, 4, 5 > 6 7 8, 8, 9 > > and want to read that into a single 2-d array? With the current behaviour, you gonna have [("1 2 3", 4, 5), ("6 7 8", 8, 9)] if you use "," as a delimiter, [(1,2,"3,","4,",5),(6,7,"8,","8,",9)] if you use " " as a delimiter. Mixing delimiter is doable, but I don't think it's that a good idea. I'm in favor of sticking to one and only field delimiter, and the default line spearator for record delimiter. In other terms, not changing anythng. From pgmdevlist at gmail.com Wed Dec 3 12:59:38 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 12:59:38 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C291.5090306@american.edu> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> Message-ID: <59E58280-9EFC-4AC9-A2F4-DC9B7B50FF17@gmail.com> On Dec 3, 2008, at 12:32 PM, Alan G Isaac wrote: > If I know my data is already clean > and is handled nicely by the > old loadtxt, will I be able to turn > off and the special handling in > order to retain the old load speed? Hopefully. I'm looking for the best way to do it. Do you have an example you could send me off-list so that I can play with timers ? Thx in advance. P. 
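(A rough way to generate a clean test file and time the current np.loadtxt
on it -- the file name and sizes are arbitrary, purely for illustration, and
this is only a quick sketch rather than a proper benchmark:)

import numpy as np
import timeit

# 20000 rows x 3 columns of clean, comma-separated floats
np.savetxt('clean.txt', np.random.random((20000, 3)), delimiter=',')

t = timeit.Timer("np.loadtxt('clean.txt', delimiter=',')",
                 "import numpy as np")
print min(t.repeat(repeat=3, number=10))
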
From Chris.Barker at noaa.gov Wed Dec 3 13:00:58 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 03 Dec 2008 10:00:58 -0800
Subject: [Numpy-discussion] np.loadtxt : yet a new implementation...
In-Reply-To: <4936C660.5040906@noaa.gov>
References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com>
	<49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov>
	<6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com>
	<4936C660.5040906@noaa.gov>
Message-ID: <4936C95A.3070706@noaa.gov>

by the way, should this work:

io.loadtxt('junk.dat', delimiter=' ')

for more than one space between numbers, like:

1  2  3  4  5
6  7  8  9  10

I get:

io.loadtxt('junk.dat', delimiter=' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/io.py", line 403, in loadtxt
    X.append(tuple([conv(val) for (conv, val) in zip(converters, vals)]))
ValueError: empty string for float()

with the current version.

>>> io.loadtxt('junk.dat', delimiter=None)
array([[  1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.]])

does work.

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Wed Dec 3 13:14:02 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 03 Dec 2008 10:14:02 -0800
Subject: [Numpy-discussion] np.loadtxt : yet a new implementation...
In-Reply-To: <6DBB8759-CEFD-41B7-8D9E-1EF7ACB5C859@gmail.com>
References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com>
	<49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov>
	<6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com>
	<4936C660.5040906@noaa.gov>
	<6DBB8759-CEFD-41B7-8D9E-1EF7ACB5C859@gmail.com>
Message-ID: <4936CC6A.4010409@noaa.gov>

Pierre GM wrote:
> Oh, don;t worry, nothing too fancy: give me a couple lines of input
> data and a line with what you expect.

I just went and looked at the existing tests, and you're right, it's
very easy -- my first foray into the new nose tests -- very nice!

>> specify, say ',' as the delimiter, it won't use '\n', only a comma, so
>> there is no way to quickly read a whole bunch of comma delimited
>> data like:
>>
>> 1,2,3,4
>> 5,6,7,8
>> ....
>>
>> so I'd like to be able to say to use either ',' or '\n' as the
>> delimiter.
>
> I'm not quite sure I follow you.
> Do you want to delimiters, one for the field of a record (','), one
> for the records ("\n") ?

well, in the case of fromfile(), it doesn't "do" records -- it will
only give you a 1-d array, so I want it all as a flat array, and you
can re-size it yourself later. Clearly this is more work (and requires
more knowledge of your data) than using loadtxt, but sometimes I really
want FAST data reading of simple formats.

However, this isn't fromfile() we are talking about now, it's loadtxt()...

>> So how likely is it that someone would have:
>>
>> 1 2 3, 4, 5
>> 6 7 8, 8, 9
>>
>> and want to read that into a single 2-d array?
>
> With the current behaviour, you gonna have
> [("1 2 3", 4, 5), ("6 7 8", 8, 9)] if you use "," as a delimiter,
> [(1,2,"3,","4,",5),(6,7,"8,","8,",9)] if you use " " as a delimiter.

right.

> Mixing delimiter is doable, but I don't think it's that a good idea.

I can't come up with a use case at this point, so..

> I'm in favor of sticking to one and only field delimiter, and the
> default line spearator for record delimiter. In other terms, not
> changing anything.
I agree -- sorry for the noise! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Dec 3 13:19:58 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Dec 2008 10:19:58 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C291.5090306@american.edu> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> Message-ID: <4936CDCE.6020305@noaa.gov> Alan G Isaac wrote: > If I know my data is already clean > and is handled nicely by the > old loadtxt, will I be able to turn > off and the special handling in > order to retain the old load speed? what I'd like to see is a version of loadtxt built on a slightly enhanced fromfile() -- that would be blazingly fast for the easy cases (simple tabular data of one dtype). I don't know if the special-casing should be automatic, or just have it be a separate function. Also, fromfile() needs some work, and it needs to be done in C, which is less fun, so who knows when it will get done. As I think about it, maybe what I really want is a simple version of loadtxt written in C: It would only handle one data type at a time. It would support simple comment lines. It would only support one delimiter (plus newline). It would create a 2-d array from normal, tabular data. You could specify: how many numbers you wanted, or how many rows, or read 'till EOF Actually, this is a lot like matlab's fscanf() someday.... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Wed Dec 3 13:52:30 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 13:52:30 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C95A.3070706@noaa.gov> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> <4936C660.5040906@noaa.gov> <4936C95A.3070706@noaa.gov> Message-ID: On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote: > by the way, should this work: > > io.loadtxt('junk.dat', delimiter=' ') > > for more than one space between numbers, like: > > 1 2 3 4 5 > 6 7 8 9 10 On the version I'm working on, both delimiter='' and delimiter=None (default) would give you the expected output. delimiter=' ' would fail, delimiter=' ' would work. From mmetz at astro.uni-bonn.de Wed Dec 3 14:08:04 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Wed, 03 Dec 2008 20:08:04 +0100 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C291.5090306@american.edu> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> Message-ID: <4936D914.4020100@astro.uni-bonn.de> Alan G Isaac wrote: > If I know my data is already clean > and is handled nicely by the > old loadtxt, will I be able to turn > off and the special handling in > order to retain the old load speed? > > Alan Isaac > Hi all, that's going in the same direction I was thinking about. 
When I thought about an improved version of loadtxt, I wished it was fault tolerant without loosing too much performance. So my solution was much simpler than the very nice genloadtxt function -- and it works for me. My ansatz is to leave the existing loadtxt function unchanged. I only replaced the default converter calls by a fault tolerant converter class. I attached a patch against io.py in numpy 1.2.1 The nice thing is that it not only handles missing values, but for example also columns/fields with non-number characters. It just returns nan in these cases. This is of practical importance for many datafiles of astronomical catalogues, for example the Hipparcos catalogue data. Regarding the performance, it is a little bit slower than the original loadtxt, but not much: on my machine, 10x reading in a clean testfile with 3 columns and 20000 rows I get the following results: original loadtxt: ~1.3s modified loadtxt: ~1.7s new genloadtxt : ~2.7s So you see, there is some loss of performance, but not as much as with the new converter class. I hope this solution is of interest ... Manuel -------------- next part -------------- A non-text attachment was scrubbed... Name: io.diff Type: text/x-patch Size: 678 bytes Desc: not available URL: From Chris.Barker at noaa.gov Wed Dec 3 14:12:49 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Dec 2008 11:12:49 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> <4936C660.5040906@noaa.gov> <4936C95A.3070706@noaa.gov> Message-ID: <4936DA31.4070608@noaa.gov> Pierre GM wrote: > On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote: >> for more than one space between numbers, like: >> >> 1 2 3 4 5 >> 6 7 8 9 10 > > > On the version I'm working on, both delimiter='' and delimiter=None > (default) would give you the expected output. so empty string and None both mean "any white space"? also tabs, etc? > delimiter=' ' would fail, s only exactly that delimiter. Is that so things like '\t' will work right? but what about: 4, 5, 34,123, .... In that case, ',' is the delimiter, but whitespace is ignored. or 4\t 5\t 34\t 123. we're ignoring extra whitespace there, too, so I'm not sure why we shouldn't ignore it in the ' ' case also. delimiter=' ' would work. but in my example, there were sometimes two spaces, sometimes three -- so I think it would fail, no? >>> "1 2 3 4 5".split(' ') ['1', '2', '3', '4', ' 5'] actually, that would work, but four spaces wouldn't. >>> "1 2 3 4 5".split(' ') ['1', '2', '3', '4', '', '5'] I guess the solution is to use delimiter=None in that case, and is does make sense that you can't have ' ' mean "one or more spaces", but "\t" mean "only one tab". -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From mmetz at astro.uni-bonn.de Wed Dec 3 14:12:16 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Wed, 03 Dec 2008 20:12:16 +0100 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... 
In-Reply-To: <4936D914.4020100@astro.uni-bonn.de> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> <4936D914.4020100@astro.uni-bonn.de> Message-ID: <4936DA10.6020202@astro.uni-bonn.de> Manuel Metz wrote: > Alan G Isaac wrote: >> If I know my data is already clean >> and is handled nicely by the >> old loadtxt, will I be able to turn >> off and the special handling in >> order to retain the old load speed? >> >> Alan Isaac >> > > Hi all, > that's going in the same direction I was thinking about. > When I thought about an improved version of loadtxt, I wished it was > fault tolerant without loosing too much performance. > So my solution was much simpler than the very nice genloadtxt function > -- and it works for me. > > My ansatz is to leave the existing loadtxt function unchanged. I only > replaced the default converter calls by a fault tolerant converter > class. I attached a patch against io.py in numpy 1.2.1 > > The nice thing is that it not only handles missing values, but for > example also columns/fields with non-number characters. It just returns > nan in these cases. This is of practical importance for many datafiles > of astronomical catalogues, for example the Hipparcos catalogue data. > > Regarding the performance, it is a little bit slower than the original > loadtxt, but not much: on my machine, 10x reading in a clean testfile > with 3 columns and 20000 rows I get the following results: > > original loadtxt: ~1.3s > modified loadtxt: ~1.7s > new genloadtxt : ~2.7s > > So you see, there is some loss of performance, but not as much as with > the new converter class. > > I hope this solution is of interest ... > > Manuel > Oops, wrong version of the diff file. Wanted to name the class "_faulttolerantconv" ... -------------- next part -------------- A non-text attachment was scrubbed... Name: io.diff Type: text/x-patch Size: 628 bytes Desc: not available URL: From pgmdevlist at gmail.com Wed Dec 3 14:21:05 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 14:21:05 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936DA10.6020202@astro.uni-bonn.de> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> <4936D914.4020100@astro.uni-bonn.de> <4936DA10.6020202@astro.uni-bonn.de> Message-ID: <62459D5C-D2D8-4325-B82F-65BF222B5F0B@gmail.com> Manuel, Looks nice, I gonna try to see how I can incorporate yours. Note that returning np.nan by default will not work w/ Python 2.6 if you want an int... From elfnor at gmail.com Wed Dec 3 19:06:28 2008 From: elfnor at gmail.com (Elfnor) Date: Wed, 3 Dec 2008 16:06:28 -0800 (PST) Subject: [Numpy-discussion] Apply a function to an array elementwise Message-ID: <20823768.post@talk.nabble.com> Hi I want to apply a function (myfunc which takes and returns a scalar) to each element in a multi-dimensioned array (data): I can do this: newdata = numpy.array([myfunc(d) for d in data.flat]).reshape(data.shape) But I'm wondering if there's a faster more numpy way. I've looked at the vectorize function but can't work it out. thanks Eleanor -- View this message in context: http://www.nabble.com/Apply-a-function-to-an-array-elementwise-tp20823768p20823768.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
From oliphant at enthought.com Wed Dec 3 21:22:01 2008 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 03 Dec 2008 20:22:01 -0600 Subject: [Numpy-discussion] Apply a function to an array elementwise In-Reply-To: <20823768.post@talk.nabble.com> References: <20823768.post@talk.nabble.com> Message-ID: <49373EC9.2000501@enthought.com> Elfnor wrote: > Hi > > I want to apply a function (myfunc which takes and returns a scalar) to each > element in a multi-dimensioned array (data): > > I can do this: > > newdata = numpy.array([myfunc(d) for d in data.flat]).reshape(data.shape) > > But I'm wondering if there's a faster more numpy way. I've looked at the > vectorize function but can't work it out. > > from numpy import vectorize new_func = vectorize(myfunc) newdata = new_func(data) Should work. -Travis From cournape at gmail.com Wed Dec 3 22:19:17 2008 From: cournape at gmail.com (David Cournapeau) Date: Thu, 4 Dec 2008 12:19:17 +0900 Subject: [Numpy-discussion] Compiler options for mingw? In-Reply-To: <49322C6C.6070400@ar.media.kyoto-u.ac.jp> References: <96BABBCF-EF9D-4AF7-8BE4-03685EB080B2@yale.edu> <5b8d13220811281302n756a3b95ka1c6e7287cb23ae0@mail.gmail.com> <3AE60785-BEC5-4AC8-A914-E63A940225A9@yale.edu> <49322C6C.6070400@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812031919y714ed1c7la1cec4eba12ee980@mail.gmail.com> On Sun, Nov 30, 2008 at 3:02 PM, David Cournapeau wrote: > > No at the moment, but you can easily decompress the .exe content to get > the internal .exe (which are straight installers built by python > setup.py setup.py bdist_wininst). It should be possible to force an > architecture at install time using a command line option, but I don't > have the time ATM to support this. I needed it to help me fixing a couple of bugs for old CPU, so it ended up being implemented in the nsis script for scipy now (I will add it to numpy installers too). So from now, any newly releases of both numpy and scipy installers could be overriden: installer-name.exe /arch native -> default behavior installer-name.exe /arch nosse -> Force installation wo sse, even if SSE-cpu is detected. It does not check that the option is valid, so you can end up requesting SSE3 installer on a SSE2 CPU. But well... David From erik.tollerud at gmail.com Thu Dec 4 03:20:56 2008 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Thu, 4 Dec 2008 00:20:56 -0800 Subject: [Numpy-discussion] Py3k and numpy Message-ID: I noticed that the Python 3000 final was released today... is there any sense of how long it will take to get numpy working under 3k? I would imagine it'll be a lot to adapt given the low-level change, but is the work already in progress? From pgmdevlist at gmail.com Thu Dec 4 06:51:53 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 06:51:53 -0500 Subject: [Numpy-discussion] genloadtxt: second serving Message-ID: <842491BB-9946-4221-8646-57638104623C@gmail.com> All, Here's the second round of genloadtxt. That's a tad cleaner version than the previous one, where I tried to take into account the different comments and suggestions that were posted. So, tabs should be supported and explicit whitespaces are not collapsed. FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit comparison: same input, no missing data, one with genloadtxt, one with np.loadtxt and a last one with matplotlib.mlab.csv2rec. As you'll see, genloadtxt is roughly twice slower than np.loadtxt, but twice faster than csv2rec. 
One of the explanation for the slowness is indeed the use of classes for splitting lines and converting values. Instead of a basic function, we use the __call__ method of the class, which itself calls another function depending on the attribute values. I'd like to reduce this overhead, any suggestion is more than welcome, as usual. Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in numpy.ma, with an alias recfromcsv for John, using his defaults. Unless somebody comes with a brilliant optimization. Let me know how it goes, Cheers, P. -------------- next part -------------- A non-text attachment was scrubbed... Name: _preview.py Type: text/x-python-script Size: 31694 bytes Desc: not available URL: -------------- next part -------------- From pgmdevlist at gmail.com Thu Dec 4 06:52:32 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 06:52:32 -0500 Subject: [Numpy-discussion] genloadtxt: second serving (tests) Message-ID: <317DDE01-99E6-4C1E-9675-5299CAC39CF3@gmail.com> And now for the tests: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_preview.py Type: text/x-python-script Size: 15545 bytes Desc: not available URL: -------------- next part -------------- From mmetz at astro.uni-bonn.de Thu Dec 4 07:22:33 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Thu, 04 Dec 2008 13:22:33 +0100 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <842491BB-9946-4221-8646-57638104623C@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> Message-ID: <4937CB89.9070405@astro.uni-bonn.de> Pierre GM wrote: > All, > Here's the second round of genloadtxt. That's a tad cleaner version than > the previous one, where I tried to take into account the different > comments and suggestions that were posted. So, tabs should be supported > and explicit whitespaces are not collapsed. > FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit > comparison: same input, no missing data, one with genloadtxt, one with > np.loadtxt and a last one with matplotlib.mlab.csv2rec. > > As you'll see, genloadtxt is roughly twice slower than np.loadtxt, but > twice faster than csv2rec. One of the explanation for the slowness is > indeed the use of classes for splitting lines and converting values. > Instead of a basic function, we use the __call__ method of the class, > which itself calls another function depending on the attribute values. > I'd like to reduce this overhead, any suggestion is more than welcome, > as usual. > > Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in > numpy.ma, with an alias recfromcsv for John, using his defaults. Unless > somebody comes with a brilliant optimization. Will loadtxt in that case remain as is? Or will the _faulttolerantconv class be used? mm From olivier.grisel at ensta.org Thu Dec 4 10:26:37 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 4 Dec 2008 16:26:37 +0100 Subject: [Numpy-discussion] Broadcasting question Message-ID: Hi list, Suppose I have array a with dimensions (d1, d3) and array b with dimensions (d2, d3). I want to compute array c with dimensions (d1, d2) holding the squared euclidian norms of vectors in a and b with size d3. My first take was to use a python level loop: >>> from numpy import * >>> c = array([sum((a_i - b) ** 2, axis=1) for a_i in a]) But this is too slow and allocate a useless temporary list of python references. 
To avoid the python level loop I then tried to use broadcasting as follows: >>> c = sum((a[:,newaxis,:] - b) ** 2, axis=2) But this build a useless and huge (d1, d2, d3) temporary array that does not fit in memory for large values of d1, d2 and d3... Do you have any better idea? I would like to simulate a runtime behavior similar to: >>> c = dot(a, b.T) but for for squared euclidian norms instead of dotproducts. I can always write a the code in C and wrap it with ctypes but I wondered whether this is possible only with numpy. -- Olivier From stefan at sun.ac.za Thu Dec 4 10:53:01 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 4 Dec 2008 17:53:01 +0200 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: References: Message-ID: <9457e7c80812040753g1640ddc7l4f15965b55bc5973@mail.gmail.com> Hi Olivier 2008/12/4 Olivier Grisel : > To avoid the python level loop I then tried to use broadcasting as follows: > >>>> c = sum((a[:,newaxis,:] - b) ** 2, axis=2) > > But this build a useless and huge (d1, d2, d3) temporary array that > does not fit in memory for large values of d1, d2 and d3... Does numpy.lib.broadcast_arrays do what you need? Regards St?fan From zachary.pincus at yale.edu Thu Dec 4 11:24:23 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 4 Dec 2008 11:24:23 -0500 Subject: [Numpy-discussion] Compiler options for mingw? In-Reply-To: <5b8d13220812031919y714ed1c7la1cec4eba12ee980@mail.gmail.com> References: <96BABBCF-EF9D-4AF7-8BE4-03685EB080B2@yale.edu> <5b8d13220811281302n756a3b95ka1c6e7287cb23ae0@mail.gmail.com> <3AE60785-BEC5-4AC8-A914-E63A940225A9@yale.edu> <49322C6C.6070400@ar.media.kyoto-u.ac.jp> <5b8d13220812031919y714ed1c7la1cec4eba12ee980@mail.gmail.com> Message-ID: <530E6E8F-5375-4320-A692-6F49DA7D9B6A@yale.edu> > I needed it to help me fixing a couple of bugs for old CPU, so it > ended up being implemented in the nsis script for scipy now (I will > add it to numpy installers too). So from now, any newly releases of > both numpy and scipy installers could be overriden: > > installer-name.exe /arch native -> default behavior > installer-name.exe /arch nosse -> Force installation wo sse, even if > SSE-cpu is detected. > > It does not check that the option is valid, so you can end up > requesting SSE3 installer on a SSE2 CPU. But well... Cool! Thanks! This will be really useful... Zach From charlesr.harris at gmail.com Thu Dec 4 11:39:24 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Dec 2008 09:39:24 -0700 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud wrote: > I noticed that the Python 3000 final was released today... is there > any sense of how long it will take to get numpy working under 3k? I > would imagine it'll be a lot to adapt given the low-level change, but > is the work already in progress? I read that announcement too. My feeling is that we can only support one branch at a time, i.e., the python 2.x or python 3.x series. So the easiest path to 3.x looked to be waiting until python 2.6 was widely distributed, making it the required version, doing the needed updates to numpy, and then using the automatic conversion to python 3.x. I expect f2py, nose, and other tools will also need fixups. 
Guido suggests an approach like this for those needing to support both series and I really don't see an alternative unless someone wants to fork numpy ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Thu Dec 4 11:53:36 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 4 Dec 2008 17:53:36 +0100 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: <9457e7c80812040753g1640ddc7l4f15965b55bc5973@mail.gmail.com> References: <9457e7c80812040753g1640ddc7l4f15965b55bc5973@mail.gmail.com> Message-ID: 2008/12/4 St?fan van der Walt : > Hi Olivier > > 2008/12/4 Olivier Grisel : >> To avoid the python level loop I then tried to use broadcasting as follows: >> >>>>> c = sum((a[:,newaxis,:] - b) ** 2, axis=2) >> >> But this build a useless and huge (d1, d2, d3) temporary array that >> does not fit in memory for large values of d1, d2 and d3... > > Does numpy.lib.broadcast_arrays do what you need? That looks exactly what I am looking for. Apparently this is new in 1.2 since I cannot find it in the 1.1 version of my system. Thanks, -- Olivier From charlesr.harris at gmail.com Thu Dec 4 11:55:26 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Dec 2008 09:55:26 -0700 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: References: Message-ID: On Thu, Dec 4, 2008 at 8:26 AM, Olivier Grisel wrote: > Hi list, > > Suppose I have array a with dimensions (d1, d3) and array b with > dimensions (d2, d3). I want to compute array c with dimensions (d1, > d2) holding the squared euclidian norms of vectors in a and b with > size d3. > Just to clarify the problem a bit, it looks like you want to compute the squared euclidean distance between every vector in a and every vector in b, i.e., a distance matrix. Is that correct? Also, how big are d1,d2,d3? If you *are* looking to compute the distance matrix I suspect your end goal is something beyond that. Could you describe what you are trying to do? I could be that scipy.spatial or scipy.cluster are what you should look at. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Thu Dec 4 12:58:52 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 12:58:52 -0500 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <4937CB89.9070405@astro.uni-bonn.de> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> <4937CB89.9070405@astro.uni-bonn.de> Message-ID: <272FAE9E-DC29-42A5-B44C-8EB16E05DF89@gmail.com> On Dec 4, 2008, at 7:22 AM, Manuel Metz wrote: > > Will loadtxt in that case remain as is? Or will the _faulttolerantconv > class be used? No idea, we need to discuss it. There's a problem with _faulttolerantconv: using np.nan as default value will not work in Python2.6 if the output is to be int, as an exception will be raised. Therefore, we'd need to change the default to something else when defining _faulttolerantconv. The easiest would be to define a class and set the argument at instantiation, but then we're going back dangerously close to StringConverter... 
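Returning to the broadcasting thread above: the (d1, d2) matrix of squared euclidean distances can be computed without ever materialising a (d1, d2, d3) temporary, by using the identity ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i.b_j. The sketch below only illustrates that idea (the function name is invented here, and the final clipping line guards against round-off); newer scipy also ships ready-made distance routines in scipy.spatial, as mentioned above.

import numpy as np

def squared_distances(a, b):
    # a has shape (d1, d3), b has shape (d2, d3); the result has shape (d1, d2).
    # Only (d1, d2)-sized temporaries are created along the way.
    aa = (a * a).sum(axis=1)[:, np.newaxis]    # ||a_i||^2 as a (d1, 1) column
    bb = (b * b).sum(axis=1)[np.newaxis, :]    # ||b_j||^2 as a (1, d2) row
    c = aa + bb - 2.0 * np.dot(a, b.T)         # cross terms from a single matrix product
    c[c < 0] = 0.0                             # round-off can leave tiny negative values
    return c

Any per-pair function can then be applied elementwise to the (d1, d2) result, for instance np.exp(-c / (2.0 * sigma)) for a gaussian/RBF style kernel.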
From olivier.grisel at ensta.org Thu Dec 4 12:59:19 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 4 Dec 2008 18:59:19 +0100 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: References: Message-ID: 2008/12/4 Charles R Harris : > > > On Thu, Dec 4, 2008 at 8:26 AM, Olivier Grisel > wrote: >> >> Hi list, >> >> Suppose I have array a with dimensions (d1, d3) and array b with >> dimensions (d2, d3). I want to compute array c with dimensions (d1, >> d2) holding the squared euclidian norms of vectors in a and b with >> size d3. > > Just to clarify the problem a bit, it looks like you want to compute the > squared euclidean distance between every vector in a and every vector in b, > i.e., a distance matrix. Is that correct? Also, how big are d1,d2,d3? I would target d1 >> d2 ~ d3 with d1 as large as possible to fit in memory and d2 and d3 in the order of a couple hundreds or thousands for a start. > If you *are* looking to compute the distance matrix I suspect your end goal > is something beyond that. Could you describe what you are trying to do? My end goal it to compute the activation of an array of Radial Basis Function units where the activation of unit with center b_j for data vector a_i is given by: f(a_i, b_j) = exp(-||a_i - bj|| ** 2 / (2 * sigma)) The end goal is to have building blocks of various parameterized array of homogeneous units (linear, sigmoid and RBF) along with their gradient in parameter space so as too build various machine learning algorithms such as multi layer perceptrons with various training strategies such as Stochastic Gradient Descent. That code might be integrated into the Modular Data Processing (MPD toolkit) project [1] at some point. The current stat of the python code is here: http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/scalar.py You can find an SSE optimized C implementation wrapped with ctypes here: http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/sse.py http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/sse.c > It could be that scipy.spatial or scipy.cluster are what you should look at. I'll have a look at those, thanks for the pointer. [1] http://mdp-toolkit.sourceforge.net/ -- Olivier From charlesr.harris at gmail.com Thu Dec 4 13:57:23 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Dec 2008 11:57:23 -0700 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: On Thu, Dec 4, 2008 at 9:39 AM, Charles R Harris wrote: > > > On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud wrote: > >> I noticed that the Python 3000 final was released today... is there >> any sense of how long it will take to get numpy working under 3k? I >> would imagine it'll be a lot to adapt given the low-level change, but >> is the work already in progress? > > > I read that announcement too. My feeling is that we can only support one > branch at a time, i.e., the python 2.x or python 3.x series. So the easiest > path to 3.x looked to be waiting until python 2.6 was widely distributed, > making it the required version, doing the needed updates to numpy, and then > using the automatic conversion to python 3.x. I expect f2py, nose, and other > tools will also need fixups. 
Guido suggests an approach like this for those > needing to support both series and I really don't see an alternative unless > someone wants to fork numpy ;) > Looks like python 2.6 just went into Fedora rawhide, so it should be in the May Fedora 11 release. I expect Ubuntu and other leading edge Linux distros to have it about the same time. This probably means numpy needs to be running on python 2.6 by early Spring. Dropping support for earlier versions of python might be something to look at for next Fall. So I'm guessing about a year will be the earliest we might have Python 3.0 support. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Dec 4 14:03:01 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 13:03:01 -0600 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> On Thu, Dec 4, 2008 at 12:57, Charles R Harris wrote: > > > On Thu, Dec 4, 2008 at 9:39 AM, Charles R Harris > wrote: >> >> >> On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud >> wrote: >>> >>> I noticed that the Python 3000 final was released today... is there >>> any sense of how long it will take to get numpy working under 3k? I >>> would imagine it'll be a lot to adapt given the low-level change, but >>> is the work already in progress? >> >> I read that announcement too. My feeling is that we can only support one >> branch at a time, i.e., the python 2.x or python 3.x series. So the easiest >> path to 3.x looked to be waiting until python 2.6 was widely distributed, >> making it the required version, doing the needed updates to numpy, and then >> using the automatic conversion to python 3.x. I expect f2py, nose, and other >> tools will also need fixups. Guido suggests an approach like this for those >> needing to support both series and I really don't see an alternative unless >> someone wants to fork numpy ;) > > Looks like python 2.6 just went into Fedora rawhide, so it should be in the > May Fedora 11 release. I expect Ubuntu and other leading edge Linux distros > to have it about the same time. This probably means numpy needs to be > running on python 2.6 by early Spring. It does. What problems are people seeing? Is it just the Windows build that causes people to say "numpy doesn't work with Python 2.6"? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at gmail.com Thu Dec 4 14:14:52 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 14:14:52 -0500 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> Message-ID: <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> On Nov 25, 2008, at 12:23 PM, Pierre GM wrote: > All, > Sorry to bump my own post, and I was kinda threadjacking anyway: > > Some functions of numy.ma (eg, ma.max, ma.min...) accept explicit > outputs that may not be MaskedArrays. > When such an explicit output is not a MaskedArray, a value that > should have been masked is transformed into np.nan. > > That worked great in 2.5, with np.nan automatically transformed to 0 > when the explicit output had a int dtype. With Python 2.6, a > ValueError is raised instead, as np.nan can no longer be casted to > int. 
> > What should be the recommended behavior in this case ? Raise a > ValueError or some other exception, to follow the new Python2.6 > convention, or silently replace np.nan by some value acceptable by > int dtype (0, or something else) ? Second bump, sorry. Any consensus on what the behavior should be ? Raise a ValueError (even in 2.5, therefore risking to break something) or just go with the flow and switch np.nan to an acceptable value (like 0), under the hood ? I'd like to close the corresponding ticket... From millman at berkeley.edu Thu Dec 4 14:40:56 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 4 Dec 2008 11:40:56 -0800 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> Message-ID: On Thu, Dec 4, 2008 at 11:14 AM, Pierre GM wrote: > Raise a ValueError (even in 2.5, therefore risking to break something) +1 -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From tgrav at mac.com Thu Dec 4 15:15:56 2008 From: tgrav at mac.com (Tommy Grav) Date: Thu, 04 Dec 2008 15:15:56 -0500 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: <63B78B1C-C6A4-4575-A242-463A932FCBE9@me.com> References: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> <63B78B1C-C6A4-4575-A242-463A932FCBE9@me.com> Message-ID: On Dec 4, 2008, at 2:03 PM, Robert Kern wrote: > It does. What problems are people seeing? Is it just the Windows build > that causes people to say "numpy doesn't work with Python 2.6"? There is currently no official Mac OSX binary for numpy for python 2.6, but you can build it from source. Is there any time table for generating a 2.6 Mac OS X binary? Cheers Tommy From josef.pktd at gmail.com Thu Dec 4 15:24:21 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 15:24:21 -0500 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> Message-ID: <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> On Thu, Dec 4, 2008 at 2:40 PM, Jarrod Millman wrote: > On Thu, Dec 4, 2008 at 11:14 AM, Pierre GM wrote: >> Raise a ValueError (even in 2.5, therefore risking to break something) > > +1 > +1 I'm not yet a serious user of numpy/scipy, but when debugging the discrete distributions, it took me a while to figure out that some mysteriously appearing zeros were nans that were silently converted during casting to int. In matlab, I encode different types of missing values (in the data) by numbers that I know are not in my dataset, e.g -2**20, -2**21,... but that depends on the dataset. (hand made nan handling, before data is cleaned). When I see then a "weird" number, I know that there is a problem, if it the nan is zero, I wouldn't know if it's a missing value or really a zero. Josef From millman at berkeley.edu Thu Dec 4 15:29:55 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 4 Dec 2008 12:29:55 -0800 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> <63B78B1C-C6A4-4575-A242-463A932FCBE9@me.com> Message-ID: On Thu, Dec 4, 2008 at 12:15 PM, Tommy Grav wrote: > On Dec 4, 2008, at 2:03 PM, Robert Kern wrote: >> It does. What problems are people seeing? 
Is it just the Windows build >> that causes people to say "numpy doesn't work with Python 2.6"? > > There is currently no official Mac OSX binary for numpy for python 2.6, > but you can build it from source. Is there any time table for generating > a 2.6 Mac OS X binary? My intention was to make 2.6 Mac binaries for the NumPy 1.3 release. We haven't finalized a timetable for the 1.3 release yet, but the current plan was to try and get the release out near the end of December. Once SciPy 0.7 is out, I will turn my attention to the next NumPy release. -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From pgmdevlist at gmail.com Thu Dec 4 15:27:15 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 15:27:15 -0500 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> Message-ID: On Dec 4, 2008, at 3:24 PM, josef.pktd at gmail.com wrote: > On Thu, Dec 4, 2008 at 2:40 PM, Jarrod Millman > wrote: >> On Thu, Dec 4, 2008 at 11:14 AM, Pierre GM >> wrote: >>> Raise a ValueError (even in 2.5, therefore risking to break >>> something) >> >> +1 >> > > +1 OK then, I'll do that and update the SVN later tonight or early tmw... From rmay31 at gmail.com Thu Dec 4 15:54:28 2008 From: rmay31 at gmail.com (Ryan May) Date: Thu, 04 Dec 2008 14:54:28 -0600 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <842491BB-9946-4221-8646-57638104623C@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> Message-ID: <49384384.3000105@gmail.com> Pierre GM wrote: > All, > Here's the second round of genloadtxt. That's a tad cleaner version than > the previous one, where I tried to take into account the different > comments and suggestions that were posted. So, tabs should be supported > and explicit whitespaces are not collapsed. Looks pretty good, but there's one breakage against what I had working with my local copy (with mods). When adding the filtering of names read from the file using usecols, there's a reason I set a flag and fixed it later: converters specified by name. If we have usecols and converters specified by name, and we read the names from a file, we have the following sequence: 1) Read names 2) Convert usecols names to column numbers. 3) Filter name list using usecols. Indices of names list no longer map to column numbers. 4) Change converters from mapping names->funcs to mapping col#->func using indices from names....OOPS. It's an admittedly complex combination, but it allows flexibly reading text files since you're only basing on field names, no column numbers. Here's a test case: def test_autonames_usecols_and_converter(self): "Tests names and usecols" data = StringIO.StringIO('A B C D\n aaaa 121 45 9.1') test = loadtxt(data, usecols=('A', 'C', 'D'), names=True, dtype=None, converters={'C':lambda s: 2 * int(s)}) control = np.array(('aaaa', 90, 9.1), dtype=[('A', '|S4'), ('C', int), ('D', float)]) assert_equal(test, control) This fails with your current implementation, but works for me when: 1) Set a flag when reading names from header line in file 2) Filter names from file using usecols (if the flag is true) *after* remapping the converters. There may be a better approach, but this is the simplest I've come up with so far. 
> FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit > comparison: same input, no missing data, one with genloadtxt, one with > np.loadtxt and a last one with matplotlib.mlab.csv2rec. > > As you'll see, genloadtxt is roughly twice slower than np.loadtxt, but > twice faster than csv2rec. One of the explanation for the slowness is > indeed the use of classes for splitting lines and converting values. > Instead of a basic function, we use the __call__ method of the class, > which itself calls another function depending on the attribute values. > I'd like to reduce this overhead, any suggestion is more than welcome, > as usual. > > Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in > numpy.ma, with an alias recfromcsv for John, using his defaults. Unless > somebody comes with a brilliant optimization. Why only in numpy.ma and not somewhere in core numpy itself (missing values aside)? You have a pretty good masked array agnostic wrapper that IMO could go in numpy, though maybe not as loadtxt. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From Chris.Barker at noaa.gov Thu Dec 4 16:17:50 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 04 Dec 2008 13:17:50 -0800 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> Message-ID: <493848FE.4030101@noaa.gov> josef.pktd at gmail.com wrote: >>> Raise a ValueError (even in 2.5, therefore risking to break something) +1 as well > it took me a while to figure out that some > mysteriously appearing zeros were nans that were silently converted > during casting to int. and this is why -- a zero is a perfectly valid and useful number, NaN should never get cast to a zero (or any other valid number) unless the user explicitly asks it to be. I think the right choice was made for python 2.6 here. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From timmichelsen at gmx-topmail.de Thu Dec 4 17:19:04 2008 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Thu, 04 Dec 2008 23:19:04 +0100 Subject: [Numpy-discussion] Apply a function to an array elementwise In-Reply-To: <49373EC9.2000501@enthought.com> References: <20823768.post@talk.nabble.com> <49373EC9.2000501@enthought.com> Message-ID: >> I want to apply a function (myfunc which takes and returns a scalar) to each >> element in a multi-dimensioned array (data): >> >> I can do this: >> >> newdata = numpy.array([myfunc(d) for d in data.flat]).reshape(data.shape) >> >> But I'm wondering if there's a faster more numpy way. I've looked at the >> vectorize function but can't work it out. >> >> > > from numpy import vectorize > > new_func = vectorize(myfunc) > newdata = new_func(data) This seems be some sort of FAQ. Maybe the term vectorize is not known to all (newbie) users. At least finding its application in the docs doesn't seem easy. 
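For readers landing on this FAQ, a minimal usage sketch of the np.vectorize pattern Travis posted above. Here myfunc is a stand-in for any function that takes and returns a scalar, and the otypes argument (added in this sketch) pins the output dtype so it does not have to be guessed from the first element. Keep in mind that vectorize is essentially a convenience wrapper around a Python-level loop, so it cleans up the code more than it speeds it up.

import numpy as np

def myfunc(x):
    # stand-in scalar function; any scalar-in, scalar-out function works the same way
    if x > 0:
        return x ** 2 + 1.0
    return 0.0

vfunc = np.vectorize(myfunc, otypes=[np.float64])

data = np.linspace(-1.0, 1.0, 12).reshape(3, 4)
newdata = vfunc(data)     # same shape as data, myfunc applied to every element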
Here a more threads: * optimising single value functions for array calculations - http://article.gmane.org/gmane.comp.python.numeric.general/26543 * vectorized function inside a class - http://article.gmane.org/gmane.comp.python.numeric.general/16438 Most newcomers learn at some point to develop functions for single values (scalars) but to connect this with computation of full array and be efficient is another step. Some short note has been written on the cookbook: http://www.scipy.org/Cookbook/Autovectorize Regards, Timmie From kwmsmith at gmail.com Thu Dec 4 17:46:14 2008 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 4 Dec 2008 16:46:14 -0600 Subject: [Numpy-discussion] PyArray_EMPTY and Cython In-Reply-To: <20081203035707.GA28913@encolpuis> References: <20081203035707.GA28913@encolpuis> Message-ID: On Tue, Dec 2, 2008 at 9:57 PM, Gabriel Gellner wrote: > After some discussion on the Cython lists I thought I would try my hand at > writing some Cython accelerators for empty and zeros. This will involve > using > PyArray_EMPTY, I have a simple prototype I would like to get working, but > currently it segfaults. Any tips on what I might be missing? I took a look at this, but I'm admittedly a cython newbie, but will be using code like this in the future. Have you had any luck? Kurt > > > import numpy as np > cimport numpy as np > > cdef extern from "numpy/arrayobject.h": > PyArray_EMPTY(int ndims, np.npy_intp* dims, int type, bint fortran) > > cdef np.ndarray empty(np.npy_intp length): > cdef np.ndarray[np.double_t, ndim=1] ret > cdef int type = np.NPY_DOUBLE > cdef int ndims = 1 > > cdef np.npy_intp* dims > dims = &length > > print dims[0] > print type > > ret = PyArray_EMPTY(ndims, dims, type, False) > > return ret > > def test(): > cdef np.ndarray[np.double_t, ndim=1] y = empty(10) > > return y > > > The code seems to print out the correct dims and type info but segfaults > when > the PyArray_EMPTY call is made. > > Thanks, > > Gabriel > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brennan.williams at visualreservoir.com Thu Dec 4 18:17:54 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 12:17:54 +1300 Subject: [Numpy-discussion] checksum on numpy float array Message-ID: <49386522.70401@visualreservoir.com> My app reads in one or more float arrays from a binary file. Sometimes due to network timeouts etc the array is not read correctly. What would be the best way of checking the validity of the data? Would some sort of checksum approach be a good idea? Would that work with an array of floating point values? Or are checksums more for int,byte,string type data? From robert.kern at gmail.com Thu Dec 4 18:36:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 17:36:20 -0600 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49386522.70401@visualreservoir.com> References: <49386522.70401@visualreservoir.com> Message-ID: <3d375d730812041536s21627ae5xde71ff7d943c9740@mail.gmail.com> On Thu, Dec 4, 2008 at 17:17, Brennan Williams wrote: > My app reads in one or more float arrays from a binary file. > > Sometimes due to network timeouts etc the array is not read correctly. > > What would be the best way of checking the validity of the data? 
> > Would some sort of checksum approach be a good idea? > Would that work with an array of floating point values? > Or are checksums more for int,byte,string type data? Just use a generic hash on the file's bytes (ignoring their format). MD5 is sufficient for these purposes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Thu Dec 4 18:38:50 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 18:38:50 -0500 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49386522.70401@visualreservoir.com> References: <49386522.70401@visualreservoir.com> Message-ID: <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams wrote: > My app reads in one or more float arrays from a binary file. > > Sometimes due to network timeouts etc the array is not read correctly. > > What would be the best way of checking the validity of the data? > > Would some sort of checksum approach be a good idea? > Would that work with an array of floating point values? > Or are checksums more for int,byte,string type data? > If you want to verify the file itself, then python provides several more or less secure checksums, my experience was that zlib.crc32 was pretty fast on moderate file sizes. crc32 is common inside archive files and for binary newsgroups. If you have large files transported over the network, e.g. GB size, I would work with par2 repair files, which verifies and repairs at the same time. Josef From millman at berkeley.edu Thu Dec 4 18:41:40 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 4 Dec 2008 15:41:40 -0800 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <49384384.3000105@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> <49384384.3000105@gmail.com> Message-ID: I am not familiar with this, but it looks quite useful: http://www.stecf.org/software/PYTHONtools/astroasciidata/ or (http://www.scipy.org/AstroAsciiData) "Within the AstroAsciiData project we envision a module which can be used to work on all kinds of ASCII tables. The module provides a convenient tool such that the user easily can: * read in ASCII tables; * manipulate table elements; * save the modified ASCII table; * read and write meta data such as column names and units; * combine several tables; * delete/add rows and columns; * manage metadata in the table headers." Is anyone familiar with this package? Would make sense to investigate including this or adopting some of its interface/features? -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From brennan.williams at visualreservoir.com Thu Dec 4 18:43:58 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 12:43:58 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> Message-ID: <49386B3E.2020207@visualreservoir.com> josef.pktd at gmail.com wrote: > On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams > wrote: > >> My app reads in one or more float arrays from a binary file. 
>> >> Sometimes due to network timeouts etc the array is not read correctly. >> >> What would be the best way of checking the validity of the data? >> >> Would some sort of checksum approach be a good idea? >> Would that work with an array of floating point values? >> Or are checksums more for int,byte,string type data? >> >> > > If you want to verify the file itself, then python provides several > more or less secure checksums, my experience was that zlib.crc32 was > pretty fast on moderate file sizes. crc32 is common inside archive > files and for binary newsgroups. If you have large files transported > over the network, e.g. GB size, I would work with par2 repair files, > which verifies and repairs at the same time. > > The file has multiple arrays stored in it. So I want to have some sort of validity check on just the array that I'm reading. I will need to add a check on the file as well as of course network problems could affect writing to the file as well as reading from the file. > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From cournape at gmail.com Thu Dec 4 18:45:42 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 5 Dec 2008 08:45:42 +0900 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> References: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> Message-ID: <5b8d13220812041545u50886566n70861af77a2e4dca@mail.gmail.com> On Fri, Dec 5, 2008 at 4:03 AM, Robert Kern wrote: > On Thu, Dec 4, 2008 at 12:57, Charles R Harris > wrote: >> >> >> On Thu, Dec 4, 2008 at 9:39 AM, Charles R Harris >> wrote: >>> >>> >>> On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud >>> wrote: >>>> >>>> I noticed that the Python 3000 final was released today... is there >>>> any sense of how long it will take to get numpy working under 3k? I >>>> would imagine it'll be a lot to adapt given the low-level change, but >>>> is the work already in progress? >>> >>> I read that announcement too. My feeling is that we can only support one >>> branch at a time, i.e., the python 2.x or python 3.x series. So the easiest >>> path to 3.x looked to be waiting until python 2.6 was widely distributed, >>> making it the required version, doing the needed updates to numpy, and then >>> using the automatic conversion to python 3.x. I expect f2py, nose, and other >>> tools will also need fixups. Guido suggests an approach like this for those >>> needing to support both series and I really don't see an alternative unless >>> someone wants to fork numpy ;) >> >> Looks like python 2.6 just went into Fedora rawhide, so it should be in the >> May Fedora 11 release. I expect Ubuntu and other leading edge Linux distros >> to have it about the same time. This probably means numpy needs to be >> running on python 2.6 by early Spring. > > It does. What problems are people seeing? Is it just the Windows build > that causes people to say "numpy doesn't work with Python 2.6"? Up to recently, numpy had some failures with python.org python 2.6 in x86 - but those are fixed now. The windows issues are mostly sorted out (and the missing information for reliable build has been integrated in python 2.6.1 I believe http://bugs.python.org/issue4365). F2py does not work, though - which is the main issue to make scipy work on 2.6, as far as I can see. 
David From robert.kern at gmail.com Thu Dec 4 18:52:42 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 17:52:42 -0600 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49386B3E.2020207@visualreservoir.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> Message-ID: <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> On Thu, Dec 4, 2008 at 17:43, Brennan Williams wrote: > josef.pktd at gmail.com wrote: >> On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams >> wrote: >> >>> My app reads in one or more float arrays from a binary file. >>> >>> Sometimes due to network timeouts etc the array is not read correctly. >>> >>> What would be the best way of checking the validity of the data? >>> >>> Would some sort of checksum approach be a good idea? >>> Would that work with an array of floating point values? >>> Or are checksums more for int,byte,string type data? >>> >>> >> >> If you want to verify the file itself, then python provides several >> more or less secure checksums, my experience was that zlib.crc32 was >> pretty fast on moderate file sizes. crc32 is common inside archive >> files and for binary newsgroups. If you have large files transported >> over the network, e.g. GB size, I would work with par2 repair files, >> which verifies and repairs at the same time. >> >> > The file has multiple arrays stored in it. > > So I want to have some sort of validity check on just the array that I'm > reading. So do it on the bytes of the individual arrays. Just don't bother implementing new type-specific checksums. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From josef.pktd at gmail.com Thu Dec 4 18:57:24 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 18:57:24 -0500 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> Message-ID: <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> I didn't check what this does behind the scenes, but try this m = hashlib.md5() m.update(np.array(range(100))) m.update(np.array(range(200))) m2 = hashlib.md5() m2.update(np.array(range(100))) m2.update(np.array(range(200))) print m.hexdigest() print m2.hexdigest() assert m.hexdigest() == m2.hexdigest() m3 = hashlib.md5() m3.update(np.array(range(100))) m3.update(np.array(range(199))) print m3.hexdigest() assert m.hexdigest() == m3.hexdigest() Josef From josef.pktd at gmail.com Thu Dec 4 18:59:26 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 18:59:26 -0500 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> Message-ID: <1cd32cbb0812041559w478b621enfca53c7da36e4914@mail.gmail.com> On Thu, Dec 4, 2008 at 6:57 PM, wrote: > I didn't check what this does behind the scenes, but try this > I forgot to paste: import hashlib #standard python library Josef From brennan.williams at visualreservoir.com Thu Dec 4 19:54:44 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 13:54:44 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> Message-ID: <49387BD4.3050501@visualreservoir.com> Thanks josef.pktd at gmail.com wrote: > I didn't check what this does behind the scenes, but try this > > import hashlib #standard python library import numpy as np > m = hashlib.md5() > m.update(np.array(range(100))) > m.update(np.array(range(200))) > > m2 = hashlib.md5() > m2.update(np.array(range(100))) > m2.update(np.array(range(200))) > > print m.hexdigest() > print m2.hexdigest() > > assert m.hexdigest() == m2.hexdigest() > > m3 = hashlib.md5() > m3.update(np.array(range(100))) > m3.update(np.array(range(199))) > > print m3.hexdigest() > > assert m.hexdigest() == m3.hexdigest() > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From robert.kern at gmail.com Thu Dec 4 20:11:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 19:11:20 -0600 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49387BD4.3050501@visualreservoir.com> References: <49386522.70401@visualreservoir.com> 
<1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> <49387BD4.3050501@visualreservoir.com> Message-ID: <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> On Thu, Dec 4, 2008 at 18:54, Brennan Williams wrote: > Thanks > > josef.pktd at gmail.com wrote: >> I didn't check what this does behind the scenes, but try this >> >> > import hashlib #standard python library > import numpy as np >> m = hashlib.md5() >> m.update(np.array(range(100))) >> m.update(np.array(range(200))) I would recommend doing this on the strings before you make arrays from them. You don't know if the network cut out in the middle of an 8-byte double. Of course, sending the lengths and other metadata first, then the data would let you check without needing to do expensivish hashes or checksums. If truncation is your problem rather than corruption, then that would be sufficient. You may also consider using the NPY format in numpy 1.2 to implement that. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From brennan.williams at visualreservoir.com Thu Dec 4 21:29:08 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 15:29:08 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> <49387BD4.3050501@visualreservoir.com> <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> Message-ID: <493891F4.4080001@visualreservoir.com> Robert Kern wrote: > On Thu, Dec 4, 2008 at 18:54, Brennan Williams > wrote: > >> Thanks >> >> josef.pktd at gmail.com wrote: >> >>> I didn't check what this does behind the scenes, but try this >>> >>> >>> >> import hashlib #standard python library >> import numpy as np >> >>> m = hashlib.md5() >>> m.update(np.array(range(100))) >>> m.update(np.array(range(200))) >>> > > I would recommend doing this on the strings before you make arrays > from them. You don't know if the network cut out in the middle of an > 8-byte double. > > Of course, sending the lengths and other metadata first, then the data > would let you check without needing to do expensivish hashes or > checksums. If truncation is your problem rather than corruption, then > that would be sufficient. You may also consider using the NPY format > in numpy 1.2 to implement that. > > Thanks for the ideas. I'm definitely going to add some more basic checks on lengths etc as well. Unfortunately the problem is happening at a client site so (a) I can't reproduce it and (b) most of the time they can't reproduce it either. This is a Windows Python app running on Citrix reading/writing data to a Linux networked drive. 
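A minimal sketch of the kind of wrapper Robert and Josef are suggesting in this thread: write the length and an md5 digest of the raw bytes ahead of each array, and verify both before reconstructing the array on the other end. The helper names and the little header layout are invented for the example, not an existing numpy API; zlib.crc32 would do as a cheaper checksum, and np.save/np.load (the NPY format mentioned above) already take care of the shape and dtype bookkeeping.

import hashlib
import struct
import numpy as np

def write_checked(fh, arr):
    # hypothetical helper: 8-byte length + 16-byte md5 digest, then the raw data
    raw = np.asarray(arr, dtype=np.float64).tostring()
    fh.write(struct.pack('<q', len(raw)))
    fh.write(hashlib.md5(raw).digest())
    fh.write(raw)

def read_checked(fh):
    (nbytes,) = struct.unpack('<q', fh.read(8))
    digest = fh.read(16)
    raw = fh.read(nbytes)
    if len(raw) != nbytes or hashlib.md5(raw).digest() != digest:
        raise IOError("array data truncated or corrupted")
    return np.fromstring(raw, dtype=np.float64)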
Brennan From dalcinl at gmail.com Thu Dec 4 21:31:37 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 4 Dec 2008 23:31:37 -0300 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: >From my experience working on my own projects and Cython: * the C code making Python C-API calls could be made to version-agnostic by using preprocessor macros, and even some compatibility header conditionally included. Perhaps the later would be the easiest for C-API calls (we have a lot already distilled in Cython sources). Preprocessor conditionals would still be needed when filling structs. * Regarding Python code, I believe the only sane way to go is to make the 2to3 tool to convert all the 2.x to 3.x code right. * The all-new buffer interface as implemented in core Py3.0 needs carefull review and fixes. * The now-all-strings-are-unicode is going to make some headaches ;-) * No idea how to deal with the now-all-integers-are-python-longs. On Thu, Dec 4, 2008 at 5:20 AM, Erik Tollerud wrote: > I noticed that the Python 3000 final was released today... is there > any sense of how long it will take to get numpy working under 3k? I > would imagine it'll be a lot to adapt given the low-level change, but > is the work already in progress? > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From mmetz at astro.uni-bonn.de Fri Dec 5 06:25:33 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Fri, 05 Dec 2008 12:25:33 +0100 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <272FAE9E-DC29-42A5-B44C-8EB16E05DF89@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> <4937CB89.9070405@astro.uni-bonn.de> <272FAE9E-DC29-42A5-B44C-8EB16E05DF89@gmail.com> Message-ID: <49390FAD.1020405@astro.uni-bonn.de> Pierre GM wrote: > On Dec 4, 2008, at 7:22 AM, Manuel Metz wrote: >> Will loadtxt in that case remain as is? Or will the _faulttolerantconv >> class be used? > > No idea, we need to discuss it. There's a problem with > _faulttolerantconv: using np.nan as default value will not work in > Python2.6 if the output is to be int, as an exception will be raised. Okay, that's something I did not check. If numpy.nan is converted to 0, it's basically useless -- 0 might be a valid number in the data and can not be distinguished from nan in that case. Here masked arrays is the only sensible approach. So the faulttolerantconv (ftc) class is applicable to floats and complex numbers only. It might nevertheless be useful to use the ftc class since (i) it results in almost no performance loss and (ii) at the same time you get at least a minimum fault tolerance, which can be very useful for many applications. I personally will switch to AstroAsciiData (thanks Jarrod for pointing this out), because that seems to be exactly what I need! Manuel > Therefore, we'd need to change the default to something else when > defining _faulttolerantconv. The easiest would be to define a class > and set the argument at instantiation, but then we're going back > dangerously close to StringConverter... 
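For anyone following the _faulttolerantconv discussion, here is a minimal standalone converter along those lines, usable with the converters= argument that loadtxt already accepts. As noted above it only makes sense for float (or complex) columns, since np.nan has no sensible int representation; the function name, the file name and the column index below are made up for the example.

import numpy as np

def tolerant_float(s, default=np.nan):
    # return `default` instead of raising when a field cannot be parsed
    try:
        return float(s)
    except ValueError:
        return default

# column 2 may contain junk such as 'N/A'; the other columns are parsed as usual
data = np.loadtxt('measurements.txt', converters={2: tolerant_float})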
From elcorto at gmx.net Fri Dec 5 10:10:41 2008 From: elcorto at gmx.net (Steve Schmerler) Date: Fri, 5 Dec 2008 16:10:41 +0100 Subject: [Numpy-discussion] subclassing ndarray Message-ID: <20081205151041.GA13217@ramrod.starsheriffs.de> Hi all I'm subclassing ndarray following [1] and I'd like to know if i'm doing it right. My goals are - ndarray subclass MyArray with additional methods - replacement for np.array, np.asarray on module level returning MyArray instances - expose new methods as functions on module level import numpy as np class MyArray(np.ndarray): def __new__(cls, arr, **kwargs): return np.asarray(arr, **kwargs).view(dtype=arr.dtype, type=cls) # define new methods here ... def print_shape(self): print self.shape # replace np.array() def array(*args, **kwargs): return MyArray(np.array(*args, **kwargs)) # replace np.asarray() def asarray(*args, **kwargs): return MyArray(*args, **kwargs) # expose array method as function def ps(a): asarray(a).print_shape() Would that work? PS: I found a little error in [1]: In section "__new__ and __init__", the class def should read class C(object): def __new__(cls, *args): + print 'cls is:", cls print 'Args in __new__:', args return object.__new__(cls, *args) def __init__(self, *args): + print 'self is:", self print 'Args in __init__:', args [1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html best, steve From faltet at pytables.org Fri Dec 5 12:42:00 2008 From: faltet at pytables.org (Francesc Alted) Date: Fri, 5 Dec 2008 18:42:00 +0100 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <493891F4.4080001@visualreservoir.com> References: <49386522.70401@visualreservoir.com> <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> <493891F4.4080001@visualreservoir.com> Message-ID: <200812051842.00471.faltet@pytables.org> A Friday 05 December 2008, Brennan Williams escrigu?: > Robert Kern wrote: > > On Thu, Dec 4, 2008 at 18:54, Brennan Williams > > > > wrote: > >> Thanks > >> > >> josef.pktd at gmail.com wrote: > >>> I didn't check what this does behind the scenes, but try this > >> > >> import hashlib #standard python library > >> import numpy as np > >> > >>> m = hashlib.md5() > >>> m.update(np.array(range(100))) > >>> m.update(np.array(range(200))) > > > > I would recommend doing this on the strings before you make arrays > > from them. You don't know if the network cut out in the middle of > > an 8-byte double. > > > > Of course, sending the lengths and other metadata first, then the > > data would let you check without needing to do expensivish hashes > > or checksums. If truncation is your problem rather than corruption, > > then that would be sufficient. You may also consider using the NPY > > format in numpy 1.2 to implement that. > > Thanks for the ideas. I'm definitely going to add some more basic > checks on lengths etc as well. > Unfortunately the problem is happening at a client site so (a) I > can't reproduce it and (b) most of the > time they can't reproduce it either. This is a Windows Python app > running on Citrix reading/writing data > to a Linux networked drive. Another possibility would be to use HDF5 as a data container. It supports the fletcher32 filter [1] which basically computes a chuksum for evey data chunk written to disk and then always check that the data read satifies the checksum kept on-disk. So, if the HDF5 layer doesn't complain, you are basically safe. There are at least two usable HDF5 interfaces for Python and NumPy: PyTables[2] and h5py [3]. 
PyTables does have support for that right out-of-the-box. Not sure about h5py though (a quick search in docs doesn't reveal nothing). [1] http://rfc.sunsite.dk/rfc/rfc1071.html [2] http://www.pytables.org [3] http://h5py.alfven.org Hope it helps, -- Francesc Alted From h5py at alfven.org Fri Dec 5 15:28:43 2008 From: h5py at alfven.org (Andrew Collette) Date: Fri, 05 Dec 2008 12:28:43 -0800 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <200812051842.00471.faltet@pytables.org> References: <49386522.70401@visualreservoir.com> <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> <493891F4.4080001@visualreservoir.com> <200812051842.00471.faltet@pytables.org> Message-ID: <1228508923.7424.11.camel@tachyon-laptop> > Another possibility would be to use HDF5 as a data container. It > supports the fletcher32 filter [1] which basically computes a chuksum > for evey data chunk written to disk and then always check that the data > read satifies the checksum kept on-disk. So, if the HDF5 layer doesn't > complain, you are basically safe. > > There are at least two usable HDF5 interfaces for Python and NumPy: > PyTables[2] and h5py [3]. PyTables does have support for that right > out-of-the-box. Not sure about h5py though (a quick search in docs > doesn't reveal nothing). > > [1] http://rfc.sunsite.dk/rfc/rfc1071.html > [2] http://www.pytables.org > [3] http://h5py.alfven.org > > Hope it helps, > Just to confirm that h5py does in fact have fletcher32; it's one of the options you can specify when creating a dataset, although it could use better documentation: http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset Like other checksums, fletcher32 provides error-detection but not error-correction. You'll still need to throw away data which can't be read. However, I believe that you can still read sections of the dataset which aren't corrupted. Andrew Collette From pgmdevlist at gmail.com Fri Dec 5 18:59:25 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 5 Dec 2008 18:59:25 -0500 Subject: [Numpy-discussion] genloadtxt : last call Message-ID: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> All, Here's the latest version of genloadtxt, with some recent corrections. With just a couple of tweaking, we end up with some decent speed: it's still slower than np.loadtxt, but only 15% so according to the test at the end of the package. And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ? Thx for any comment and suggestions. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: _preview.py Type: text/x-python-script Size: 32751 bytes Desc: not available URL: -------------- next part -------------- From dsdale24 at gmail.com Sat Dec 6 00:17:58 2008 From: dsdale24 at gmail.com (Darren Dale) Date: Sat, 6 Dec 2008 00:17:58 -0500 Subject: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing In-Reply-To: References: <492D8FD3.8050601@enthought.com> <492DC9B0.1030300@gmail.com> <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> <20081201081220.GC18450@phare.normalesup.org> Message-ID: On Mon, Dec 1, 2008 at 10:30 AM, Darren Dale wrote: > > > On Mon, Dec 1, 2008 at 3:12 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote: >> > On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale wrote: >> > > I tried installing 4.0.300x on a machine running 64-bit windows vista >> home >> > > edition and ran into problems with PyQt and some related packages. So >> I >> > > uninstalled all the python-related software, EPD took over 30 minutes >> to >> > > uninstall, and tried to install EPD 4.1 beta. >> >> > My guess is that EPD is only 32 bits installer, so that you run it on >> > WOW (Windows in Windows) on windows 64, which is kind of slow (but >> > usable for most tasks). >> >> On top of that, Vista is not supported with EPD. I had a chat with the >> EPD guys about that, and they say it does work with Vista... most of the >> time. They don't really understand the failures, and haven't had time to >> investigate much, because so far professionals and labs are simply >> avoiding Vista. Hopefully someone from the EPD team will give a more >> accurate answer >> soon. > > > Thanks Gael and David. I would avoid windows altogether if I could. When I > bought a new laptop I had the option to pay extra to downgrade to XP pro, I > should have done some more research before I settled for Vista. In the > meantime I'll borrow an XP machine when I need to build python package > installers for windows. > > Hopefully a solution can be found at some point for python and Vista. > Losing compatibility on such a major platform will become increasingly > problematic. > I just wanted to follow up, it looks like the Vista installation issues have been ironed out with the release of python-2.6.1. I was able to install 32-bit python-2.6.1 from the msi file distributed at python.org in a straight-forward manner, no need to mess around with user account controls or other such nonsense. I even have setuptools working with python 2.6, I built and installed a setuptools msi without much trouble (distutils just doesnt like setuptools version numbering). One pleasant surprise: python-2.6 is built with visual C++ 2008, which has a free express edition available so building python extension modules might be a little more convenient than it was in the past. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Sat Dec 6 05:41:16 2008 From: faltet at pytables.org (Francesc Alted) Date: Sat, 6 Dec 2008 11:41:16 +0100 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1228508923.7424.11.camel@tachyon-laptop> References: <49386522.70401@visualreservoir.com> <200812051842.00471.faltet@pytables.org> <1228508923.7424.11.camel@tachyon-laptop> Message-ID: <200812061141.16983.faltet@pytables.org> A Friday 05 December 2008, Andrew Collette escrigu?: > > Another possibility would be to use HDF5 as a data container. 
It > > supports the fletcher32 filter [1] which basically computes a > > chuksum for evey data chunk written to disk and then always check > > that the data read satifies the checksum kept on-disk. So, if the > > HDF5 layer doesn't complain, you are basically safe. > > > > There are at least two usable HDF5 interfaces for Python and NumPy: > > PyTables[2] and h5py [3]. PyTables does have support for that > > right out-of-the-box. Not sure about h5py though (a quick search > > in docs doesn't reveal nothing). > > > > [1] http://rfc.sunsite.dk/rfc/rfc1071.html > > [2] http://www.pytables.org > > [3] http://h5py.alfven.org > > > > Hope it helps, > > Just to confirm that h5py does in fact have fletcher32; it's one of > the options you can specify when creating a dataset, although it > could use better documentation: > > http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create >_dataset My bad. I've searched for 'fletcher' instead of 'fletcher32'. I naively thought that the search tool in Sphinx allowed for partial name finding. In fact, it is a pity it does not. Cheers, -- Francesc Alted From gael.varoquaux at normalesup.org Sat Dec 6 06:35:16 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 6 Dec 2008 12:35:16 +0100 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <20081206113516.GC12839@phare.normalesup.org> On Fri, Dec 05, 2008 at 06:59:25PM -0500, Pierre GM wrote: > Here's the latest version of genloadtxt, with some recent corrections. With > just a couple of tweaking, we end up with some decent speed: it's still > slower than np.loadtxt, but only 15% so according to the test at the end of > the package. 15% slow-down is acceptable, IMHO. There is fromfile for the fast and well understood usecase. Thanks for doing all this work. Ga?l From brennan.williams at visualreservoir.com Sat Dec 6 20:15:25 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Sun, 07 Dec 2008 14:15:25 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <200812061141.16983.faltet@pytables.org> References: <49386522.70401@visualreservoir.com> <200812051842.00471.faltet@pytables.org> <1228508923.7424.11.camel@tachyon-laptop> <200812061141.16983.faltet@pytables.org> Message-ID: <493B23AD.6000300@visualreservoir.com> OK so maybe I should.... (1) not add some sort of checksum type functionality to my read/write methods these read/write methods simply read/write numpy arrays to a binary file which contains one or more numpy arrays (and nothing else). (2) replace my binary files iwith either HDF5 or PyTables But.... my app is being used by clients on existing projects - in one case there are over 900 of these numpy binary files in just one project, albeit each file is pretty small (200KB or so) so.. questions..... How can I tranparently (or at least with minimum user-pain) replace my existing read/write methods with PyTables or HDF5? My initial thoughts are... (a) have an app version number and a data format version number which i can check against. (b) if data format version < 1.0 then read from old binary files (c) if app version number > 1.0 then write to new PyTables or HDF5 files (d) get clients to open existing project and then save existing project to semi-transparently convert from old to new formats. 
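To make the fletcher32 suggestion concrete, here is a minimal PyTables sketch (h5py exposes the same HDF5 filter through a fletcher32 flag to create_dataset, as Andrew notes above). The file and node names are invented for the example. The checksum is stored per chunk on write and re-checked on every read, so corruption shows up as an error rather than as silently wrong numbers; something like the data-format version number from points (a)-(d) above would still be needed to tell old raw-binary project files from new HDF5 ones.

import numpy as np
import tables

arr = np.random.random((1000, 50))

# write: fletcher32 stores a checksum with every chunk of the array
h5 = tables.openFile('project_arrays.h5', mode='w')
filters = tables.Filters(fletcher32=True)
ca = h5.createCArray(h5.root, 'pressure', tables.Float64Atom(),
                     shape=arr.shape, filters=filters)
ca[:] = arr
h5.close()

# read back: the checksums are verified, so a damaged chunk raises instead of returning garbage
h5 = tables.openFile('project_arrays.h5', mode='r')
arr2 = h5.root.pressure[:]
h5.close()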
Francesc Alted wrote: > A Friday 05 December 2008, Andrew Collette escrigu?: > >>> Another possibility would be to use HDF5 as a data container. It >>> supports the fletcher32 filter [1] which basically computes a >>> chuksum for evey data chunk written to disk and then always check >>> that the data read satifies the checksum kept on-disk. So, if the >>> HDF5 layer doesn't complain, you are basically safe. >>> >>> There are at least two usable HDF5 interfaces for Python and NumPy: >>> PyTables[2] and h5py [3]. PyTables does have support for that >>> right out-of-the-box. Not sure about h5py though (a quick search >>> in docs doesn't reveal nothing). >>> >>> [1] http://rfc.sunsite.dk/rfc/rfc1071.html >>> [2] http://www.pytables.org >>> [3] http://h5py.alfven.org >>> >>> Hope it helps, >>> >> Just to confirm that h5py does in fact have fletcher32; it's one of >> the options you can specify when creating a dataset, although it >> could use better documentation: >> >> http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create >> _dataset >> > > My bad. I've searched for 'fletcher' instead of 'fletcher32'. I > naively thought that the search tool in Sphinx allowed for partial name > finding. In fact, it is a pity it does not. > > Cheers, > > From pgmdevlist at gmail.com Sun Dec 7 15:02:53 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 7 Dec 2008 15:02:53 -0500 Subject: [Numpy-discussion] Python2.4 support Message-ID: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> All, * What versions of Python should be supported by what version of numpy ? Are we to expect users to rely on Python2.5 for the upcoming 1.3.x ? Could we have some kind of timeline on the trac site or elsewhere (and if such a timeline exists already, can I get the link?) ? * Talking about 1.3.x, what's the timeline? Are we still shooting for a release in 2008 or could we wait till mid Jan. 2009 ? Thx a lot in advance From millman at berkeley.edu Sun Dec 7 16:21:53 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 7 Dec 2008 13:21:53 -0800 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> Message-ID: On Sun, Dec 7, 2008 at 12:02 PM, Pierre GM wrote: > * What versions of Python should be supported by what version of > numpy ? Are we to expect users to rely on Python2.5 for the upcoming > 1.3.x ? Could we have some kind of timeline on the trac site or > elsewhere (and if such a timeline exists already, can I get the link?) ? NumPy 1.3.x should work with Python 2.4, 2.5, and 2.6. At some point we can drop 2.4, but I would like to wait a bit since we just dropped 2.3 support. The timeline is on the trac site: http://projects.scipy.org/scipy/numpy/milestone/1.3.0 > * Talking about 1.3.x, what's the timeline? Are we still shooting for > a release in 2008 or could we wait till mid Jan. 2009 ? I am fine with pushing the release back, if there is interest in doing that. I have been mainly focusing on getting SciPy 0.7.x out, so I haven't been following the NumPy development closely. But it is good that you are asking for more concrete details about the next NumPy release. We need to start making plans. Does anyone have any suggestions about whether we should push the release back? Is 1 month long enough? What is left to do? 
Please feel free to update the release notes, which are checked into the trunk: http://scipy.org/scipy/numpy/browser/trunk/doc/release/1.3.0-notes.rst Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From pgmdevlist at gmail.com Sun Dec 7 16:34:31 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 7 Dec 2008 16:34:31 -0500 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> Message-ID: <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> On Dec 7, 2008, at 4:21 PM, Jarrod Millman wrote: > NumPy 1.3.x should work with Python 2.4, 2.5, and 2.6. At some point > we can drop 2.4, but I would like to wait a bit since we just dropped > 2.3 support. The timeline is on the trac site: > http://projects.scipy.org/scipy/numpy/milestone/1.3.0 OK, great, thanks a lot. >> * Talking about 1.3.x, what's the timeline? Are we still shooting for >> a release in 2008 or could we wait till mid Jan. 2009 ? > > I am fine with pushing the release back, if there is interest in doing > that. I have been mainly focusing on getting SciPy 0.7.x out, so I > haven't been following the NumPy development closely. But it is good > that you are asking for more concrete details about the next NumPy > release. We need to start making plans. Does anyone have any > suggestions about whether we should push the release back? Is 1 month > long enough? What is left to do? Well, on my side, there's some doc to be updated, of course. Then, I'd like to put the rec_functions that were developed in matplotlib to manipulate recordarrays. I haven't started yet, might be able to do so before the end of the year (not much to do, just a clean up and some examples). And what should we do with the genloadtxt function ? > > Please feel free to update the release notes, which are checked into > the trunk: > http://scipy.org/scipy/numpy/browser/trunk/doc/release/1.3.0- > notes.rst > Will do in good time. Thx again From david at ar.media.kyoto-u.ac.jp Mon Dec 8 00:42:53 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 08 Dec 2008 14:42:53 +0900 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> Message-ID: <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Jarrod Millman wrote: > On Sun, Dec 7, 2008 at 12:02 PM, Pierre GM wrote: > >> * What versions of Python should be supported by what version of >> numpy ? Are we to expect users to rely on Python2.5 for the upcoming >> 1.3.x ? Could we have some kind of timeline on the trac site or >> elsewhere (and if such a timeline exists already, can I get the link?) ? >> > > NumPy 1.3.x should work with Python 2.4, 2.5, and 2.6. At some point > we can drop 2.4, but I would like to wait a bit since we just dropped > 2.3 support. The timeline is on the trac site: > http://projects.scipy.org/scipy/numpy/milestone/1.3.0 > I am strongly against dropping 2.4 support anytime soon. I haven't seen a strong rationale for using >= 2.5 features in numpy, supporting 2.4 is not so hard, and 2.4 is still the default python version on many OS (mac os X 10.4 I believe, RHEL for sure, open solaris). 
David From millman at berkeley.edu Mon Dec 8 01:49:54 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 7 Dec 2008 22:49:54 -0800 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, Dec 7, 2008 at 9:42 PM, David Cournapeau wrote: > I am strongly against dropping 2.4 support anytime soon. I haven't seen > a strong rationale for using >= 2.5 features in numpy, supporting 2.4 is > not so hard, and 2.4 is still the default python version on many OS (mac > os X 10.4 I believe, RHEL for sure, open solaris). While my feelings aren't as strong as David's, they are pretty much identical. As a point of reference, Red Hat Enterprise Linux 6 won't come out until at least the first quarter of 2010. Until then we should make a serious effort to support Python 2.4, which ships with RHEL 5. It looks like RHEL 6 will be based on the upcoming Fedora 11 release, which will ship with Python 2.6. That gives us a minimum of one year for 2.4 support. Once RHEL 6 is released, it will take several months before a sizable number of users upgrade. Moin has a detailed list of Python versions for various OSes and hosting services: http://moinmo.in/PollAboutRequiringPython24 -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From james at fmnmedia.co.uk Mon Dec 8 06:35:33 2008 From: james at fmnmedia.co.uk (James) Date: Mon, 08 Dec 2008 11:35:33 +0000 Subject: [Numpy-discussion] Line of best fit! Message-ID: <493D0685.5050903@fmnmedia.co.uk> Hi, I am trying to plot a line of best fit for some data i have, is there a simple way of doing it? Cheers From scott.sinclair.za at gmail.com Mon Dec 8 07:47:05 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 8 Dec 2008 14:47:05 +0200 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493D0685.5050903@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> Message-ID: <6a17e9ee0812080447n7dc9495jc51fccda1ed8828a@mail.gmail.com> > 2008/12/8 James : > I am trying to plot a line of best fit for some data i have, is there a > simple way of doing it? Hi James, Take a look at: http://www.scipy.org/Cookbook/FittingData http://www.scipy.org/Cookbook/LinearRegression and the section on least square fitting towards the end of this page in the Scipy docs: http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html Post again if if these references don't get you going. Cheers, Scott From ezindy at gmail.com Mon Dec 8 07:55:02 2008 From: ezindy at gmail.com (Egor Zindy) Date: Mon, 08 Dec 2008 12:55:02 +0000 Subject: [Numpy-discussion] ANN: numpy.i - added managed deallocation to ARGOUTVIEW_ARRAY1 (ARGOUTVIEWM_ARRAY1) In-Reply-To: <492538B9.10202@gmail.com> References: <491F8F4A.30009@gmail.com> <49231156.1060209@gmail.com> <49231EB3.8060802@noaa.gov> <492538B9.10202@gmail.com> Message-ID: <493D1926.4010506@gmail.com> Hello list, just a quick follow-up on the managed deallocation. This is what I've done this week-end: In numpy.i, I have redefined the import_array() function to also take care of the managed memory initialisation (the _MyDeallocType.tp_new = PyType_GenericNew; statement). This means that in %init(), the only call is to import_array(). Basically, the same as with the "normal" numpy.i. 
Only difference in a swig file (.i) between "unmanaged" and "managed" memory allocation is the use of either the ARGOUTVIEW_ARRAY or ARGOUTVIEWM_ARRAY fragments. Everything else is hidden. In numpy.i, this is what's now happening (my previous attempts were a bit clumsy): %#undef import_array %#define import_array() {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); return; }; _MyDeallocType.tp_new = PyType_GenericNew; if (PyType_Ready(&_MyDeallocType) < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "Custom memory management failed to initialize (numpy.i)"); return; } } %#undef import_array1 %#define import_array1(ret) {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); return ret; }; _MyDeallocType.tp_new = PyType_GenericNew; if (PyType_Ready(&_MyDeallocType) < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "Custom memory management failed to initialize (numpy.i)"); return ret; } } %#undef import_array2 %#define import_array2(msg, ret) {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, msg); return ret; }; _MyDeallocType.tp_new = PyType_GenericNew; if (PyType_Ready(&_MyDeallocType) < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, msg); return ret; } } My wiki (sorry, haven't moved it to the scipy cookbook yet) has all the details (the modified numpy.i, explanations, and some test code): http://code.google.com/p/ezwidgets/wiki/NumpyManagedMemory Regards, Egor From ramercer at gmail.com Mon Dec 8 09:33:21 2008 From: ramercer at gmail.com (Adam Mercer) Date: Mon, 8 Dec 2008 08:33:21 -0600 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: <799406d60812080633h55e16d4fl2a084721b96314bc@mail.gmail.com> On Sun, Dec 7, 2008 at 23:42, David Cournapeau wrote: > I am strongly against dropping 2.4 support anytime soon. I haven't seen > a strong rationale for using >= 2.5 features in numpy, supporting 2.4 is > not so hard, and 2.4 is still the default python version on many OS (mac > os X 10.4 I believe, RHEL for sure, open solaris). Mac OS X 10.4 uses python-2.3, 10.5 uses python-2.5. Cheers Adam From matthieu.brucher at gmail.com Mon Dec 8 09:40:03 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 8 Dec 2008 15:40:03 +0100 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: > While my feelings aren't as strong as David's, they are pretty much identical. > > As a point of reference, Red Hat Enterprise Linux 6 won't come out > until at least the first quarter of 2010. Until then we should make a > serious effort to support Python 2.4, which ships with RHEL 5. It > looks like RHEL 6 will be based on the upcoming Fedora 11 release, > which will ship with Python 2.6. That gives us a minimum of one year > for 2.4 support. Once RHEL 6 is released, it will take several months > before a sizable number of users upgrade. > > Moin has a detailed list of Python versions for various OSes and > hosting services: > http://moinmo.in/PollAboutRequiringPython24 At least several months, if not years. RedHat supports each version 7 years, for instance (I don't ask for that long). 
Currently, I'm still using a RHEL 4, although it is planned to migrate to RHEL 5 next year. So we should still support 2.4 for at least 18 months, in case some big firms use RHEL and Python+Numpy for their tools. -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From gwg at emss.co.za Mon Dec 8 10:50:14 2008 From: gwg at emss.co.za (George Goussard) Date: Mon, 8 Dec 2008 17:50:14 +0200 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) Message-ID: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> Hello. I have been battling with the following error for the past week. The output from the terminal is: Traceback (most recent call last): File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\backends\backend_qt4agg.py", line 86, in paintEvent FigureCanvasAgg.draw(self) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\backends\backend_agg.py", line 261, in draw self.figure.draw(self.renderer) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\figure.py", line 765, in draw legend.draw(renderer) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\legend.py", line 197, in draw self._update_positions(renderer) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\legend.py", line 513, in _update_positions l,b,w,h = get_tbounds(self.texts[-1]) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\legend.py", line 499, in get_tbounds bboxa = bbox.inverse_transformed(self.get_transform()) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\transforms.py", line 478, in inverse_transformed return Bbox(transform.inverted().transform(self.get_points())) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\transforms.py", line 1338, in inverted self._inverted = Affine2D(inv(mtx)) File "C:\development\Python\2_5_2\Lib\site-packages\numpy\linalg\linalg.py", line 350, in inv return wrap(solve(a, identity(a.shape[0], dtype=a.dtype))) File "C:\development\Python\2_5_2\Lib\site-packages\numpy\linalg\linalg.py", line 249, in solve raise LinAlgError, 'Singular matrix' numpy.linalg.linalg.LinAlgError: Singular matrix Initially MPL plots a graph but when you try to interact with the widget(for example resize) then the output is displayed and the MPL figure is not updated. Everything works with Windows 32-bit. Linux 32-bit and 64-bit are working correctly. Any ideas would be helpful. Thanks. George. -------------- next part -------------- An HTML attachment was scrubbed... URL: From f.yw at hotmail.com Mon Dec 8 12:27:01 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 8 Dec 2008 10:27:01 -0700 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: Hi, I want to create a matrix based on a vector. It is difficult to describe the issue for me in english. Here is an example. Suppose I have an array([3, 6, 8, 12]), I want to create a range based on each element. In this exampe, let us say want to create 4 number with step 2, so I will have [3, 6, 8, 12 5, 8, 10,14 7, 10,12,16 9, 12,14,18] It is a 4 by 4 maxtric in this example. My original array is quite large. 
but the range I want to create around the number is not big, it is about 30. Does anyone know how to do this efficiently? Thanks Frank _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Dec 8 12:30:31 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 11:30:31 -0600 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> On Mon, Dec 8, 2008 at 11:27, frank wang wrote: > Hi, > > I want to create a matrix based on a vector. It is difficult to describe the > issue for me in english. Here is an example. > > Suppose I have an array([3, 6, 8, 12]), I want to create a range based on > each element. In this exampe, let us say want to create 4 number with step > 2, so I will have > > [3, 6, 8, 12 > 5, 8, 10,14 > 7, 10,12,16 > 9, 12,14,18] > > It is a 4 by 4 maxtric in this example. My original array is quite large. > but the range I want to create around the number is not big, it is about 30. > > Does anyone know how to do this efficiently? In [1]: from numpy import * In [2]: a = array([3, 6, 8, 12]) In [4]: b = arange(0, 4*2, 2)[:,newaxis] In [5]: a+b Out[5]: array([[ 3, 6, 8, 12], [ 5, 8, 10, 14], [ 7, 10, 12, 16], [ 9, 12, 14, 18]]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Mon Dec 8 12:43:25 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 9 Dec 2008 02:43:25 +0900 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) In-Reply-To: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> Message-ID: <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> On Tue, Dec 9, 2008 at 12:50 AM, George Goussard wrote: > Hello. > > > > I have been battling with the following error for the past week. The output > from the terminal is: > What does numpy.test() says ? Did you use an external blas/lapack when you built numpy for AMD64 David From faltet at pytables.org Mon Dec 8 13:01:36 2008 From: faltet at pytables.org (Francesc Alted) Date: Mon, 8 Dec 2008 19:01:36 +0100 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <493B23AD.6000300@visualreservoir.com> References: <49386522.70401@visualreservoir.com> <200812061141.16983.faltet@pytables.org> <493B23AD.6000300@visualreservoir.com> Message-ID: <200812081901.37137.faltet@pytables.org> A Sunday 07 December 2008, Brennan Williams escrigu?: > OK so maybe I should.... > > (1) not add some sort of checksum type functionality to my read/write > methods > > these read/write methods simply read/write numpy arrays to a > binary file which contains one or more numpy arrays (and nothing > else). > > (2) replace my binary files iwith either HDF5 or PyTables > > But.... 
> > my app is being used by clients on existing projects - in one case > there are over 900 of these numpy binary files in just one project, > albeit each file is pretty small (200KB or so) > > so.. questions..... > > How can I tranparently (or at least with minimum user-pain) replace > my existing read/write methods with PyTables or HDF5? > > My initial thoughts are... > > (a) have an app version number and a data format version number which > i can check against. > > (b) if data format version < 1.0 then read from old binary files > > (c) if app version number > 1.0 then write to new PyTables or HDF5 > files > > (d) get clients to open existing project and then save existing > project to semi-transparently convert from old to new formats. Yeah. That would work perfectly. Also, there is a function in PyTables named 'isHDF5File(filename)' that allow you to know whether a file is in HDF5 format or not. You might want to use it and avoid to bother with data format/app version issues. Cheers, Francesc > > Francesc Alted wrote: > > A Friday 05 December 2008, Andrew Collette escrigu?: > >>> Another possibility would be to use HDF5 as a data container. It > >>> supports the fletcher32 filter [1] which basically computes a > >>> chuksum for evey data chunk written to disk and then always check > >>> that the data read satifies the checksum kept on-disk. So, if > >>> the HDF5 layer doesn't complain, you are basically safe. > >>> > >>> There are at least two usable HDF5 interfaces for Python and > >>> NumPy: PyTables[2] and h5py [3]. PyTables does have support for > >>> that right out-of-the-box. Not sure about h5py though (a quick > >>> search in docs doesn't reveal nothing). > >>> > >>> [1] http://rfc.sunsite.dk/rfc/rfc1071.html > >>> [2] http://www.pytables.org > >>> [3] http://h5py.alfven.org > >>> > >>> Hope it helps, > >> > >> Just to confirm that h5py does in fact have fletcher32; it's one > >> of the options you can specify when creating a dataset, although > >> it could use better documentation: > >> > >> http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.cre > >>ate _dataset > > > > My bad. I've searched for 'fletcher' instead of 'fletcher32'. I > > naively thought that the search tool in Sphinx allowed for partial > > name finding. In fact, it is a pity it does not. > > > > Cheers, > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted From f.yw at hotmail.com Mon Dec 8 13:40:01 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 8 Dec 2008 11:40:01 -0700 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> Message-ID: I got a lof of help from the experts in this forum. I resitsted to send a thank you reply for fearing spaming the forum. This time I really want to let the people know that I am really appreciate the great help I got. Please let me know if a simple thank you message is not appropriate in this forum. Numpy makes Pyhton a great tools for processing signal. Thank you very much. 
Frank > Date: Mon, 8 Dec 2008 11:30:31 -0600> From: robert.kern at gmail.com> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] how to create a matrix based on a vector?> > On Mon, Dec 8, 2008 at 11:27, frank wang wrote:> > Hi,> >> > I want to create a matrix based on a vector. It is difficult to describe the> > issue for me in english. Here is an example.> >> > Suppose I have an array([3, 6, 8, 12]), I want to create a range based on> > each element. In this exampe, let us say want to create 4 number with step> > 2, so I will have> >> > [3, 6, 8, 12> > 5, 8, 10,14> > 7, 10,12,16> > 9, 12,14,18]> >> > It is a 4 by 4 maxtric in this example. My original array is quite large.> > but the range I want to create around the number is not big, it is about 30.> >> > Does anyone know how to do this efficiently?> > In [1]: from numpy import *> > In [2]: a = array([3, 6, 8, 12])> > In [4]: b = arange(0, 4*2, 2)[:,newaxis]> > In [5]: a+b> Out[5]:> array([[ 3, 6, 8, 12],> [ 5, 8, 10, 14],> [ 7, 10, 12, 16],> [ 9, 12, 14, 18]])> > -- > Robert Kern> > "I have come to believe that the whole world is an enigma, a harmless> enigma that is made terrible by our own mad attempt to interpret it as> though it had an underlying truth."> -- Umberto Eco> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Dec 8 14:37:24 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 13:37:24 -0600 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> Message-ID: <3d375d730812081137k1a6b4ca2g1b37a05001101cf5@mail.gmail.com> On Mon, Dec 8, 2008 at 12:40, frank wang wrote: > I got a lof of help from the experts in this forum. I resitsted to send a > thank you reply for fearing spaming the forum. This time I really want to > let the people know that I am really appreciate the great help I got. > > Please let me know if a simple thank you message is not appropriate in this > forum. Thanks, public or otherwise, are always appreciated. You're quite welcome. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From james at fmnmedia.co.uk Mon Dec 8 15:32:48 2008 From: james at fmnmedia.co.uk (James) Date: Mon, 08 Dec 2008 20:32:48 +0000 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493D0685.5050903@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> Message-ID: <493D8470.8000303@fmnmedia.co.uk> I have a very simple plot, and the lines join point to point, however i would like to add a line of best fit now onto the chart, i am really new to python etc, and didnt really understand those links! Can anyone help me :) Cheers! James wrote: > Hi, > > I am trying to plot a line of best fit for some data i have, is there a > simple way of doing it? 
> > Cheers > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From lou_boog2000 at yahoo.com Mon Dec 8 15:54:18 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 8 Dec 2008 12:54:18 -0800 (PST) Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <493D8470.8000303@fmnmedia.co.uk> Message-ID: <966547.42601.qm@web34402.mail.mud.yahoo.com> In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. E.g. to write some data (mixed types) to a file I can do this (fp is an open file), thedata=[3.0,-4.9+2.0j,'another string'] repvars= repr(thedata)+"\n" fp.write(repvars) Then to read it back and restore the data each to its original type, strvars= fp.readline() sonofdata= eval(strvars) which gives back the original data list. BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. Thanks for any advice. (yes, I am googling, too) -- Lou Pecora, my views are my own. From matthieu.brucher at gmail.com Mon Dec 8 15:56:40 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 8 Dec 2008 21:56:40 +0100 Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <966547.42601.qm@web34402.mail.mud.yahoo.com> References: <493D8470.8000303@fmnmedia.co.uk> <966547.42601.qm@web34402.mail.mud.yahoo.com> Message-ID: Hi, The repr - eval pair does not work with numpy. You can simply do a tofile() from file(). Matthieu 2008/12/8 Lou Pecora : > In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. > > E.g. to write some data (mixed types) to a file I can do this (fp is an open file), > > thedata=[3.0,-4.9+2.0j,'another string'] > repvars= repr(thedata)+"\n" > fp.write(repvars) > > Then to read it back and restore the data each to its original type, > > strvars= fp.readline() > sonofdata= eval(strvars) > > which gives back the original data list. > > BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. > > Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. > > Thanks for any advice. (yes, I am googling, too) > > > -- Lou Pecora, my views are my own. > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From robert.kern at gmail.com Mon Dec 8 16:15:41 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 15:15:41 -0600 Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <966547.42601.qm@web34402.mail.mud.yahoo.com> References: <493D8470.8000303@fmnmedia.co.uk> <966547.42601.qm@web34402.mail.mud.yahoo.com> Message-ID: <3d375d730812081315g26e14706s8ca2faa94cf75f58@mail.gmail.com> On Mon, Dec 8, 2008 at 14:54, Lou Pecora wrote: > In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. > > E.g. to write some data (mixed types) to a file I can do this (fp is an open file), > > thedata=[3.0,-4.9+2.0j,'another string'] > repvars= repr(thedata)+"\n" > fp.write(repvars) > > Then to read it back and restore the data each to its original type, > > strvars= fp.readline() > sonofdata= eval(strvars) > > which gives back the original data list. > > BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. I don't see any extra end-of-lines. Are you sure you aren't talking about the "..." when you are saving large arrays? You will need to use set_printoptions() to disable that (threshold=sys.maxint). You should also adjust use precision=18, suppress=False. That should mostly work, but it's never a certain thing. > Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. > > Thanks for any advice. (yes, I am googling, too) The most bulletproof way would be to use numpy.save() and numpy.load(), but this is a binary format, not a text one. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lou_boog2000 at yahoo.com Mon Dec 8 16:24:27 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 8 Dec 2008 13:24:27 -0800 (PST) Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: Message-ID: <775651.80835.qm@web34401.mail.mud.yahoo.com> --- On Mon, 12/8/08, Matthieu Brucher wrote: > From: Matthieu Brucher > Subject: Re: [Numpy-discussion] What to use to read and write numpy arrays to a file? > To: "Discussion of Numerical Python" > Date: Monday, December 8, 2008, 3:56 PM > Hi, > > The repr - eval pair does not work with numpy. You can > simply do a > tofile() from file(). > > Matthieu Yes, I found the tofile/fromfile pair, but they don't preserve the shape. Sorry, I should have been clearer on that in my request. I will be saving arrays whose shape I may not know later when I read them in. I'd like that information to be preserved. Thanks. -- Lou Pecora, my views are my own. From lou_boog2000 at yahoo.com Mon Dec 8 16:26:20 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 8 Dec 2008 13:26:20 -0800 (PST) Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? 
In-Reply-To: <3d375d730812081315g26e14706s8ca2faa94cf75f58@mail.gmail.com> Message-ID: <37053.3395.qm@web34406.mail.mud.yahoo.com> --- On Mon, 12/8/08, Robert Kern wrote: > From: Robert Kern > Subject: Re: [Numpy-discussion] What to use to read and write numpy arrays to a file? > > The most bulletproof way would be to use numpy.save() and > numpy.load(), but this is a binary format, not a text one. > > -- > Robert Kern > Thanks, Robert. I may have to go that route, assuming that the save and load pair preserve shape, i.e. I don't have to know the shape when I read back in. -- Lou Pecora, my views are my own. From robert.kern at gmail.com Mon Dec 8 16:28:14 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 15:28:14 -0600 Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <37053.3395.qm@web34406.mail.mud.yahoo.com> References: <3d375d730812081315g26e14706s8ca2faa94cf75f58@mail.gmail.com> <37053.3395.qm@web34406.mail.mud.yahoo.com> Message-ID: <3d375d730812081328i27e624f9gd181efbd5625b3c0@mail.gmail.com> On Mon, Dec 8, 2008 at 15:26, Lou Pecora wrote: > --- On Mon, 12/8/08, Robert Kern wrote: > >> From: Robert Kern >> Subject: Re: [Numpy-discussion] What to use to read and write numpy arrays to a file? >> >> The most bulletproof way would be to use numpy.save() and >> numpy.load(), but this is a binary format, not a text one. >> >> -- >> Robert Kern >> > > Thanks, Robert. I may have to go that route, assuming that the save and load pair preserve shape, i.e. I don't have to know the shape when I read back in. They do. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From amcmorl at gmail.com Mon Dec 8 19:00:16 2008 From: amcmorl at gmail.com (Angus McMorland) Date: Mon, 8 Dec 2008 19:00:16 -0500 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493D8470.8000303@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> Message-ID: Hi James, 2008/12/8 James : > > I have a very simple plot, and the lines join point to point, however i > would like to add a line of best fit now onto the chart, i am really new > to python etc, and didnt really understand those links! > > Can anyone help me :) It sounds like the second link, about linear regression, is a good place to start, and I've made a very simple example based on that: ----------------------------------------------- import numpy as np import matplotlib.pyplot as plt x = np.linspace(0, 10, 11) #1 data_y = np.random.normal(size=x.shape, loc=x, scale=2.5) #2 plt.plot(x, data_y, 'bo') #3 coefs = np.lib.polyfit(x, data_y, 1) #4 fit_y = np.lib.polyval(coefs, x) #5 plt.plot(x, fit_y, 'b--') #6 ------------------------------------------------ Line 1 creates an array with the x values I have. Line 2 creates some random "data" I want to fit, which, in this case happens to be normally distributed around the unity line y=x. The raw data is plotted (assuming you have matplotlib installed as well - I suggest you do) by line 3, with blue circles. Line 4 calculates the coefficients giving the least-squares best fit to a first degree polynomial (i.e. a straight line y = c0 * x + c1). So the values of coefs are c0 and c1 in the previous equation. 
Line 5 calculates the y values on the fitted polynomial, at given x values, from the coefficients calculated in line 4, and line 6 simply plots these fitted y values, using a dotted blue line. I hope that helps get you started. Keep posting questions on specific issues as they arise, and we'll see what we can do to help. Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From steve at shrogers.com Mon Dec 8 19:07:00 2008 From: steve at shrogers.com (Steven H. Rogers) Date: Mon, 08 Dec 2008 17:07:00 -0700 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: <493DB6A4.1050501@shrogers.com> Matthieu Brucher wrote: > At least several months, if not years. RedHat supports each version 7 > years, for instance (I don't ask for that long). > Currently, I'm still using a RHEL 4, although it is planned to migrate > to RHEL 5 next year. So we should still support 2.4 for at least 18 > months, in case some big firms use RHEL and Python+Numpy for their > tools. > +1 From f.yw at hotmail.com Mon Dec 8 20:15:26 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 8 Dec 2008 18:15:26 -0700 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: Hi, I have a program with some variables consume a lot of memory. The first time I run it, it is fine. The second time I run it, I will get MemoryError. If I close the ipython and reopen it again, then I can run the program once. I am looking for a command to delete the intermediate variable once it is not used to save memory like in matlab clear command. Thanks Frank _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at enthought.com Mon Dec 8 22:00:57 2008 From: travis at enthought.com (Travis Vaught) Date: Mon, 8 Dec 2008 21:00:57 -0600 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: Try: del(myvariable) Travis On Dec 8, 2008, at 7:15 PM, frank wang wrote: > Hi, > > I have a program with some variables consume a lot of memory. The > first time I run it, it is fine. The second time I run it, I will > get MemoryError. If I close the ipython and reopen it again, then I > can run the program once. I am looking for a command to delete the > intermediate variable once it is not used to save memory like in > matlab clear command. > > Thanks > > Frank > > Send e-mail faster without improving your typing skills. Get your > Hotmail? account. > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Garry.Willgoose at newcastle.edu.au Mon Dec 8 19:02:46 2008 From: Garry.Willgoose at newcastle.edu.au (Garry Willgoose) Date: Tue, 9 Dec 2008 11:02:46 +1100 Subject: [Numpy-discussion] when will osx linker option -bundle be reflected in distutils Message-ID: I was just wondering what plans there were to reflect the different linker options (i.e. -bundle instead of -shared) that are required on OSX in the fcompiler files within distutils. While its a minor thing it always catches the users of my software when they either install fresh or update numpy ... and sometimes on a bad day it even catches me ;-) ==================================================================== Prof Garry Willgoose, Australian Professorial Fellow in Environmental Engineering, Director, Centre for Climate Impact Management (C2IM), School of Engineering, The University of Newcastle, Callaghan, 2308 Australia. Centre webpage: www.c2im.org.au Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574 (Fri PM-Mon) FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal and Telluric) Env. Engg. Secretary: (International) +61 2 4921 6042 email: garry.willgoose at newcastle.edu.au; g.willgoose at telluricresearch.com email-for-life: garry.willgoose at alum.mit.edu personal webpage: www.telluricresearch.com/garry ==================================================================== "Do not go where the path may lead, go instead where there is no path and leave a trail" Ralph Waldo Emerson ==================================================================== From robert.kern at gmail.com Mon Dec 8 22:19:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 21:19:20 -0600 Subject: [Numpy-discussion] when will osx linker option -bundle be reflected in distutils In-Reply-To: References: Message-ID: <3d375d730812081919x3fc8dcf4rbc38c7ebae654d7f@mail.gmail.com> On Mon, Dec 8, 2008 at 18:02, Garry Willgoose wrote: > I was just wondering what plans there were to reflect the different > linker options (i.e. -bundle instead of -shared) that are required on > OSX in the fcompiler files within distutils. While its a minor thing > it always catches the users of my software when they either install > fresh or update numpy ... and sometimes on a bad day it even catches > me ;-) I'm sorry; I don't follow. What problems are you having? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From scott.sinclair.za at gmail.com Tue Dec 9 00:13:26 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 9 Dec 2008 07:13:26 +0200 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> Message-ID: <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> > 2008/12/9 Angus McMorland : > Hi James, > > 2008/12/8 James : >> >> I have a very simple plot, and the lines join point to point, however i >> would like to add a line of best fit now onto the chart, i am really new >> to python etc, and didnt really understand those links! 
>> >> Can anyone help me :) > > It sounds like the second link, about linear regression, is a good > place to start, and I've made a very simple example based on that: > > ----------------------------------------------- > import numpy as np > import matplotlib.pyplot as plt > > x = np.linspace(0, 10, 11) #1 > data_y = np.random.normal(size=x.shape, loc=x, scale=2.5) #2 > plt.plot(x, data_y, 'bo') #3 > > coefs = np.lib.polyfit(x, data_y, 1) #4 > fit_y = np.lib.polyval(coefs, x) #5 > plt.plot(x, fit_y, 'b--') #6 > ------------------------------------------------ James, you'll want to add an extra line to the above code snippet so that Matplotlib displays the plot: plt.show() Cheers, Scott From rmay31 at gmail.com Tue Dec 9 00:39:19 2008 From: rmay31 at gmail.com (Ryan May) Date: Mon, 08 Dec 2008 23:39:19 -0600 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <493E0487.50909@gmail.com> Pierre GM wrote: > All, > Here's the latest version of genloadtxt, with some recent corrections. > With just a couple of tweaking, we end up with some decent speed: it's > still slower than np.loadtxt, but only 15% so according to the test at > the end of the package. > > And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ? > > Thx for any comment and suggestions. Current version works out of the box for me. Thanks for running point on this. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From millman at berkeley.edu Tue Dec 9 04:34:29 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 9 Dec 2008 01:34:29 -0800 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: On Fri, Dec 5, 2008 at 3:59 PM, Pierre GM wrote: > All, > Here's the latest version of genloadtxt, with some recent corrections. With > just a couple of tweaking, we end up with some decent speed: it's still > slower than np.loadtxt, but only 15% so according to the test at the end of > the package. > > And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ? Thanks for working on this. I think that having simple, easy-to-use, flexible, and fast IO code is extremely important; so I really appreciate this work. I have a few general comments about the IO code and where I would like to see it going: Where should IO code go? ------------------------------------ >From the user's perspective, I would like all the NumPy IO code to be in the same place in NumPy; and all the SciPy IO code to be in the same place in SciPy. So, for instance, the user shouldn't get `mloadtxt` from `numpy.ma.io`. Another way of saying this is that in IPython, I should be able to see all NumPy IO functions by tab-completing once. Slightly less important to me is that I would like to be able to do: from numpy import io as npio from scipy import io as spio What is the difference between NumPy and SciPy IO? ------------------------------------------------------------------------ It was decided last year that numpy io should provide simple, generic, core io functionality. While scipy io would provide more domain- or application-specific io code (e.g., Matlab IO, WAV IO, etc.) My vision for scipy io, which I know isn't shared, is to be more or less aiming to be all inclusive (e.g., all image, sound, and data formats). 
(That is a different discussion; just wanted it to be clear where I stand.) For numpy io, it should include: - generic helper routines for data io (i.e., datasource, etc.) - a standard, supported binary format (i.e., npy/npz) - generic ascii file support (i.e, loadtxt, etc.) What about AstroAsciiData? ------------------------------------- I sent an email asking about AstroAsciiData last week. The only response I got was from Manuel Metz saying that he was switching to AstroAsciiData since it did exactly what he needed. In my mind, I would prefer that numpy io had the best ascii data handling. So I wonder if it would make sense to incorporate AstroAsciiData? As far as I know, it is pure Python with a BSD license. Maybe the authors would be willing to help integrate the code and continue maintaining it in numpy. If others are supportive of this general approach, I would be happy to approach them. It is possible that we won't want all their functionality, but it would be good to avoid duplicating effort. I realize that this may not be persuasive to everyone, but I really feel that IO code is special and that it is an area where numpy/scipy should devote some effort at consolidating the community on some standard packages and approaches. 3. What about data source? On a related note, I wanted to point out datasource. Data source is a file interface for handling local and remote data files: http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/lib/_datasource.py It was originally developed by Jonathan Taylor and then modified by Brian Hawthorne and Chris Burns. It is fairly well-documented and tested, so it would be easier to take a look at it than or me to reexplain it here. The basic idea is to have a drop-in replacement for file handling, which would abstract away whether the file was remote or local, compressed or not, etc. The hope was that it would allow us to simplify support for remote file access and handling compressed files by merely using a datasource instead of a filename: def loadtxt(fname .... vs. def loadtxt(datasource .... I would appreciate hearing whether this seems doable or useful. Should we remove datasource? Start using it more? Does it need to be slightly or dramatically improved/overhauled? Renamed `datafile` or paired with a `datadestination`? Support versioning/checksumming/provenance tracking (a tad ambitious;))? Is anyone interested in picking up where we left off and improving it? Thoughts? Suggestions? Documentation --------------------- The main reason that I am so interested in the IO code is that it seems like it is one of the first areas that users will look. ("I have heard about this Python for scientific programming thing and I wonder what all the fuss is about? Let me try NumPy; this seems pretty good. Now let's see how to load in some of my data....") I just took a quick look through the documentation and I couldn't find any in the User Guide and this is the main IO page in the reference manual: http://docs.scipy.org/doc/numpy/reference/routines.io.html I would like to see a section on data IO in the user guide and have a more prominent mention of IO code in the reference manual (i.e., http://docs.scipy.org/doc/numpy/reference/io.html ?). Unfortunately, I don't have time to help out; but since it looks like there has been some recent activity in this area I thought I'd mention it. As always--thanks to everyone who is actually putting in hard work! 
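To make the datasource idea above a bit more concrete, here is a rough usage sketch based on the numpy/lib/_datasource.py module linked earlier; the file name is made up and the exact behaviour should be checked against that module rather than taken from this sketch:

-----------------------------------------------
import gzip
import numpy as np
from numpy.lib import _datasource

# Write a small compressed file so the example is self-contained.
f = gzip.open('example.txt.gz', 'wb')
f.write(b'1 2 3\n4 5 6\n')
f.close()

# DataSource abstracts away whether a path is local or an http/ftp URL and
# whether it is gzip/bz2 compressed; destpath=None means any downloaded
# files go to a temporary directory.
ds = _datasource.DataSource(None)
fh = ds.open('example.txt.gz')   # opened transparently through gzip
data = np.loadtxt(fh)            # loadtxt never needs to know the details
fh.close()
print(data)
-----------------------------------------------

The attraction is that loadtxt and friends could accept the object returned by ds.open() (or build a DataSource internally from a filename/URL) without caring where the bytes actually come from.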
Sorry I am not offering to actually help out here, but I hope that someone will be interested and able to pursue some of these issues. Thanks again, Jarrod On Thu, Dec 4, 2008 at 3:41 PM, Jarrod Millman wrote: > I am not familiar with this, but it looks quite useful: > http://www.stecf.org/software/PYTHONtools/astroasciidata/ > or (http://www.scipy.org/AstroAsciiData) > > "Within the AstroAsciiData project we envision a module which can be > used to work on all kinds of ASCII tables. The module provides a > convenient tool such that the user easily can: > > * read in ASCII tables; > * manipulate table elements; > * save the modified ASCII table; > * read and write meta data such as column names and units; > * combine several tables; > * delete/add rows and columns; > * manage metadata in the table headers." > > Is anyone familiar with this package? Would make sense to investigate > including this or adopting some of its interface/features? From millman at berkeley.edu Tue Dec 9 05:12:42 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 9 Dec 2008 02:12:42 -0800 Subject: [Numpy-discussion] Please help prepare the SciPy 0.7 release notes Message-ID: We are almost ready for SciPy 0.7.0rc1 (we just need to sort out the Numerical Recipes issues and I haven't had time to look though them yet). So I wanted to ask once more for help with preparing the release notes: http://projects.scipy.org/scipy/scipy/browser/trunk/doc/release/0.7.0-notes.rst There have been numerous improvements and changes. As always I would appreciate any feedback about mistakes or omissions. It would also be nice to know how many tests were in the last release and how many are there now. Highlighting major bug fixes or pointing out know issues would be very useful. I would also like to ask if anyone would be interested in stepping forward to work on something like Andrew Kuchling's "What's New in Python ....": http://docs.python.org/whatsnew/2.6.html This would be a great area to contribute. The release notes provide visibility for our developers' immense contributions of time and effort. They help provide an atmosphere of momentum, maturity, and excitement to a project. It is also a great service to users who haven't been following the trunk closely as well as other developer's who have missed what is happening in other areas of the code. It is also becomes a nice historical artifact for the future. It would be great if someone wanted to contribute in this way. Ideally, I would like to have someone who be interested in doing this for several releases of scipy and numpy. Such a person could develop a standard template for this and write some scripts to gather specific statistics (e.g., how many lines of code have changed, how many unit tests were added, what is the test coverage, what is the docstring coverage, who were the top contributors, who has increased their code contributions the most, how many new developers, etc.) Just a thought. Figure it won't happen, if I don't ask. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From james at fmnmedia.co.uk Tue Dec 9 06:13:24 2008 From: james at fmnmedia.co.uk (James) Date: Tue, 09 Dec 2008 11:13:24 +0000 Subject: [Numpy-discussion] Line of best fit! 
In-Reply-To: <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> Message-ID: <493E52D4.4020408@fmnmedia.co.uk> Hi, Thanks for all your help so far! Right i think it would be easier to just show you the chart i have so far; -------------------------- import numpy as np import matplotlib.pyplot as plt plt.plot([4,8,12,16,20,24], [0.008,0.016,0.021,0.038,0.062,0.116], 'bo') plt.xlabel("F (Number of washers)") plt.ylabel("v^2/r ms-2") plt.title("Circular Motion") plt.axis([2,26,0,0.120]) plt.show() ------------------------ Very basic i know, all i wish to do is add a line of best fit based on that data, in the examples there seems to be far more variables, do i need to split my data up etc? Thanks Scott Sinclair wrote: >> 2008/12/9 Angus McMorland : >> Hi James, >> >> 2008/12/8 James : >> >>> I have a very simple plot, and the lines join point to point, however i >>> would like to add a line of best fit now onto the chart, i am really new >>> to python etc, and didnt really understand those links! >>> >>> Can anyone help me :) >>> >> It sounds like the second link, about linear regression, is a good >> place to start, and I've made a very simple example based on that: >> >> ----------------------------------------------- >> import numpy as np >> import matplotlib.pyplot as plt >> >> x = np.linspace(0, 10, 11) #1 >> data_y = np.random.normal(size=x.shape, loc=x, scale=2.5) #2 >> plt.plot(x, data_y, 'bo') #3 >> >> coefs = np.lib.polyfit(x, data_y, 1) #4 >> fit_y = np.lib.polyval(coefs, x) #5 >> plt.plot(x, fit_y, 'b--') #6 >> ------------------------------------------------ >> > > James, you'll want to add an extra line to the above code snippet so > that Matplotlib displays the plot: > > plt.show() > > Cheers, > Scott > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From lbrooks at MIT.EDU Tue Dec 9 06:35:11 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Tue, 09 Dec 2008 04:35:11 -0700 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493E52D4.4020408@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> <493E52D4.4020408@fmnmedia.co.uk> Message-ID: <493E57EF.2060601@mit.edu> James wrote: > Hi, > > Thanks for all your help so far! > > Right i think it would be easier to just show you the chart i have so far; > > -------------------------- > import numpy as np > import matplotlib.pyplot as plt > > plt.plot([4,8,12,16,20,24], [0.008,0.016,0.021,0.038,0.062,0.116], 'bo') > > plt.xlabel("F (Number of washers)") > plt.ylabel("v^2/r ms-2") > plt.title("Circular Motion") > plt.axis([2,26,0,0.120]) > > plt.show() > > ------------------------ > > Very basic i know, all i wish to do is add a line of best fit based on > that data, in the examples there seems to be far more variables, do i > need to split my data up etc? 
> Here is how I would do it: import numpy as np import matplotlib.pyplot as plt x = np.array([4,8,12,16,20,24]) y = np.array([0.008,0.016,0.021,0.038,0.062,0.116]) m = np.polyfit(x, y, 1) yfit = np.polyval(m, x) plt.plot(x, y, 'bo', x, yfit, 'k') plt.xlabel("F (Number of washers)") plt.ylabel("v2/r ms-2") plt.title("Circular Motion") plt.axis([2,26,0,0.120]) plt.text(5, 0.06, "Slope=%f" % m[0]) plt.text(5, 0.05, "Offset=%f" % m[1]) plt.show() From hanni.ali at gmail.com Tue Dec 9 09:07:34 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Tue, 9 Dec 2008 14:07:34 +0000 Subject: [Numpy-discussion] Importance of order when summing values in an array Message-ID: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> Hi All, I have encountered a puzzling issue and I am not certain if this is a mistake of my own doing or not. Would someone kindly just look over this issue to make sure I'm not doing something very silly. So, why would the sum of an array have a different value depending on the order I select the indices of the array? >>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() 8933281.8757099733 >>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() 8933281.8757099714 >>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) 8933281.8757099733 >>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) 8933281.8757099714 Any thoughts? Cheers, Hanni -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Tue Dec 9 09:14:52 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 9 Dec 2008 16:14:52 +0200 Subject: [Numpy-discussion] Importance of order when summing values in anarray References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> The highest accuracy is obtained when you sum an acceding ordered series, and the lowest accuracy with descending ordered. In between you might get a variety of rounding errors. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Hanni Ali ????: ? 09-?????-08 16:07 ??: Discussion of Numerical Python ????: [Numpy-discussion] Importance of order when summing values in anarray Hi All, I have encountered a puzzling issue and I am not certain if this is a mistake of my own doing or not. Would someone kindly just look over this issue to make sure I'm not doing something very silly. So, why would the sum of an array have a different value depending on the order I select the indices of the array? >>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() 8933281.8757099733 >>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() 8933281.8757099714 >>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) 8933281.8757099733 >>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) 8933281.8757099714 Any thoughts? Cheers, Hanni -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3346 bytes Desc: not available URL: From aisaac at american.edu Tue Dec 9 09:30:05 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 09 Dec 2008 09:30:05 -0500 Subject: [Numpy-discussion] Line of best fit! 
In-Reply-To: <493D8470.8000303@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> Message-ID: <493E80ED.7070605@american.edu> On 12/8/2008 3:32 PM James apparently wrote: > I have a very simple plot, and the lines join point to point, however i > would like to add a line of best fit now onto the chart, i am really new > to python etc, and didnt really understand those links! See the `slope_intercept` method of the OLS class at http://code.google.com/p/econpy/source/browse/trunk/pytrix/ls.py Cheers, Alan Isaac From hanni.ali at gmail.com Tue Dec 9 09:34:25 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Tue, 9 Dec 2008 14:34:25 +0000 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> Message-ID: <789d27b10812090634h3f22c4d2rf2a801b75b29d06a@mail.gmail.com> Thank you Nadav. 2008/12/9 Nadav Horesh > The highest accuracy is obtained when you sum an acceding ordered series, > and the lowest accuracy with descending ordered. In between you might get a > variety of rounding errors. > > Nadav. > > -----????? ??????----- > ???: numpy-discussion-bounces at scipy.org ??? Hanni Ali > ????: ? 09-?????-08 16:07 > ??: Discussion of Numerical Python > ????: [Numpy-discussion] Importance of order when summing values in anarray > > Hi All, > > I have encountered a puzzling issue and I am not certain if this is a > mistake of my own doing or not. Would someone kindly just look over this > issue to make sure I'm not doing something very silly. > > So, why would the sum of an array have a different value depending on the > order I select the indices of the array? > > >>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() > 8933281.8757099733 > >>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() > 8933281.8757099714 > >>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) > 8933281.8757099733 > >>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) > 8933281.8757099714 > > Any thoughts? > > Cheers, > > Hanni > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Dec 9 09:51:43 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 09 Dec 2008 08:51:43 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> Message-ID: <493E85FF.2060106@gmail.com> Nadav Horesh wrote: > The highest accuracy is obtained when you sum an acceding ordered series, and the lowest accuracy with descending ordered. In between you might get a variety of rounding errors. > > Nadav. > > -----????? ??????----- > ???: numpy-discussion-bounces at scipy.org ??? Hanni Ali > ????: ? 09-?????-08 16:07 > ??: Discussion of Numerical Python > ????: [Numpy-discussion] Importance of order when summing values in anarray > > Hi All, > > I have encountered a puzzling issue and I am not certain if this is a > mistake of my own doing or not. 
Would someone kindly just look over this > issue to make sure I'm not doing something very silly. > > So, why would the sum of an array have a different value depending on the > order I select the indices of the array? > > >>>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() >>>> > 8933281.8757099733 > >>>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() >>>> > 8933281.8757099714 > >>>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) >>>> > 8933281.8757099733 > >>>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) >>>> > 8933281.8757099714 > > Any thoughts? > > Cheers, > > Hanni > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Also, increase the numerical precision as that may depend on your platform especially given the input values above are ints. Numpy has float128 and int64 that will minimize rounding error. Bruce From hanni.ali at gmail.com Tue Dec 9 10:00:07 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Tue, 9 Dec 2008 15:00:07 +0000 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <493E85FF.2060106@gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> Message-ID: <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> Hi Bruce, Ahh, but I would have thought the precision for the array operation would be the same no matter which values I wish to sum? The array is in float64 in all cases. I would not have thought altering the type of the integer values would make any difference as these indices are all below 5 milllion. Perhaps I have misunderstood your suggestion could you expand. Cheers, Hanni Also, increase the numerical precision as that may depend on your > platform especially given the input values above are ints. Numpy has > float128 and int64 that will minimize rounding error. > > Bruce > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Dec 9 10:46:03 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 09 Dec 2008 09:46:03 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> Message-ID: <493E92BB.9060403@gmail.com> Hanni Ali wrote: > Hi Bruce, > > Ahh, but I would have thought the precision for the array operation > would be the same no matter which values I wish to sum? The array is > in float64 in all cases. > > I would not have thought altering the type of the integer values would > make any difference as these indices are all below 5 milllion. > > Perhaps I have misunderstood your suggestion could you expand. > > Cheers, > > Hanni > > > Also, increase the numerical precision as that may depend on your > platform especially given the input values above are ints. 
Numpy has > float128 and int64 that will minimize rounding error. > > Bruce > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Hi, The main issue is the number of significant digits that you have which is not the number of decimals in your case. So while the numerical difference in the results is in the order about 1.86e-09, the actual difference starts at the 15th significant place. This is expected due to the number of significant digits of a 64-bit number (15-16). With higher precision like float128 you should get about 34 significant digits depending accuracy in all steps (i.e., the numbers must be stored as float128 and the summations done in float128 precision). Note there is a secondary issue of converting numbers between different types as well as the binary representation of decimal numbers. Also, rather than just simple summing, there are alternative algorithms like Kahan summation algorithm that can minimize errors. Bruce From nadavh at visionsense.com Tue Dec 9 10:51:49 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 9 Dec 2008 17:51:49 +0200 Subject: [Numpy-discussion] Importance of order when summing values in anarray References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> As much as I know float128 are in fact 80 bits (64 mantissa + 16 exponent) so the precision is 18-19 digits (not 34) Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Bruce Southey ????: ? 09-?????-08 17:46 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] Importance of order when summing values in anarray Hanni Ali wrote: > Hi Bruce, > > Ahh, but I would have thought the precision for the array operation > would be the same no matter which values I wish to sum? The array is > in float64 in all cases. > > I would not have thought altering the type of the integer values would > make any difference as these indices are all below 5 milllion. > > Perhaps I have misunderstood your suggestion could you expand. > > Cheers, > > Hanni > > > Also, increase the numerical precision as that may depend on your > platform especially given the input values above are ints. Numpy has > float128 and int64 that will minimize rounding error. > > Bruce > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Hi, The main issue is the number of significant digits that you have which is not the number of decimals in your case. 
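To put numbers on the significant-digits point, here is a small check that looks only at the two sums quoted earlier in this thread (np.finfo gives the machine epsilon; nothing about the original vector is assumed):

------------------------------------------------
import numpy as np

a = np.float64(8933281.8757099733)   # sum reported for the first index order
b = np.float64(8933281.8757099714)   # sum reported for the second index order

print("absolute difference : %.3e" % (a - b))             # ~1.9e-09, one unit in the last place at this magnitude
print("relative difference : %.3e" % ((a - b) / a))       # ~2.1e-16
print("float64 eps         : %.3e" % np.finfo(np.float64).eps)   # ~2.2e-16
------------------------------------------------

In other words, the two orderings agree to roughly 16 significant digits and differ only in the last representable bit, which is the size of effect one should expect from summation order alone.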
So while the numerical difference in the results is in the order about 1.86e-09, the actual difference starts at the 15th significant place. This is expected due to the number of significant digits of a 64-bit number (15-16). With higher precision like float128 you should get about 34 significant digits depending accuracy in all steps (i.e., the numbers must be stored as float128 and the summations done in float128 precision). Note there is a secondary issue of converting numbers between different types as well as the binary representation of decimal numbers. Also, rather than just simple summing, there are alternative algorithms like Kahan summation algorithm that can minimize errors. Bruce _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4202 bytes Desc: not available URL: From Shawn.Gong at drdc-rddc.gc.ca Tue Dec 9 11:00:30 2008 From: Shawn.Gong at drdc-rddc.gc.ca (Gong, Shawn (Contractor)) Date: Tue, 9 Dec 2008 11:00:30 -0500 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 Message-ID: hi list, I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 when I typed "python setup.py build", I got error from hashlib.py File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in md5 = __get_builtin_constructor('md5') File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in __get_builtin_constructor import _md5 ImportError: No module named _md5 I then tried python 2.6.1 instead of 2.5.2, but got the same error. I did not get the error while building on Linux. But I performed steps on Linux: 1) copy *.a Atlas libraries to my local_install/atlas/ 2) ranlib *.a 3) created a site.cfg Do I need to do the same on Solaris? Any help is appreciated. thanks, Shawn -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Tue Dec 9 11:44:53 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 9 Dec 2008 17:44:53 +0100 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: Hi, Does: >>> import md5 work? If it doesn't, it's a packaging problem. md5 must be available. Matthieu 2008/12/9 Gong, Shawn (Contractor) : > hi list, > > I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 > > when I typed "python setup.py build", I got error from hashlib.py > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in > > > md5 = __get_builtin_constructor('md5') > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in > __get_builtin_constructor > > import _md5 > > ImportError: No module named _md5 > > I then tried python 2.6.1 instead of 2.5.2, but got the same error. > > I did not get the error while building on Linux. But I performed steps on > Linux: > > 1) copy *.a Atlas libraries to my local_install/atlas/ > > 2) ranlib *.a > > 3) created a site.cfg > > Do I need to do the same on Solaris? > > Any help is appreciated. > > thanks, > > Shawn > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From cournape at gmail.com Tue Dec 9 11:49:38 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 10 Dec 2008 01:49:38 +0900 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: <5b8d13220812090849k79ef657fla73e2d0b1f1a603d@mail.gmail.com> On Wed, Dec 10, 2008 at 1:00 AM, Gong, Shawn (Contractor) wrote: > hi list, > > Do I need to do the same on Solaris? This has nothing to do with ATLAS. You did not build correctly python, or the python you are using is not built correctly. _md5 is a module from python, not from numpy. cheers, David From lou_boog2000 at yahoo.com Tue Dec 9 11:50:14 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 9 Dec 2008 08:50:14 -0800 (PST) Subject: [Numpy-discussion] One Solution to: What to use to read and write numpy arrays to a file? In-Reply-To: <493E92BB.9060403@gmail.com> Message-ID: <264175.50384.qm@web34405.mail.mud.yahoo.com> I found one solution that's pretty simple for easy read and write to/from a file of a numpy array (see my original message below). Just use the method tolist(). e.g. a complex 2 x 2 array arr=array([[1.0,3.0-7j],[55.2+4.0j,-95.34]]) ls=arr.tolist() Then use the repr - eval pairings to write and later read the list from the file and then convert the list that is read in back to an array: [ls_str]=fp.readline() ls_in= eval(ls_str) arr_in=array(ls_in) # arr_in is same as arr Seems to work well. Any comments? -- Lou Pecora, my views are my own. --- On Tue, 12/9/08, Lou Pecora wrote: In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. E.g. to write some data (mixed types) to a file I can do this (fp is an open file), thedata=[3.0,-4.9+2.0j,'another string'] repvars= repr(thedata)+"\n" fp.write(repvars) Then to read it back and restore the data each to its original type, strvars= fp.readline() sonofdata= eval(strvars) which gives back the original data list. BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. From Shawn.Gong at drdc-rddc.gc.ca Tue Dec 9 11:55:21 2008 From: Shawn.Gong at drdc-rddc.gc.ca (Gong, Shawn (Contractor)) Date: Tue, 9 Dec 2008 11:55:21 -0500 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: hi Matthieu, import md5 doesn't work. 
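Coming back to the repr/eval round trip discussed a little earlier: a self-contained sketch of the tolist() approach on the same kind of small complex array, with the file I/O left out so only the string conversion is exercised:

------------------------------------------------
import numpy as np

arr = np.array([[1.0, 3.0 - 7j], [55.2 + 4.0j, -95.34]])

text = repr(arr.tolist())           # one line of plain Python literals, ready for fp.write()
arr_back = np.array(eval(text))     # back to a nested list, then to an array

print(text)
print(np.allclose(arr, arr_back))   # True when the round trip preserves the values
------------------------------------------------

The usual caveat with this pattern is that eval() will execute whatever happens to be in the file, so it is only reasonable for files you produced yourself; for plain real-valued arrays, numpy.savetxt and numpy.loadtxt avoid that issue.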
I got: >>> import md5 Traceback (most recent call last): File "", line 1, in File "/home/sgong/dev181/dist.org/lib/python2.5/md5.py", line 6, in from hashlib import md5 File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 133, in md5 = __get_builtin_constructor('md5') File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 60, in __get_builtin_constructor import _md5 ImportError: No module named _md5 But I followed the same steps to build python 2.5.2 as on Linux: config make clean make make -i install (because there is an older python 2.5.1 on my /usr/local/bin/) thanks, Shawn -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Matthieu Brucher Sent: Tuesday, December 09, 2008 11:45 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] numpy build error on Solaris,No module named _md5 Hi, Does: >>> import md5 work? If it doesn't, it's a packaging problem. md5 must be available. Matthieu 2008/12/9 Gong, Shawn (Contractor) : > hi list, > > I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 > > when I typed "python setup.py build", I got error from hashlib.py > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in > > > md5 = __get_builtin_constructor('md5') > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in > __get_builtin_constructor > > import _md5 > > ImportError: No module named _md5 > > I then tried python 2.6.1 instead of 2.5.2, but got the same error. > > I did not get the error while building on Linux. But I performed steps on > Linux: > > 1) copy *.a Atlas libraries to my local_install/atlas/ > > 2) ranlib *.a > > 3) created a site.cfg > > Do I need to do the same on Solaris? > > Any help is appreciated. > > thanks, > > Shawn > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From matthieu.brucher at gmail.com Tue Dec 9 11:56:42 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 9 Dec 2008 17:56:42 +0100 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: You should ask on a general Python list, as it's a Python problem, not a numpy one ;) Matthieu PS: look at the log when you built Python, there must be a mention of the not building of the md5 module. 2008/12/9 Gong, Shawn (Contractor) : > hi Matthieu, > > import md5 doesn't work. 
I got: > >>>> import md5 > Traceback (most recent call last): > File "", line 1, in > File "/home/sgong/dev181/dist.org/lib/python2.5/md5.py", line 6, in > > from hashlib import md5 > File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 133, > in > md5 = __get_builtin_constructor('md5') > File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 60, > in __get_builtin_constructor > import _md5 > ImportError: No module named _md5 > > > But I followed the same steps to build python 2.5.2 as on Linux: > config > make clean > make > make -i install (because there is an older python 2.5.1 on my > /usr/local/bin/) > > > thanks, > Shawn > > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org > [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Matthieu > Brucher > Sent: Tuesday, December 09, 2008 11:45 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] numpy build error on Solaris,No module > named _md5 > > Hi, > > Does: > >>>> import md5 > > work? If it doesn't, it's a packaging problem. md5 must be available. > > Matthieu > > 2008/12/9 Gong, Shawn (Contractor) : >> hi list, >> >> I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 >> >> when I typed "python setup.py build", I got error from hashlib.py >> >> File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, > in >> >> >> md5 = __get_builtin_constructor('md5') >> >> File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in >> __get_builtin_constructor >> >> import _md5 >> >> ImportError: No module named _md5 >> >> I then tried python 2.6.1 instead of 2.5.2, but got the same error. >> >> I did not get the error while building on Linux. But I performed steps > on >> Linux: >> >> 1) copy *.a Atlas libraries to my local_install/atlas/ >> >> 2) ranlib *.a >> >> 3) created a site.cfg >> >> Do I need to do the same on Solaris? >> >> Any help is appreciated. >> >> thanks, >> >> Shawn >> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > > -- > Information System Engineer, Ph.D. > Website: http://matthieu-brucher.developpez.com/ > Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn: http://www.linkedin.com/in/matthieubrucher > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From michael.abshoff at googlemail.com Tue Dec 9 11:54:17 2008 From: michael.abshoff at googlemail.com (Michael Abshoff) Date: Tue, 09 Dec 2008 08:54:17 -0800 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: <493EA2B9.702@gmail.com> Gong, Shawn (Contractor) wrote: > hi list, Hi Shawn, > I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 > > when I typed ?python setup.py build?, I got error from hashlib.py > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in > > > md5 = __get_builtin_constructor('md5') > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in > __get_builtin_constructor > > import _md5 > > ImportError: No module named _md5 > > I then tried python 2.6.1 instead of 2.5.2, but got the same error. > > I did not get the error while building on Linux. But I performed steps > on Linux: > > 1) copy *.a Atlas libraries to my local_install/atlas/ > > 2) ranlib *.a > > 3) created a site.cfg > > Do I need to do the same on Solaris? > > Any help is appreciated. This is a pure Python issue and has nothing to do with numpy. When Python was build for that install it did either not have access to OpenSSL or the Sun crypto libs or you are missing some bits that need to be installed on Solaris. Did you build that Python on your own or where did it come from? > thanks, > > Shawn > Cheers, Michael > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From Chris.Barker at noaa.gov Tue Dec 9 12:59:30 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 09 Dec 2008 09:59:30 -0800 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <493EB202.4030308@noaa.gov> Jarrod Millman wrote: >>From the user's perspective, I would like all the NumPy IO code to be > in the same place in NumPy; and all the SciPy IO code to be in the > same place in SciPy. +1 > So I > wonder if it would make sense to incorporate AstroAsciiData? Doesn't it overlap a lot with genloadtxt? If so, that's a bit confusing to new users. > 3. What about data source? > Should we remove datasource? Start using it more? start using it more -- it sounds very handy. > Does it need to be > slightly or dramatically improved/overhauled? no comment here - I have no idea. > Documentation > --------------------- > Let me try NumPy; this seems > pretty good. Now let's see how to load in some of my data....") totally key -- I have a colleague that has used Matlab a fair bi tin past that is starting a new project -- he asked me what to use. I, of course, suggested python+numpy+scipy. His first question was -- can I load data in from excel? One more comment -- for fast reading of lots of ascii data, fromfile() needs some help -- I wish I had more time for it -- maybe some day. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Dec 9 15:13:17 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 9 Dec 2008 15:13:17 -0500 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <493EB202.4030308@noaa.gov> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <493EB202.4030308@noaa.gov> Message-ID: <60C1C121-2B2E-4264-8BB8-D65EBA91120B@gmail.com> On Dec 9, 2008, at 12:59 PM, Christopher Barker wrote: > Jarrod Millman wrote: > >>> From the user's perspective, I would like all the NumPy IO code to >>> be >> in the same place in NumPy; and all the SciPy IO code to be in the >> same place in SciPy. > > +1 So, no problem w/ importing numpy.ma and numpy.records in numpy.lib.io ? > > >> So I >> wonder if it would make sense to incorporate AstroAsciiData? > > Doesn't it overlap a lot with genloadtxt? If so, that's a bit > confusing > to new users. For the little I browsed, do we need it ? We could get the same thing with record arrays... >> 3. What about data source? > >> Should we remove datasource? Start using it more? > > start using it more -- it sounds very handy. Didn't know it was around. I'll adapt genloadtxt to use it. >> Documentation >> --------------------- >> Let me try NumPy; this seems >> pretty good. Now let's see how to load in some of my data....") > > totally key -- I have a colleague that has used Matlab a fair bi tin > past that is starting a new project -- he asked me what to use. I, of > course, suggested python+numpy+scipy. His first question was -- can I > load data in from excel? So that would go in scipy.io ? > > One more comment -- for fast reading of lots of ascii data, fromfile() > needs some help -- I wish I had more time for it -- maybe some day. I'm afraid you'd have to count me out on this one: I don't speak C (yet), and don't foresee learning it soon enough to be of any help... From babaktei at yahoo.com Tue Dec 9 15:25:27 2008 From: babaktei at yahoo.com (Bab Tei) Date: Tue, 9 Dec 2008 12:25:27 -0800 (PST) Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? Message-ID: <968008.61869.qm@web50411.mail.re2.yahoo.com> Hi I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? Regards, Teimourpour From babaktei at yahoo.com Tue Dec 9 15:28:15 2008 From: babaktei at yahoo.com (Bab Tei) Date: Tue, 9 Dec 2008 12:28:15 -0800 (PST) Subject: [Numpy-discussion] Support for sparse matrix in Distance function (and clustering)? Message-ID: <993136.52361.qm@web50410.mail.re2.yahoo.com> Hi Does the distance function in spatial package support sparse matrix? 
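On the index-exclusion question above, one more option is a boolean mask built directly from the list of indices to drop; the sample array and index list below are made up for illustration:

------------------------------------------------
import numpy as np

a = np.arange(10) * 10          # 0, 10, 20, ... 90
excludeindex = [1, 3, 7]

mask = np.ones(a.shape, dtype=bool)
mask[excludeindex] = False      # switch off the positions to drop

print(a[mask])                  # [ 0 20 40 50 60 80 90]
------------------------------------------------

numpy.delete(a, excludeindex) gives the same result as a copy, which is about as close as numpy gets to R's negative indexing.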
regards From robert.kern at gmail.com Tue Dec 9 15:40:18 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 14:40:18 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> Message-ID: <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: > As much as I know float128 are in fact 80 bits (64 mantissa + 16 exponent) so the precision is 18-19 digits (not 34) float128 should be 128 bits wide. If it's not on your platform, please let us know as that is a bug in your build. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From discerptor at gmail.com Tue Dec 9 15:46:51 2008 From: discerptor at gmail.com (Joshua Lippai) Date: Tue, 9 Dec 2008 12:46:51 -0800 Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? In-Reply-To: <968008.61869.qm@web50411.mail.re2.yahoo.com> References: <968008.61869.qm@web50411.mail.re2.yahoo.com> Message-ID: <9911419a0812091246m6ecbc112i793475a03cc613a5@mail.gmail.com> You can make a mask array in numpy to prune out items from an array that you don't want, denoting indices you want to keep with 1's and those you don't want to keep with 0's. For instance, a = np.array([1,3,45,67,123]) mask = np.array([0,1,1,0,1],dtype=np.bool) anew = a[mask] will set anew equal to array([3, 45, 123]) Josh On Tue, Dec 9, 2008 at 12:25 PM, Bab Tei wrote: > Hi > I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? > Regards, Teimourpour > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Tue Dec 9 16:07:08 2008 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 9 Dec 2008 13:07:08 -0800 Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? In-Reply-To: <968008.61869.qm@web50411.mail.re2.yahoo.com> References: <968008.61869.qm@web50411.mail.re2.yahoo.com> Message-ID: On Tue, Dec 9, 2008 at 12:25 PM, Bab Tei wrote: > I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? Here's a painful way to do it: >> x = np.array([0,1,2,3,4]) >> excludeindex = [1,3] >> idx = list(set(range(4)) - set(excludeindex)) >> x[idx] array([0, 2]) To make it more painful, you might want to sort idx. But if excludeindex is True/False, then just use ~excludeindex. From eads at soe.ucsc.edu Tue Dec 9 17:32:53 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Tue, 9 Dec 2008 15:32:53 -0700 Subject: [Numpy-discussion] Support for sparse matrix in Distance function (and clustering)? 
In-Reply-To: <993136.52361.qm@web50410.mail.re2.yahoo.com> References: <993136.52361.qm@web50410.mail.re2.yahoo.com> Message-ID: <91b4b1ab0812091432l4306c1bep6a20370e1e3615f6@mail.gmail.com> Hi, Can you be more specific? Do you need sparse matrices to represent observation vectors because they are sparse? Or do you need sparse matrices to represent distance matrices because most vectors you are clustering are similar while a few are dissimilar? The clustering code is written mostly in C and does not support sparse matrices. However, this should not matter because most of the clustering code does not look at the raw observation vectors themselves, just the distances passed as a distance matrix. Damian On Tue, Dec 9, 2008 at 1:28 PM, Bab Tei wrote: > Hi > Does the distance function in spatial package support sparse matrix? > regards > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- ----------------------------------------------------- Damian Eads Ph.D. Student Jack Baskin School of Engineering, UCSC E2-489 1156 High Street Machine Learning Lab Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads From spacey-numpy-discussion at lenin.net Tue Dec 9 18:50:04 2008 From: spacey-numpy-discussion at lenin.net (Peter Norton) Date: Tue, 9 Dec 2008 18:50:04 -0500 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up Message-ID: I've got a few issues that I hope won't be overwhelming on one message: (1) Because of some issues in the past in building numpy with numscons, the numpy.core.umath_tests don't get built with numpy+numscons (at least not as of svn version 6128). $ python -c 'import numpy; print numpy.__version__; import numpy.core.umath_tests' 1.3.0.dev6139 Traceback (most recent call last): File "", line 1, in ImportError: No module named umath_tests What needs to be done to get this module incorporated into the numscons build? (2) I've found that in numscons-0.9.4, the detection of the correct linker assumes that if gcc is in use, the linker is gnu ld. However, on solaris this isn't the recommended toolchain, so it's typical to build gcc with gnu as and the solaris /usr/ccs/bin/ld under the hood. What this means is that when setting a run_path in the binary (which we need to do) the linker flags are set to "-Wl,-rpath=". However, this isn't valid for the solaris ld. It needs -R, or -Wl,-R. I'm pretty sure that on Solaris trying to link a library with -Wl,-rpath= and looking for an error should be enough to determine the correct format for the linker. (3) Numscons tries to check for the need for a MAIN__ function when linking with gfortran. However, any libraries built with numscons come out with an unsatisfied dependency on MAIN__. The log looks like this in build/scons/numpy/linalg/config.log looks like this: scons: Configure: Checking if gfortran needs dummy main - scons: Configure: "build/scons/numpy/linalg/sconf/conftest_0.c" is up to date. scons: Configure: The original builder output was: |build/scons/numpy/linalg/sconf/conftest_0.c <- | | | |int dummy() { return 0; } | | | scons: Configure: "build/scons/numpy/linalg/sconf/conftest_0.o" is up to date. 
scons: Configure: The original builder output was: |gcc -o build/scons/numpy/linalg/sconf/conftest_0.o -c -O3 -m64 -g -fPIC -DPIC build/scons/numpy/linalg/sconf/conftest_0.c | scons: Configure: Building "build/scons/numpy/linalg/sconf/conftest_0" failed in a previous run and all its sources are up to date. scons: Configure: The original builder output was: |gfortran -o build/scons/numpy/linalg/sconf/conftest_0 -O3 -g -L/usr/local/lib/gcc-4.3.1/amd64 -Wl,-R/usr/local/lib/gcc-4.3.1/amd64 -L/usr/local/amd64/python/lib -Wl,-R/usr/local/amd64/python/lib -L. -lgcc_s build/scons/numpy/linalg/sconf/conftest_0.o | It then goes on to discover that it needs main: scons: Configure: "build/scons/numpy/linalg/sconf/conftest_1" is up to date. scons: Configure: The original builder output was: |gfortran -o build/scons/numpy/linalg/sconf/conftest_1 -O3 -g -L/usr/local/lib/gcc-4.3.1/amd64 -Wl,-R/usr/local/lib/gcc-4.3.1/amd64 -L/usr/local/amd64/python/lib -Wl,-R/usr/local/amd64/python/lib -L. -lgcc_s build/scons/numpy/linalg/sconf/conftest_1.o | scons: Configure: (cached) MAIN__. Doesn't this clearly indicate that a dummy main is needed? I'm working around this with a silly library that just has the MAIN__ symbol in it, but I'd love to do without that. Thanks, Peter From chaos.proton at gmail.com Tue Dec 9 21:24:23 2008 From: chaos.proton at gmail.com (Grissiom) Date: Wed, 10 Dec 2008 10:24:23 +0800 Subject: [Numpy-discussion] How to unitize a array in numpy? Message-ID: Hi all, Nice to neet you all. I am a newbie in numpy. Is there any function that could unitize a array? Thanks in advance. -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Dec 9 21:35:21 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 20:35:21 -0600 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: References: Message-ID: <3d375d730812091835v380c36bdvf64ad344de44326a@mail.gmail.com> On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: > Hi all, > > Nice to neet you all. I am a newbie in numpy. Is there any function that > could unitize a array? What do you mean by "unitize"? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue Dec 9 21:36:53 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 20:36:53 -0600 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: References: Message-ID: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: > Hi all, > > Nice to neet you all. I am a newbie in numpy. Is there any function that > could unitize a array? If you mean like the Mathematica function Unitize[] defined here: http://reference.wolfram.com/mathematica/ref/Unitize.html Then .astype(bool) is probably sufficient. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From vagabondaero at gmail.com Tue Dec 9 21:40:15 2008 From: vagabondaero at gmail.com (Vagabond_Aero) Date: Tue, 9 Dec 2008 21:40:15 -0500 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? 
In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: <8b5ec91a0812091840v7054c063pe2b3922300d2fb9a@mail.gmail.com> I have the same problem. I tried the del command below, but foundon that it removes the names of the ndarrays from memory, but does not free up the memory on my XP system (python 2.5.2, numpy 1.2.1). Regular python objects release their memory when I use the del command, but it looks like the ndarray objects do not. On Mon, Dec 8, 2008 at 22:00, Travis Vaught wrote: > Try: > > del(myvariable) > > Travis > > On Dec 8, 2008, at 7:15 PM, frank wang wrote: > > Hi, > > I have a program with some variables consume a lot of memory. The first > time I run it, it is fine. The second time I run it, I will get MemoryError. > If I close the ipython and reopen it again, then I can run the program once. > I am looking for a command to delete the intermediate variable once it is > not used to save memory like in matlab clear command. > > Thanks > > Frank > > ------------------------------ > Send e-mail faster without improving your typing skills. Get your Hotmail(R) > account. > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Dec 9 21:45:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 20:45:00 -0600 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: <8b5ec91a0812091840v7054c063pe2b3922300d2fb9a@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <8b5ec91a0812091840v7054c063pe2b3922300d2fb9a@mail.gmail.com> Message-ID: <3d375d730812091845y2efaeeb6y8e5713d0dbecf117@mail.gmail.com> On Tue, Dec 9, 2008 at 20:40, Vagabond_Aero wrote: > I have the same problem. I tried the del command below, but foundon that it > removes the names of the ndarrays from memory, but does not free up the > memory on my XP system (python 2.5.2, numpy 1.2.1). Regular python objects > release their memory when I use the del command, but it looks like the > ndarray objects do not. It's not guaranteed that the regular Python objects return memory to the OS, either. The memory should be reused when Python allocates new memory, though, so I suspect that this is not the problem that Frank is seeing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From charlesr.harris at gmail.com Tue Dec 9 21:50:02 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Dec 2008 19:50:02 -0700 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up In-Reply-To: References: Message-ID: On Tue, Dec 9, 2008 at 4:50 PM, Peter Norton < spacey-numpy-discussion at lenin.net> wrote: > I've got a few issues that I hope won't be overwhelming on one message: > > (1) Because of some issues in the past in building numpy with > numscons, the numpy.core.umath_tests don't get built with > numpy+numscons (at least not as of svn version 6128). > > $ python -c 'import numpy; print numpy.__version__; import > numpy.core.umath_tests' > 1.3.0.dev6139 > Traceback (most recent call last): > File "", line 1, in > ImportError: No module named umath_tests > > What needs to be done to get this module incorporated into the numscons > build? > It's also commented out of the usual setup.py file also because of blas/lapack linkage problems that need to be fixed; I was working on other things. It's probably time to fix it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaos.proton at gmail.com Tue Dec 9 21:56:01 2008 From: chaos.proton at gmail.com (Grissiom) Date: Wed, 10 Dec 2008 10:56:01 +0800 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> References: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 10:36, Robert Kern wrote: > On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: > > Hi all, > > > > Nice to neet you all. I am a newbie in numpy. Is there any function that > > could unitize a array? > > If you mean like the Mathematica function Unitize[] defined here: > > http://reference.wolfram.com/mathematica/ref/Unitize.html > > Then .astype(bool) is probably sufficient. > > -- > Robert Kern > I'm sorry for my poor English. I mean a function that could return a unit vector which have the same direction with the original one. Thanks. -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 9 22:01:49 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Dec 2008 20:01:49 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> Message-ID: On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern wrote: > On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: > > As much as I know float128 are in fact 80 bits (64 mantissa + 16 > exponent) so the precision is 18-19 digits (not 34) > > float128 should be 128 bits wide. If it's not on your platform, please > let us know as that is a bug in your build. > I think he means the actual precision is the ieee extended precision, the number just happens to be stored into larger chunks of memory for alignment purposes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Tue Dec 9 22:03:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 21:03:00 -0600 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> On Mon, Dec 8, 2008 at 19:15, frank wang wrote: > Hi, > > I have a program with some variables consume a lot of memory. The first time > I run it, it is fine. The second time I run it, I will get MemoryError. If I > close the ipython and reopen it again, then I can run the program once. I am > looking for a command to delete the intermediate variable once it is not > used to save memory like in matlab clear command. How are you running this program? Be aware that IPython may be holding on to objects and preventing them from being deallocated. For example: In [7]: !cat memtest.py class A(object): def __del__(self): print 'Deleting %r' % self a = A() In [8]: %run memtest.py In [9]: %run memtest.py In [10]: %run memtest.py In [11]: del a In [12]: Do you really want to exit ([y]/n)? $ python memtest.py Deleting <__main__.A object at 0x915ab0> You can remove some of these references with %reset and maybe a gc.collect() for good measure. In [1]: %run memtest In [2]: %run memtest In [3]: %run memtest In [4]: %reset Once deleted, variables cannot be recovered. Proceed (y/[n])? y Deleting <__main__.A object at 0xf3e950> Deleting <__main__.A object at 0xf3e6d0> Deleting <__main__.A object at 0xf3e930> -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue Dec 9 22:04:24 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 21:04:24 -0600 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: References: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> Message-ID: <3d375d730812091904i5a79b310h9dd3f835b887a487@mail.gmail.com> On Tue, Dec 9, 2008 at 20:56, Grissiom wrote: > On Wed, Dec 10, 2008 at 10:36, Robert Kern wrote: >> >> On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: >> > Hi all, >> > >> > Nice to neet you all. I am a newbie in numpy. Is there any function that >> > could unitize a array? >> >> If you mean like the Mathematica function Unitize[] defined here: >> >> http://reference.wolfram.com/mathematica/ref/Unitize.html >> >> Then .astype(bool) is probably sufficient. >> >> -- >> Robert Kern > > I'm sorry for my poor English. I mean a function that could return a unit > vector which have the same direction with the original one. Thanks. v / numpy.linalg.norm(v) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From chaos.proton at gmail.com Tue Dec 9 22:08:28 2008 From: chaos.proton at gmail.com (Grissiom) Date: Wed, 10 Dec 2008 11:08:28 +0800 Subject: [Numpy-discussion] How to unitize a array in numpy? 
In-Reply-To: <3d375d730812091904i5a79b310h9dd3f835b887a487@mail.gmail.com> References: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> <3d375d730812091904i5a79b310h9dd3f835b887a487@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:04, Robert Kern wrote: > v / numpy.linalg.norm(v) > Thanks a lot ~;) -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Dec 9 22:10:32 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 21:10:32 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> Message-ID: <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> On Tue, Dec 9, 2008 at 21:01, Charles R Harris wrote: > > > On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern wrote: >> >> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: >> > As much as I know float128 are in fact 80 bits (64 mantissa + 16 >> > exponent) so the precision is 18-19 digits (not 34) >> >> float128 should be 128 bits wide. If it's not on your platform, please >> let us know as that is a bug in your build. > > I think he means the actual precision is the ieee extended precision, the > number just happens to be stored into larger chunks of memory for alignment > purposes. Ah, that's good to know. Yes, float128 on my Intel Mac behaves this way. In [12]: f = finfo(float128) In [13]: f.nmant Out[13]: 63 In [14]: f.nexp Out[14]: 15 -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Tue Dec 9 23:01:56 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Dec 2008 21:01:56 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: On Tue, Dec 9, 2008 at 8:10 PM, Robert Kern wrote: > On Tue, Dec 9, 2008 at 21:01, Charles R Harris > wrote: > > > > > > On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern > wrote: > >> > >> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh > wrote: > >> > As much as I know float128 are in fact 80 bits (64 mantissa + 16 > >> > exponent) so the precision is 18-19 digits (not 34) > >> > >> float128 should be 128 bits wide. If it's not on your platform, please > >> let us know as that is a bug in your build. > > > > I think he means the actual precision is the ieee extended precision, the > > number just happens to be stored into larger chunks of memory for > alignment > > purposes. > > Ah, that's good to know. Yes, float128 on my Intel Mac behaves this way. 
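Returning to the unitize question: v / numpy.linalg.norm(v) covers a single vector, and the sketch below extends the same idea to normalizing every row of a 2-D array; the zero-row guard and the sample data are additions for illustration:

------------------------------------------------
import numpy as np

a = np.array([[3.0, 4.0], [0.0, 2.0], [1.0, 1.0]])

norms = np.sqrt((a * a).sum(axis=1))    # Euclidean length of each row
norms[norms == 0] = 1.0                 # leave all-zero rows untouched instead of dividing by 0
unit_rows = a / norms[:, np.newaxis]    # broadcast the per-row division

print(unit_rows)
print(np.sqrt((unit_rows * unit_rows).sum(axis=1)))   # each row now has length 1
------------------------------------------------

For a single 1-D vector this reduces to the v / numpy.linalg.norm(v) answer given above.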
> > In [12]: f = finfo(float128) > > In [13]: f.nmant > Out[13]: 63 > > In [14]: f.nexp > Out[14]: 15 > Yep. That's the reason I worry a bit about what will happen when ieee quad precision comes out; it really is 128 bits wide and the normal identifiers won't account for the difference. I expect c will just call them long doubles and they will get the 'g' letter code just like extended precision does now. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Wed Dec 10 00:42:07 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 10 Dec 2008 07:42:07 +0200 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> Message-ID: <6a17e9ee0812092142h70dac967hf1fdf53ab9c4cb13@mail.gmail.com> > 2008/12/10 Robert Kern : > On Mon, Dec 8, 2008 at 19:15, frank wang wrote: >> Hi, >> >> I have a program with some variables consume a lot of memory. The first time >> I run it, it is fine. The second time I run it, I will get MemoryError. If I >> close the ipython and reopen it again, then I can run the program once. I am >> looking for a command to delete the intermediate variable once it is not >> used to save memory like in matlab clear command. > > How are you running this program? Be aware that IPython may be holding > on to objects and preventing them from being deallocated. For example: > > In [7]: !cat memtest.py > class A(object): > def __del__(self): > print 'Deleting %r' % self > > > a = A() > > In [8]: %run memtest.py > > In [9]: %run memtest.py > > In [10]: %run memtest.py > > In [11]: del a > > In [12]: > Do you really want to exit ([y]/n)? > > $ python memtest.py > Deleting <__main__.A object at 0x915ab0> > > > You can remove some of these references with %reset and maybe a > gc.collect() for good measure. Of course, if you don't need to have access to the variables created in your program from the IPython session, you can run the program in a separate python process: In [1]: !python memtest.py Deleting <__main__.A object at 0xb7da5ccc> In [2]: !python memtest.py Deleting <__main__.A object at 0xb7e5fccc> Cheers, Scott From david at ar.media.kyoto-u.ac.jp Wed Dec 10 00:59:33 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 10 Dec 2008 14:59:33 +0900 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up In-Reply-To: References: Message-ID: <493F5AC5.5000902@ar.media.kyoto-u.ac.jp> Peter Norton wrote: > I've got a few issues that I hope won't be overwhelming on one message: > > (1) Because of some issues in the past in building numpy with > numscons, the numpy.core.umath_tests don't get built with > numpy+numscons (at least not as of svn version 6128). > > $ python -c 'import numpy; print numpy.__version__; import > numpy.core.umath_tests' > 1.3.0.dev6139 > Traceback (most recent call last): > File "", line 1, in > ImportError: No module named umath_tests > > What needs to be done to get this module incorporated into the numscons build? you should not need this module, it is not built using the normal build of numpy either. Did you do a clean build (rm -rf build and removing the install directory first) ? It was enabled before but is commented out ATM. 
> > (2) I've found that in numscons-0.9.4, the detection of the correct > linker assumes that if gcc is in use, the linker is gnu ld. However, > on solaris this isn't the recommended toolchain, so it's typical to > build gcc with gnu as and the solaris /usr/ccs/bin/ld under the hood. > What this means is that when setting a run_path in the binary (which > we need to do) the linker flags are set to "-Wl,-rpath=". > However, this isn't valid for the solaris ld. It needs -R, or > -Wl,-R. I'm pretty sure that on Solaris trying to link a > library with -Wl,-rpath= and looking for an error should be enough to > determine the correct format for the linker. Scons and hence numscons indeed assume that the linker is the same as the compiler by default. It would be possible to avoid this by detecting the linker at runtime, to bypass scons tools choice, like I do for C, C++ and Fortran compilers. The whole scons tools sub-system is unfortunately very limited ATM, so there is a lot of manual work to do (that's actually what most of the code in numscons/core is for). > > (3) Numscons tries to check for the need for a MAIN__ function when > linking with gfortran. However, any libraries built with numscons come > out with an unsatisfied dependency on MAIN__. The log looks like this > in build/scons/numpy/linalg/config.log looks like this: It may be linked to the sun linker problem above. Actually, the dummy main detection is not used at all for the building - it is necessary to detect name mangling used by the fortran compiler, but that's it. I assumed that a dummy main was never needed for shared libraries, but that assumption may well be ill founded. I never had problems related to this on open solaris, with both native and gcc toolchains, so I am willing to investiage first whether it is linked to the sun linker problem or not. Unfortunately, I won't have the time to work on this in the next few months because of my PhD thesis; the sun linker problem can be fixed by following a strategy similar to compilers, in numscons/core/initialization.py. You first need to add a detection scheme for the linker in compiler_detection.py. David From gael.varoquaux at normalesup.org Wed Dec 10 01:38:01 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 Dec 2008 07:38:01 +0100 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <20081210063801.GB24936@phare.normalesup.org> On Tue, Dec 09, 2008 at 01:34:29AM -0800, Jarrod Millman wrote: > It was decided last year that numpy io should provide simple, generic, > core io functionality. While scipy io would provide more domain- or > application-specific io code (e.g., Matlab IO, WAV IO, etc.) My > vision for scipy io, which I know isn't shared, is to be more or less > aiming to be all inclusive (e.g., all image, sound, and data formats). > (That is a different discussion; just wanted it to be clear where I > stand.) Can we get Matthew Brett's nifti reader in there? Please! Pretty please. That way I can do neuroimaging without compiled code outside of a standard scientific Python instal. 
Ga?l From charlesr.harris at gmail.com Wed Dec 10 02:49:07 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 00:49:07 -0700 Subject: [Numpy-discussion] Some numpy statistics Message-ID: Hi All, I bumped into this while searching for something else: http://www.ohloh.net/p/numpy/analyses/latest Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Dec 10 02:55:43 2008 From: robert.kern at gmail.com (robert.kern at gmail.com) Date: Wed, 10 Dec 2008 01:55:43 -0600 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: References: Message-ID: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> On Wed, Dec 10, 2008 at 01:49, Charles R Harris wrote: > Hi All, > > I bumped into this while searching for something else: > http://www.ohloh.net/p/numpy/analyses/latest -14 lines of Javascript? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From nadavh at visionsense.com Wed Dec 10 02:55:49 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 10 Dec 2008 09:55:49 +0200 Subject: [Numpy-discussion] Importance of order when summing values inanarray References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com><710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il><493E85FF.2060106@gmail.com><789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com><493E92BB.9060403@gmail.com><710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C371@ex3.envision.co.il> float128 are 16 bytes wide but have the structure of x87 80-bits + extra 6 bytes for alignment: >From "http://lwn.net/2001/features/OLS/pdf/pdf/x86-64.pdf": "... The x87 stack with 80-bit precision is only used for long double." And: >>> e47 = float128(1e-47) >>> e30 = float128(1e-30) >>> e50 = float128(1e-50) >>> (e30-e50) == e30 True >>> (e30-e47) == e30 False >>> This shows that float128 has no more then 19 digits precision Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Robert Kern ????: ? 09-?????-08 22:40 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] Importance of order when summing values inanarray On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: > As much as I know float128 are in fact 80 bits (64 mantissa + 16 exponent) so the precision is 18-19 digits (not 34) float128 should be 128 bits wide. If it's not on your platform, please let us know as that is a bug in your build. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 3933 bytes Desc: not available URL: From charlesr.harris at gmail.com Wed Dec 10 03:22:54 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 01:22:54 -0700 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> References: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 12:55 AM, wrote: > On Wed, Dec 10, 2008 at 01:49, Charles R Harris > wrote: > > Hi All, > > > > I bumped into this while searching for something else: > > http://www.ohloh.net/p/numpy/analyses/latest > > -14 lines of Javascript? > Well, they have scipy mostly written in C++ and davidc as a C developer with a 29000 line commit ;) The code analysis isn't quite perfect and I think there are some bugs in computing the statistics. But it's kind of interesting anyway. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Dec 10 03:32:31 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Dec 2008 02:32:31 -0600 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: References: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> Message-ID: <3d375d730812100032w7da20728u89552b71c9e71ea6@mail.gmail.com> On Wed, Dec 10, 2008 at 02:22, Charles R Harris wrote: > > > On Wed, Dec 10, 2008 at 12:55 AM, wrote: >> >> On Wed, Dec 10, 2008 at 01:49, Charles R Harris >> wrote: >> > Hi All, >> > >> > I bumped into this while searching for something else: >> > http://www.ohloh.net/p/numpy/analyses/latest >> >> -14 lines of Javascript? > > Well, they have scipy mostly written in C++ and davidc as a C developer with > a 29000 line commit ;) The code analysis isn't quite perfect and I think > there are some bugs in computing the statistics. But it's kind of > interesting anyway. There are bugs, and then there are bugs. It seems like an invariants "numlines >= 0" should pertain even with dodgy language identification. I simply don't know what operations they would do to get negative numbers. In any case, sloccount tells me that most of scipy *is* C++. The generated sparsetools sources are quite large in addition to all of the Blitz sources. 
SLOC Directory SLOC-by-Language (Sorted) 177304 sparse cpp=134410,ansic=22394,fortran=12780,python=7720 96740 weave cpp=82265,python=14244,ansic=231 39321 special fortran=19749,ansic=16888,python=2684 18074 integrate fortran=15871,python=1156,ansic=1047 14472 interpolate fortran=10564,python=2493,ansic=1210,cpp=205 12471 ndimage python=6242,ansic=6229 11431 optimize fortran=5931,python=2864,ansic=2636 11390 odr fortran=9380,ansic=1192,python=818 9951 stats python=8526,fortran=1425 6801 signal ansic=3934,python=2867 5878 fftpack fortran=3973,python=1462,ansic=443 5756 io python=4987,ansic=769 4672 spatial python=2731,ansic=1941 4608 cluster python=2659,ansic=1949 4227 linalg python=3605,fortran=604,ansic=18 1530 lib python=1182,fortran=324,ansic=24 1471 stsci ansic=976,python=495 1125 maxentropy python=1125 940 misc python=940 494 constants python=494 160 top_dir python=160 3 linsolve python=3 Totals grouped by language (dominant language first): cpp: 216880 (50.58%) fortran: 80601 (18.80%) python: 69457 (16.20%) ansic: 61881 (14.43%) Total Physical Source Lines of Code (SLOC) = 428,819 Development Effort Estimate, Person-Years (Person-Months) = 116.12 (1,393.47) (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05)) Schedule Estimate, Years (Months) = 3.26 (39.15) (Basic COCOMO model, Months = 2.5 * (person-months**0.38)) Estimated Average Number of Developers (Effort/Schedule) = 35.60 Total Estimated Cost to Develop = $ 15,686,619 (average salary = $56,286/year, overhead = 2.40). SLOCCount, Copyright (C) 2001-2004 David A. Wheeler SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL. SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to redistribute it under certain conditions as specified by the GNU GPL license; see the documentation for details. Please credit this data as "generated using David A. Wheeler's 'SLOCCount'." -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Wed Dec 10 04:03:25 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 10 Dec 2008 18:03:25 +0900 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: References: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> Message-ID: <493F85DD.7040303@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Wed, Dec 10, 2008 at 12:55 AM, > wrote: > > On Wed, Dec 10, 2008 at 01:49, Charles R Harris > > wrote: > > Hi All, > > > > I bumped into this while searching for something else: > > http://www.ohloh.net/p/numpy/analyses/latest > > -14 lines of Javascript? > > > Well, they have scipy mostly written in C++ and davidc as a C > developer with a 29000 line commit ;) C++ in scipy mostly is generated code (sparsetools) + blitz. There is also the problem of code reformating: for example, ohloh seems to believe I am an advanced Fortran developer from scipy, whereas I barely know how to code an hello world; I guess this is because of my removal of arpack while the license issue was discussed and solved. IIRC, I did use the svn method to put back the code, so in theory, it should be possible to realize I did not code any of the above. Also, svn is pretty dumb about renaming (it is just an atomic copy + rm), so if you remove a file, I would not be surprised if you become the author of the whole file for svn in that case. 
I mean, I am far from being the main author of scipy for any meaningful measure of contribution. cheers, David
From andrea.gavana at gmail.com Wed Dec 10 07:19:51 2008 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Wed, 10 Dec 2008 12:19:51 +0000 Subject: [Numpy-discussion] Little vectorization help... Message-ID: Hi All, I am trying to "vectorize" 3 nested for loops but I am not having much success. Here is the code I use: import numpy import numpy.ma as masked grid = numpy.zeros((nx, ny), dtype=numpy.float32) xOut = numpy.zeros((nx, ny), dtype=numpy.float32) yOut = numpy.zeros((nx, ny), dtype=numpy.float32) z = GetCentroids() # Some vector z values prop = GetValue() # Some other vector values NaN = numpy.NaN for I in xrange(1, nx+1): for J in xrange(1, ny+1): theSum = [] for K in xrange(1, nz+1): cellPos = I-1 + nx*(J-1) + nx*ny*(K-1) centroid = z[cellPos] if low <= centroid <= high and actnum[cellPos] > 0: theSum.append(prop[cellPos]) if theSum: grid[I-1, J-1] = sum(theSum)/len(theSum) else: grid[I-1, J-1] = NaN xOut[I-1, J-1], yOut[I-1, J-1] = x[cellPos], y[cellPos] grid = masked.masked_where(numpy.isnan(grid), grid) Some explanation: 1) "z" is a vector of nx*ny*nz components, where nx = 100, ny = 73, nz = 23, which represents 3D hexahedron cell centroids; 2) "prop" is a vector like z, with the same shape, with some floating point values in it; 3) "actnum" is a vector of integers (0 or 1) with the same shape as z, and indicates if a cell should be considered in the loop or not; 4) low and high are 2 floating point values with low < high: if the cell centroid falls between low and high and the cell is active (as stated in "actnum"), then I take the value of "prop" in that cell and I append it to the "theSum" list; 5) At the end of the K loop, I just take an arithmetic mean of the values in "theSum" list. I think I may be able to figure out how to vectorize the part regarding the "grid" variable, but I have no idea on what to do for the xOut and yOut variables, and I need them because I use them in a later call to matplotlib.contourf. If you could drop some hint on how to proceed, it would be very appreciated. Thank you for your suggestions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/
From Robert.Conde at sungard.com Wed Dec 10 07:15:37 2008 From: Robert.Conde at sungard.com (Robert.Conde at sungard.com) Date: Wed, 10 Dec 2008 07:15:37 -0500 Subject: [Numpy-discussion] Superclassing numpy.matrix: got an unexpected keyword argument 'dtype' Message-ID: <897C0AFD34144B4F9DB484F132C698AD6F9D7A@VOO-EXCHANGE05.internal.sungard.corp> Hello, I'm using numpy-1.1.1 for Python 2.3. I'm trying to create a class that acts just like the numpy.matrix class with my own added methods and attributes. I want to pass my class a list of custom "instrument" objects and do some math based on these objects to set the matrix. To this end I've done the following: from numpy import matrix class rcMatrix(matrix): def __init__(self,instruments): """Do some calculations and set the values of the matrix.""" self[0,0] = 100 # Just an example self[0,1] = 100 # The real init method self[1,0] = 200 # Does some math based on the input objects self[1,1] = 300 # def __new__(cls,instruments): """When creating a new instance begin by creating an NxN matrix of zeroes.""" len_ = len(instruments) return matrix.__new__(cls,[[0.0]*len_]*len_) It works great and I can, for example, multiply two of my custom matrices seamlessly. I can also get the transpose. However, when I try to get the inverse I get an error: > rcm = rcMatrix(['instrument1','instrument2']) > print rcm [[ 100. 100.] [ 200. 300.]] > print rcm.T [[ 100. 200.] [ 100. 300.]] > print [5,10] * rcm [[ 2500. 3500.]] > print rcm.I Traceback (most recent call last): File "[Standard]/deleteme", line 29, in ?
File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line 492, in getI return asmatrix(func(self)) File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line 52, in asmatrix return matrix(data, dtype=dtype, copy=False) TypeError: __init__() got an unexpected keyword argument 'dtype' I've had to overwrite the getI function in order for things to work out: def getI(self): return matrix(self.tolist()).I I = property(getI, None, doc="inverse") Is this the correct way to achieve my goals? Please let me know if anything is unclear. Thanks, Robert Conde From lists_ravi at lavabit.com Wed Dec 10 11:28:29 2008 From: lists_ravi at lavabit.com (Ravi) Date: Wed, 10 Dec 2008 11:28:29 -0500 Subject: [Numpy-discussion] access ndarray in C++ In-Reply-To: <480F9287.7050405@noaa.gov> References: <200804231530.25141.lists@informa.tiker.net> <480F9287.7050405@noaa.gov> Message-ID: <200812101128.29986.lists_ravi@lavabit.com> On Wednesday 23 April 2008 15:48:23 Christopher Barker wrote: > > - Boost Python [1]. Especially if you want usable C++ integration. (ie. > > more than basic templates, etc.) > > What's the status of the Boost array object? maintained? updated for > recent numpy? The boost.python array object is still maintained. However, it has a few problems: 1. All array operations go through python which makes it too slow for my purposes. Phil Austin posted an alternate class on this list which works well since it uses the numpy C API: http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html 2. Only numeric & numarray are supported out of the box, but it is simple to support numpy; just add the following after calling import_array in your extension module: boost::python::numeric::array::set_module_and_type( "numpy", "ndarray" ); 3. If you want the C++-way of dealing with numpy matrices & vectors directly as objects look at either of the following: http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html http://mathema.tician.de/software/pyublas Of course, I am biased towards the first approach. Regards, Ravi From pgmdevlist at gmail.com Wed Dec 10 11:32:50 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 10 Dec 2008 11:32:50 -0500 Subject: [Numpy-discussion] Superclassing numpy.matrix: got an unexpected keyword argument 'dtype' In-Reply-To: <897C0AFD34144B4F9DB484F132C698AD6F9D7A@VOO-EXCHANGE05.internal.sungard.corp> References: <897C0AFD34144B4F9DB484F132C698AD6F9D7A@VOO-EXCHANGE05.internal.sungard.corp> Message-ID: <95651C03-351B-4192-9ED3-7B3C2D1C0FE3@gmail.com> Robert, Transforming your matrix to a list before computation isn't very efficient. If you do need some extra parameters in your __init__ to be compatible with other functions such as asmatrix, well, just add them, or use a coverall **kwargs def __init__(self, instruments, **kwargs) No guarantee it'll work all the time. Otherwise, please have a look at: http://docs.scipy.org/doc/numpy/user/basics.subclassing.html and the other link at the top of that page. In your case, I'd try to put the initialization in the __array_finalize__. On Dec 10, 2008, at 7:15 AM, wrote: > Hello, > > I'm using numpy-1.1.1 for Python 2.3. I'm trying to create a class > that acts just like the numpy.matrix class with my own added methods > and attributes. I want to pass my class a list of custom > "instrument" objects and do some math based on these objects to set > the matrix. 
To this end I've done the following: > > from numpy import matrix > > class rcMatrix(matrix): > def __init__(self,instruments): > """Do some calculations and set the values of the matrix.""" > self[0,0] = 100 # Just an example > self[0,1] = 100 # The real init method > self[1,0] = 200 # Does some math based on the input objects > self[1,1] = 300 # > def __new__(cls,instruments): > """When creating a new instance begin by creating an NxN > matrix of > zeroes.""" > len_ = len(instruments) > return matrix.__new__(cls,[[0.0]*len_]*len_) > > It works great and I can, for example, multiply two of my custom > matrices seamlessly. I can also get the transpose. However, when I > try to get the inverse I get an error: > >> rcm = rcMatrix(['instrument1','instrument2']) >> print rcm > [[ 100. 100.] > [ 200. 300.]] >> print rcm.T > [[ 100. 200.] > [ 100. 300.]] >> print [5,10] * rcm > [[ 2500. 3500.]] >> print rcm.I > Traceback (most recent call last): > File "[Standard]/deleteme", line 29, in ? > File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line > 492, in getI > return asmatrix(func(self)) > File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line > 52, in asmatrix > return matrix(data, dtype=dtype, copy=False) > TypeError: __init__() got an unexpected keyword argument 'dtype' > > > > I've had to overwrite the getI function in order for things to work > out: > > def getI(self): return matrix(self.tolist()).I > I = property(getI, None, doc="inverse") > > Is this the correct way to achieve my goals? > > Please let me know if anything is unclear. > > Thanks, > > Robert Conde > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From lists_ravi at lavabit.com Wed Dec 10 11:35:22 2008 From: lists_ravi at lavabit.com (Ravi) Date: Wed, 10 Dec 2008 11:35:22 -0500 Subject: [Numpy-discussion] access ndarray in C++ In-Reply-To: <200812101128.29986.lists_ravi@lavabit.com> References: <480F9287.7050405@noaa.gov> <200812101128.29986.lists_ravi@lavabit.com> Message-ID: <200812101135.22541.lists_ravi@lavabit.com> Oops, please ignore my previous message. I just started using a new mail client which marked some of my old messages (which I had tagged interesting) the same as new messages and I just blindly replied to them without checking the date. Sorry about the spam. Ravi From rw247 at astro.columbia.edu Wed Dec 10 11:38:11 2008 From: rw247 at astro.columbia.edu (Ross Williamson) Date: Wed, 10 Dec 2008 11:38:11 -0500 Subject: [Numpy-discussion] Find index of repeated numbers in array Message-ID: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Hi Everyone I think I'm missing something really obvious but what I would like to do is extract the indexes from an array where a number matches - For example data = [0,1,2,960,5,6,960,7] I would like to know, for example the indices which match 960 - i.e. it would return 3 and 6 I could do this with a loop but I was wondering if there was a built in numpy function to do this? 
BTW if anyone is interested I'm converting some idl code to numpy and trying to mmic the IDL function where Cheers Ross From cimrman3 at ntc.zcu.cz Wed Dec 10 11:43:44 2008 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 10 Dec 2008 17:43:44 +0100 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Message-ID: <493FF1C0.4080704@ntc.zcu.cz> Ross Williamson wrote: > Hi Everyone > > I think I'm missing something really obvious but what I would like to > do is extract the indexes from an array where a number matches - For > example > > data = [0,1,2,960,5,6,960,7] > > I would like to know, for example the indices which match 960 - i.e. > it would return 3 and 6 import numpy as np In[14]: np.where( np.array( data ) == 960 ) Out[14]: (array([3, 6]),) If you need to count all of the items, try something like np.histogram( data, np.max( data ) ) cheers, r. From nwagner at iam.uni-stuttgart.de Wed Dec 10 11:47:13 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 10 Dec 2008 17:47:13 +0100 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Message-ID: On Wed, 10 Dec 2008 11:38:11 -0500 Ross Williamson wrote: > Hi Everyone > > I think I'm missing something really obvious but what I >would like to > do is extract the indexes from an array where a number >matches - For > example > > data = [0,1,2,960,5,6,960,7] > > I would like to know, for example the indices which >match 960 - i.e. > it would return 3 and 6 > > I could do this with a loop but I was wondering if there >was a built > in numpy function to do this? > > BTW if anyone is interested I'm converting some idl code >to numpy and > trying to mmic the IDL function where > > Cheers > > Ross > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > >>> data array([ 0, 1, 2, 960, 5, 6, 960, 7]) >>> where(data==960) (array([3, 6]),) Nils From Chris.Barker at noaa.gov Wed Dec 10 12:07:11 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 10 Dec 2008 09:07:11 -0800 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <60C1C121-2B2E-4264-8BB8-D65EBA91120B@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <493EB202.4030308@noaa.gov> <60C1C121-2B2E-4264-8BB8-D65EBA91120B@gmail.com> Message-ID: <493FF73F.4040300@noaa.gov> Pierre GM wrote: >>> in the same place in NumPy; and all the SciPy IO code to be in the >>> same place in SciPy. >> +1 > > So, no problem w/ importing numpy.ma and numpy.records in numpy.lib.io ? As long as numpy.ma and numpy.records are, and will remain, part of the standard numpy distribution, this is fine. This is a key issue -- what is "core" numpy and what is not, but I know I'd like to see a lot of things built on ma and records, both, so I think they do belong in core. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sturla at molden.no Wed Dec 10 11:47:42 2008 From: sturla at molden.no (Sturla Molden) Date: Wed, 10 Dec 2008 17:47:42 +0100 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Message-ID: <493FF2AE.9000107@molden.no> On 12/10/2008 5:38 PM, Ross Williamson wrote: > Hi Everyone > > I think I'm missing something really obvious but what I would like to > do is extract the indexes from an array where a number matches - For > example > > data = [0,1,2,960,5,6,960,7] > > I would like to know, for example the indices which match 960 - i.e. > it would return 3 and 6 >>> import numpy >>> a = numpy.array([0,1,2,960,5,6,960,7]) >>> a == 960 array([False, False, False, True, False, False, True, False], dtype=bool) >>> idx, = numpy.where(a == 960) >>> idx array([3, 6]) >>> idx.tolist() [3, 6] Sturla Molden From dpeterson at enthought.com Wed Dec 10 12:28:45 2008 From: dpeterson at enthought.com (Dave Peterson) Date: Wed, 10 Dec 2008 11:28:45 -0600 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! Message-ID: <493FFC4D.60500@enthought.com> I'm pleased to announce that the Enthought Tool Suite (ETS) 3.1.0 has been tagged, released, and uploaded to PyPi[1]! Both source distributions (.tar.gz) and binary (.egg) for Windows have been built and uploaded to PyPi. You can update an existing ETS install to v3.1.0 like so: easy_install -U ETS==3.1.0 What is ETS? ------------------ The Enthought Tool Suite (ETS) is a collection of projects developed by members of the OSS community, including Enthought employees, which we use every day to construct custom scientific applications. It includes a wide variety of components, including: * an extensible application framework * application building blocks * 2-D and 3-D graphics libraries * scientific and math libraries * developer tools The cornerstone on which these tools rest is the Traits project, which provides explicit type declarations in Python; its features include initialization, validation, delegation, notification, and visualization of typed attributes. More information is available for all these packages from the Enthought Tool Suite development home page: http://code.enthought.com/projects/index.php -- Dave From f.yw at hotmail.com Wed Dec 10 12:56:39 2008 From: f.yw at hotmail.com (frank wang) Date: Wed, 10 Dec 2008 10:56:39 -0700 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> Message-ID: I am running in ipython. Now I do not have the problem anymore. %reset commands is a good solution. Thanks Frank> Date: Tue, 9 Dec 2008 21:03:00 -0600> From: robert.kern at gmail.com> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] how do I delete unused matrix to save the memory?> > On Mon, Dec 8, 2008 at 19:15, frank wang wrote:> > Hi,> >> > I have a program with some variables consume a lot of memory. The first time> > I run it, it is fine. The second time I run it, I will get MemoryError. 
If I> > close the ipython and reopen it again, then I can run the program once. I am> > looking for a command to delete the intermediate variable once it is not> > used to save memory like in matlab clear command.> > How are you running this program? Be aware that IPython may be holding> on to objects and preventing them from being deallocated. For example:> > In [7]: !cat memtest.py> class A(object):> def __del__(self):> print 'Deleting %r' % self> > > a = A()> > In [8]: %run memtest.py> > In [9]: %run memtest.py> > In [10]: %run memtest.py> > In [11]: del a> > In [12]:> Do you really want to exit ([y]/n)?> > $ python memtest.py> Deleting <__main__.A object at 0x915ab0>> > > You can remove some of these references with %reset and maybe a> gc.collect() for good measure.> > > In [1]: %run memtest> > In [2]: %run memtest> > In [3]: %run memtest> > In [4]: %reset> Once deleted, variables cannot be recovered. Proceed (y/[n])? y> Deleting <__main__.A object at 0xf3e950>> Deleting <__main__.A object at 0xf3e6d0>> Deleting <__main__.A object at 0xf3e930>> > -- > Robert Kern> > "I have come to believe that the whole world is an enigma, a harmless> enigma that is made terrible by our own mad attempt to interpret it as> though it had an underlying truth."> -- Umberto Eco> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From f.yw at hotmail.com Wed Dec 10 13:00:19 2008 From: f.yw at hotmail.com (frank wang) Date: Wed, 10 Dec 2008 11:00:19 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: On my two systems with Intel Core2 DUO, finfo(float128) gives me the nameerro, "NameError: name 'float128' is not defined". Why? Thanks Frank> Date: Tue, 9 Dec 2008 21:10:32 -0600> From: robert.kern at gmail.com> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] Importance of order when summing values in anarray> > On Tue, Dec 9, 2008 at 21:01, Charles R Harris> wrote:> >> >> > On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern wrote:> >>> >> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote:> >> > As much as I know float128 are in fact 80 bits (64 mantissa + 16> >> > exponent) so the precision is 18-19 digits (not 34)> >>> >> float128 should be 128 bits wide. If it's not on your platform, please> >> let us know as that is a bug in your build.> >> > I think he means the actual precision is the ieee extended precision, the> > number just happens to be stored into larger chunks of memory for alignment> > purposes.> > Ah, that's good to know. 
Yes, float128 on my Intel Mac behaves this way.> > In [12]: f = finfo(float128)> > In [13]: f.nmant> Out[13]: 63> > In [14]: f.nexp> Out[14]: 15> > -- > Robert Kern> > "I have come to believe that the whole world is an enigma, a harmless> enigma that is made terrible by our own mad attempt to interpret it as> though it had an underlying truth."> -- Umberto Eco> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ You live life online. So we put Windows on the web. http://clk.atdmt.com/MRT/go/127032869/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Dec 10 13:07:58 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 11:07:58 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: > On my two systems with Intel Core2 DUO, finfo(float128) gives me the > nameerro, "NameError: name 'float128' is not defined". Why? > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 bit systems it fits in three 32 bit words and shows up as float96. On 64 bit systems it fits in two 64 bit words and shows up as float128. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rw247 at astro.columbia.edu Wed Dec 10 13:16:29 2008 From: rw247 at astro.columbia.edu (Ross Williamson) Date: Wed, 10 Dec 2008 13:16:29 -0500 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <493FF2AE.9000107@molden.no> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> <493FF2AE.9000107@molden.no> Message-ID: Thanks all I was being dumb and forgot to initialize as array() Cheers Ross On Dec 10, 2008, at 11:47 AM, Sturla Molden wrote: > On 12/10/2008 5:38 PM, Ross Williamson wrote: >> Hi Everyone >> >> I think I'm missing something really obvious but what I would like to >> do is extract the indexes from an array where a number matches - For >> example >> >> data = [0,1,2,960,5,6,960,7] >> >> I would like to know, for example the indices which match 960 - i.e. 
>> it would return 3 and 6 > >>>> import numpy >>>> a = numpy.array([0,1,2,960,5,6,960,7]) >>>> a == 960 > array([False, False, False, True, False, False, True, False], > dtype=bool) >>>> idx, = numpy.where(a == 960) >>>> idx > array([3, 6]) >>>> idx.tolist() > [3, 6] > > > Sturla Molden > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Wed Dec 10 13:58:05 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Dec 2008 12:58:05 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> On Wed, Dec 10, 2008 at 12:07, Charles R Harris wrote: > > > On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: >> >> On my two systems with Intel Core2 DUO, finfo(float128) gives me the >> nameerro, "NameError: name 'float128' is not defined". Why? >> > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 bit > systems it fits in three 32 bit words and shows up as float96. On 64 bit > systems it fits in two 64 bit words and shows up as float128. I'm running a 32-bit OS (well, a 32-bit build of Python on OS X) on an Intel Core2 Duo, and I get a float128. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From michael.s.gilbert at gmail.com Wed Dec 10 14:03:39 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Wed, 10 Dec 2008 14:03:39 -0500 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution Message-ID: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Hello, I have been reading that there may be potential issues with the Box-Muller transform, which is used by the numpy.random.normal() function. Supposedly, since f*x1 and f*x2 are not independent variables, then the individual elements (corresponding to f*x1 and f*x2 ) of the distribution also won't be independent. For example, see "Stochastic Simulation" by Ripley, pages 54-59, where the random values end up distributed on a spiral. Note that they mention that they only looked at "congruential generators." Is the random number generator used by numpy congruential? I have tried to generate plots that demonstrate this problem, but have come up short. For example: import numpy , pylab nsamples = 10**6 n = numpy.random.normal( 0.0 , 1.0 , nsamples ) pylab.scatter( n[0:-1:2] , n[1:-1:2] , 0.1 ) pylab.show() I can zoom in and out, and the scatter still looks random (white noise -- almost like tv static). Does this prove that there is no problem? And if so, why does numpy do a better job than as demonstrated by Ripley? 
Regards, Mike Gilbert From matthieu.brucher at gmail.com Wed Dec 10 14:13:52 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 10 Dec 2008 20:13:52 +0100 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution In-Reply-To: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: I think the use of a correct uniform generator will allow a good normal distribution. Congruental generators are very basic generators, everyone knows they should not be used. I think Numpy uses a Mersenne Twisted generator, for which you can generate "independant" vectors with several hundred values. Matthieu 2008/12/10 Michael Gilbert : > Hello, > > I have been reading that there may be potential issues with the > Box-Muller transform, which is used by the numpy.random.normal() > function. Supposedly, since f*x1 and f*x2 are not independent variables, then > the individual elements (corresponding to f*x1 and f*x2 ) of the > distribution also won't be independent. For example, see "Stochastic > Simulation" by Ripley, pages 54-59, where the random values end up > distributed on a spiral. Note that they mention that they only looked > at "congruential generators." Is the random number generator used > by numpy congruential? > > I have tried to generate plots that demonstrate this problem, but have > come up short. For example: > > import numpy , pylab > nsamples = 10**6 > n = numpy.random.normal( 0.0 , 1.0 , nsamples ) > pylab.scatter( n[0:-1:2] , n[1:-1:2] , 0.1 ) > pylab.show() > > I can zoom in and out, and the scatter still looks random (white > noise -- almost like tv static). Does this prove that there is no > problem? And if so, why does numpy do a better job than as > demonstrated by Ripley? > > Regards, > Mike Gilbert > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From pav at iki.fi Wed Dec 10 14:23:00 2008 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 10 Dec 2008 19:23:00 +0000 (UTC) Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: Wed, 10 Dec 2008 14:03:39 -0500, Michael Gilbert wrote: > I have been reading that there may be potential issues with the > Box-Muller transform, which is used by the numpy.random.normal() > function. Supposedly, since f*x1 and f*x2 are not independent > variables, then the individual elements (corresponding to f*x1 and f*x2 > ) of the distribution also won't be independent. For example, see > "Stochastic Simulation" by Ripley, pages 54-59, where the random values > end up distributed on a spiral. Note that they mention that they only > looked at "congruential generators." Is the random number generator > used by numpy congruential? I'm not an expert, but the generator used by Numpy is the Mersenne twister, which should be quite good for many uses. I'd guess what you mention is a way to illustrate that the output of linear congruental generators has serial correlations. At least according to wikipedia, these are negligible in Mersenne twister's output. 
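
A rough way to put a number on that, sketched here with an illustrative box_muller() helper driven by numpy's uniform stream (this is only a toy for comparison, not numpy's internal routine), is to look at the correlation between the two members of each generated pair and between consecutive normal() draws. A near-zero value is only a crude screen, since uncorrelated is weaker than independent, but it complements the scatter plot above with a number.

import numpy

def box_muller(npairs):
    # classic Box-Muller transform on top of numpy's uniform (Mersenne Twister) stream
    u1 = 1.0 - numpy.random.uniform(size=npairs)   # values in (0, 1], keeps log() finite
    u2 = numpy.random.uniform(size=npairs)
    r = numpy.sqrt(-2.0 * numpy.log(u1))
    return r * numpy.cos(2.0 * numpy.pi * u2), r * numpy.sin(2.0 * numpy.pi * u2)

z1, z2 = box_muller(10**6)
print numpy.corrcoef(z1, z2)[0, 1]            # correlation within each Box-Muller pair

n = numpy.random.normal(0.0, 1.0, 2 * 10**6)
print numpy.corrcoef(n[0::2], n[1::2])[0, 1]  # same check on numpy's own normal() output

Both numbers should sit at the 1/sqrt(nsamples) noise level, around 1e-3 for a million pairs.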
-- Pauli Virtanen From charlesr.harris at gmail.com Wed Dec 10 14:33:33 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 12:33:33 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:58 AM, Robert Kern wrote: > On Wed, Dec 10, 2008 at 12:07, Charles R Harris > wrote: > > > > > > On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: > >> > >> On my two systems with Intel Core2 DUO, finfo(float128) gives me the > >> nameerro, "NameError: name 'float128' is not defined". Why? > >> > > > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 > bit > > systems it fits in three 32 bit words and shows up as float96. On 64 bit > > systems it fits in two 64 bit words and shows up as float128. > > I'm running a 32-bit OS (well, a 32-bit build of Python on OS X) on an > Intel Core2 Duo, and I get a float128. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Dec 10 14:38:59 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 12:38:59 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:58 AM, Robert Kern wrote: > On Wed, Dec 10, 2008 at 12:07, Charles R Harris > wrote: > > > > > > On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: > >> > >> On my two systems with Intel Core2 DUO, finfo(float128) gives me the > >> nameerro, "NameError: name 'float128' is not defined". Why? > >> > > > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 > bit > > systems it fits in three 32 bit words and shows up as float96. On 64 bit > > systems it fits in two 64 bit words and shows up as float128. > > I'm running a 32-bit OS (well, a 32-bit build of Python on OS X) on an > Intel Core2 Duo, and I get a float128. > Curious. It probably has something to do with the way the FPU is set up when running on a 64 bit system that is independent of how python is compiled. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spacey-numpy-discussion at lenin.net Wed Dec 10 14:48:48 2008 From: spacey-numpy-discussion at lenin.net (Peter Norton) Date: Wed, 10 Dec 2008 14:48:48 -0500 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up In-Reply-To: <493F5AC5.5000902@ar.media.kyoto-u.ac.jp> References: <493F5AC5.5000902@ar.media.kyoto-u.ac.jp> Message-ID: On Wed, Dec 10, 2008 at 12:59 AM, David Cournapeau wrote: > Peter Norton wrote: >> I've got a few issues that I hope won't be overwhelming on one message: >> >> (1) Because of some issues in the past in building numpy with >> numscons, the numpy.core.umath_tests don't get built with >> numpy+numscons (at least not as of svn version 6128). >> >> $ python -c 'import numpy; print numpy.__version__; import >> numpy.core.umath_tests' >> 1.3.0.dev6139 >> Traceback (most recent call last): >> File "", line 1, in >> ImportError: No module named umath_tests >> >> What needs to be done to get this module incorporated into the numscons build? > > you should not need this module, it is not built using the normal build > of numpy either. Did you do a clean build (rm -rf build and removing the > install directory first) ? It was enabled before but is commented out ATM. Our users would like to have this module for testing purposes I believe. It should be enabled. - Hide quoted text - >> (2) I've found that in numscons-0.9.4, the detection of the correct >> linker assumes that if gcc is in use, the linker is gnu ld. However, >> on solaris this isn't the recommended toolchain, so it's typical to >> build gcc with gnu as and the solaris /usr/ccs/bin/ld under the hood. >> What this means is that when setting a run_path in the binary (which >> we need to do) the linker flags are set to "-Wl,-rpath=". >> However, this isn't valid for the solaris ld. It needs -R, or >> -Wl,-R. I'm pretty sure that on Solaris trying to link a >> library with -Wl,-rpath= and looking for an error should be enough to >> determine the correct format for the linker. > > Scons and hence numscons indeed assume that the linker is the same as > the compiler by default. It would be possible to avoid this by detecting > the linker at runtime, to bypass scons tools choice, like I do for C, > C++ and Fortran compilers. The whole scons tools sub-system is > unfortunately very limited ATM, so there is a lot of manual work to do > (that's actually what most of the code in numscons/core is for). > >> (3) Numscons tries to check for the need for a MAIN__ function when >> linking with gfortran. However, any libraries built with numscons come >> out with an unsatisfied dependency on MAIN__. The log looks like this >> in build/scons/numpy/linalg/config.log looks like this: > > It may be linked to the sun linker problem above. Actually, the dummy > main detection is not used at all for the building - it is necessary to > detect name mangling used by the fortran compiler, but that's it. I > assumed that a dummy main was never needed for shared libraries, but > that assumption may well be ill founded. > > I never had problems related to this on open solaris, with both native > and gcc toolchains, so I am willing to investiage first whether it is > linked to the sun linker problem or not. > > Unfortunately, I won't have the time to work on this in the next few > months because of my PhD thesis; the sun linker problem can be fixed by > following a strategy similar to compilers, in > numscons/core/initialization.py. 
You first need to add a detection > scheme for the linker in compiler_detection.py. Thanks, I'll look into this. It is true that working with opensolaris is a lot easier. Sun should have done it years ago. Thanks again, -Peter From charlesr.harris at gmail.com Wed Dec 10 15:30:44 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 13:30:44 -0700 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution In-Reply-To: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 12:03 PM, Michael Gilbert < michael.s.gilbert at gmail.com> wrote: > Hello, > > I have been reading that there may be potential issues with the > Box-Muller transform, which is used by the numpy.random.normal() > function. Supposedly, since f*x1 and f*x2 are not independent variables, > then > the individual elements (corresponding to f*x1 and f*x2 ) of the > distribution also won't be independent. For example, see "Stochastic > Simulation" by Ripley, pages 54-59, where the random values end up > distributed on a spiral. Note that they mention that they only looked > at "congruential generators." Is the random number generator used > by numpy congruential? > > I have tried to generate plots that demonstrate this problem, but have > come up short. For example: > > import numpy , pylab > nsamples = 10**6 > n = numpy.random.normal( 0.0 , 1.0 , nsamples ) > pylab.scatter( n[0:-1:2] , n[1:-1:2] , 0.1 ) > pylab.show() > > I can zoom in and out, and the scatter still looks random (white > noise -- almost like tv static). Does this prove that there is no > problem? And if so, why does numpy do a better job than as > demonstrated by Ripley? > Bruce Carneal did some tests of robustness and speed for various normal generators. I don't know what his final tests showed for Box-Muller. IIRC, it had some failures but nothing spectacular. The tests were pretty stringent and based on using the erf to turn the normal distribution into a uniform distribution and using the crush tests on the latter.. You could send him a note and ask: bcarneal at gmail.com. 
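
That erf reduction is compact in numpy terms. A minimal sketch, assuming scipy.special.erf is available (any vectorized erf would do), maps each draw through the standard normal CDF and sanity-checks that the output looks uniform before it is handed to a test battery:

import numpy
from scipy.special import erf

n = numpy.random.normal(0.0, 1.0, 10**6)

# Phi(x) = 0.5*(1 + erf(x/sqrt(2))) is the N(0,1) CDF, so u should be uniform on (0, 1)
u = 0.5 * (1.0 + erf(n / numpy.sqrt(2.0)))

print u.min(), u.max()     # should stay strictly inside (0, 1)
print u.mean(), u.var()    # roughly 0.5 and 1/12 (about 0.0833)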
Here are the timings he got: In what follows the uniform variate generators are: lcg64 mwc8222 mt19937 mt19937_64 yarn5 And the normal distribution codes are: trng - default normal distribution code in TRNG boxm - Box-Muller, mtrand lookalike, remembers/uses 2nd value zig7 - a 'Harris' ziggurat indexed by 7 bits zig8 - a 'Harris' ziggurat indexed by 8 bits zig9 - a 'Harris' ziggurat indexed by 9 bits Here are the numbers in more detail: # Timings from icc -O2 running on 2.4GhZ Core-2 lcg64 trng: 6.52459e+06 ops per second lcg64 boxm: 2.18453e+07 ops per second lcg64 zig7: 1.80616e+08 ops per second lcg64 zig8: 2.01865e+08 ops per second lcg64 zig9: 2.05156e+08 ops per second mwc8222 trng: 6.52459e+06 ops per second mwc8222 boxm: 2.08787e+07 ops per second mwc8222 zig7: 9.44663e+07 ops per second mwc8222 zig8: 1.05326e+08 ops per second mwc8222 zig9: 1.03478e+08 ops per second mt19937 trng: 6.41112e+06 ops per second mt19937 boxm: 1.64986e+07 ops per second mt19937 zig7: 4.23762e+07 ops per second mt19937 zig8: 4.52623e+07 ops per second mt19937 zig9: 4.52623e+07 ops per second mt19937_64 trng: 6.42509e+06 ops per second mt19937_64 boxm: 1.93226e+07 ops per second mt19937_64 zig7: 5.8762e+07 ops per second mt19937_64 zig8: 6.17213e+07 ops per second mt19937_64 zig9: 6.29146e+07 ops per second yarn5 trng: 5.95781e+06 ops per second yarn5 boxm: 1.19156e+07 ops per second yarn5 zig7: 1.48945e+07 ops per second yarn5 zig8: 1.54809e+07 ops per second yarn5 zig9: 1.53201e+07 ops per second # Timings from g++ -O2 running on a 2.4GhZ Core-2 lcg64 trng: 6.72163e+06 ops per second lcg64 boxm: 1.50465e+07 ops per second lcg64 zig7: 1.31072e+08 ops per second lcg64 zig8: 1.48383e+08 ops per second lcg64 zig9: 1.6036e+08 ops per second mwc8222 trng: 6.64215e+06 ops per second mwc8222 boxm: 1.44299e+07 ops per second mwc8222 zig7: 8.903e+07 ops per second mwc8222 zig8: 1.00825e+08 ops per second mwc8222 zig9: 1.03478e+08 ops per second mt19937 trng: 6.52459e+06 ops per second mt19937 boxm: 1.28223e+07 ops per second mt19937 zig7: 5.00116e+07 ops per second mt19937 zig8: 5.41123e+07 ops per second mt19937 zig9: 5.47083e+07 ops per second mt19937_64 trng: 6.58285e+06 ops per second mt19937_64 boxm: 1.42988e+07 ops per second mt19937_64 zig7: 6.72164e+07 ops per second mt19937_64 zig8: 7.39591e+07 ops per second mt19937_64 zig9: 7.46022e+07 ops per second yarn5 trng: 6.25144e+06 ops per second yarn5 boxm: 8.93672e+06 ops per second yarn5 zig7: 1.50465e+07 ops per second yarn5 zig8: 1.57496e+07 ops per second yarn5 zig9: 1.56038e+07 ops per second Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From elfnor at gmail.com Wed Dec 10 15:54:13 2008 From: elfnor at gmail.com (Elfnor) Date: Wed, 10 Dec 2008 12:54:13 -0800 (PST) Subject: [Numpy-discussion] rollaxis and reshape Message-ID: <20943690.post@talk.nabble.com> Hi I'm trying to split an array into two pieces and have the two pieces in a new dimension. Here it is in code, because that's hard to explain in words. >>>data.shape (4, 50, 3) >>>new_data = numpy.zeros((2, 4, 25, 3)) >>>new_data[0,...] = data[:,:25,:] >>>new_data[1,...] = data[:,25:,:] >>>new_data.shape (2, 4, 25, 3) That works but when I try it with reshape the elements get in the wrong place. I've tried various combinations of rollaxis before the reshape, but can't get it right. Thanks Eleanor -- View this message in context: http://www.nabble.com/rollaxis-and-reshape-tp20943690p20943690.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
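
For the reshape/rollaxis question above, one combination that appears to reproduce the two-piece split, assuming C-ordered data with the same (4, 50, 3) shape, is to let reshape split the middle axis into (2, 25), block index first, and then roll that new block axis to the front:

import numpy

data = numpy.arange(4 * 50 * 3).reshape(4, 50, 3)    # stand-in for the real array

# axis 1 (length 50) becomes a (2, 25) pair of axes, then the new length-2
# block axis is moved to position 0
new_data = numpy.rollaxis(data.reshape(4, 2, 25, 3), 1)

print new_data.shape                                 # (2, 4, 25, 3)
print numpy.all(new_data[0] == data[:, :25, :])      # True
print numpy.all(new_data[1] == data[:, 25:, :])      # True

Reshaping straight to (2, 4, 25, 3) is what scrambles the elements; the length-2 axis has to be introduced next to the axis it came from, and only then can rollaxis move it to the front.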
From gael.varoquaux at normalesup.org Wed Dec 10 17:10:23 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 Dec 2008 23:10:23 +0100 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy Message-ID: <20081210221023.GD356@phare.normalesup.org> Hi all, Looks like I am following the long line of people failing to build numpy :). I must admit I am clueless with building problems. Numpy builds alright, but I get: ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: _gfortran_st_write_done On import. This used to work a while ago. I am not sure what I changed, but it sure does fail. I really don't understand where the gfortran comes in. I tried building numpy with or without gfortran. From what I gather it is the numpy is being built by a different compiler than the atlas libraries (hurray for ABI compatibility), but I don't really understand how this is possible. How can I debug this? Cheers, Ga?l From babaktei at yahoo.com Wed Dec 10 17:28:21 2008 From: babaktei at yahoo.com (Bab Tei) Date: Wed, 10 Dec 2008 14:28:21 -0800 (PST) Subject: [Numpy-discussion] Support for sparse matrix in Distance function (and clustering)? Message-ID: <898520.57855.qm@web50403.mail.re2.yahoo.com> Damian Eads soe.ucsc.edu> writes: > > Hi, > > Can you be more specific? Do you need sparse matrices to represent > observation vectors because they are sparse? Or do you need sparse > matrices to represent distance matrices because most vectors you are > clustering are similar while a few are dissimilar? > Damian > > On Tue, Dec 9, 2008 at 1:28 PM, Bab Tei yahoo.com> wrote: > > Hi > > Does the distance function in spatial package support sparse matrix? > > regards Hi I need sparse matrices to represent observation vectors because they are sparse. I have a large sparse matrix. I also use kmeans (Besides hierarchical clustering) which can directly work with very large data. Teimourpour From babaktei at yahoo.com Wed Dec 10 17:30:43 2008 From: babaktei at yahoo.com (Bab Tei) Date: Wed, 10 Dec 2008 14:30:43 -0800 (PST) Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? Message-ID: <789260.71920.qm@web50412.mail.re2.yahoo.com> Keith Goodman gmail.com> writes: > > On Tue, Dec 9, 2008 at 12:25 PM, Bab Tei yahoo.com> wrote: > > > I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As > negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? > > Here's a painful way to do it: > > >> x = np.array([0,1,2,3,4]) > >> excludeindex = [1,3] > >> idx = list(set(range(4)) - set(excludeindex)) > >> x[idx] > array([0, 2]) > > To make it more painful, you might want to sort idx. > > But if excludeindex is True/False, then just use ~excludeindex. > Thank you. However it seems I have to create a full list at first and then exclude items. It is somehow painful as I have some very large sparse matrices and creating a full index eats a lot of memory. Maybe adding this functionality to numpy saves memory and makes the syntax more clear ie a syntax like x[~excludeindex] which smartly distinguish between excludeindex as a list of numerical indexes and a mask (list of true/false indexes). 
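Two ways of expressing R-style index exclusion without spelling out the full list of kept indices by hand are sketched below; whether the one-byte-per-element boolean mask is acceptable for very large arrays depends on the application, and the variable names are only illustrative:

----
import numpy as np

x = np.arange(5)
excludeindex = [1, 3]

# Boolean mask: flip off the excluded positions, then index with the mask.
mask = np.ones(x.shape[0], dtype=bool)
mask[excludeindex] = False
print(x[mask])                      # [0 2 4]

# Alternatively, compute the complementary (sorted) index list directly.
keep = np.setdiff1d(np.arange(x.shape[0]), excludeindex)
print(x[keep])                      # [0 2 4]
----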
Regards From gael.varoquaux at normalesup.org Wed Dec 10 17:32:11 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 Dec 2008 23:32:11 +0100 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <20081210221023.GD356@phare.normalesup.org> References: <20081210221023.GD356@phare.normalesup.org> Message-ID: <20081210223211.GF356@phare.normalesup.org> On Wed, Dec 10, 2008 at 11:10:23PM +0100, Gael Varoquaux wrote: > Numpy builds alright, but I get: > ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: > _gfortran_st_write_done Doh! I knew it must be a FAQ, and it was :). Better googling gave me the answer: the configuration was picking up the libraries for the libatlas3gf-sse2 package, which is built with gfortran. Numpy is built with g77, and I need to force it to link with the libraries given by the atlas3-sse2 package (providing libaries built with g77). The best way is simply to remove the gfortran altas libraries. This email from David got me on the track: http://projects.scipy.org/pipermail/numpy-discussion/2008-May/034164.html I must have at some point installed the gfortran libraries by mistake. I was taken by surprise because I didn't expect Ubuntu to have 2 versions of atlas, ABI incompatible. Sorry for the noise. Ga?l From david at ar.media.kyoto-u.ac.jp Wed Dec 10 23:07:51 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 11 Dec 2008 13:07:51 +0900 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <20081210223211.GF356@phare.normalesup.org> References: <20081210221023.GD356@phare.normalesup.org> <20081210223211.GF356@phare.normalesup.org> Message-ID: <49409217.6010905@ar.media.kyoto-u.ac.jp> Gael Varoquaux wrote: > I must have at some point installed the gfortran libraries by mistake. I > was taken by surprise because I didn't expect Ubuntu to have 2 versions > of atlas, ABI incompatible. > The point was to help for transition from g77 to gfortran ABI. Intrepid does not have this problem (they even went as far as removing g77 from the archives !). David From gael.varoquaux at normalesup.org Thu Dec 11 01:01:20 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 07:01:20 +0100 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <49409217.6010905@ar.media.kyoto-u.ac.jp> References: <20081210221023.GD356@phare.normalesup.org> <20081210223211.GF356@phare.normalesup.org> <49409217.6010905@ar.media.kyoto-u.ac.jp> Message-ID: <20081211060120.GB21281@phare.normalesup.org> On Thu, Dec 11, 2008 at 01:07:51PM +0900, David Cournapeau wrote: > Gael Varoquaux wrote: > > I must have at some point installed the gfortran libraries by mistake. I > > was taken by surprise because I didn't expect Ubuntu to have 2 versions > > of atlas, ABI incompatible. > The point was to help for transition from g77 to gfortran ABI. Intrepid > does not have this problem (they even went as far as removing g77 from > the archives !). Sure, I can understand that. I am on intrepid on half of my boxes so far. But not this one :). 
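For anyone hitting the same undefined _gfortran_st_write_done symbol, a quick first check is to ask numpy which BLAS/LAPACK/ATLAS libraries it was configured against; the exact sections printed vary between numpy versions, but a gfortran-built ATLAS showing up under a g77-built numpy points at the ABI mix-up described above:

----
import numpy as np

# Prints the library names and directories recorded at build time.
np.show_config()
----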
Ga?l From chaos.proton at gmail.com Thu Dec 11 02:18:05 2008 From: chaos.proton at gmail.com (Grissiom) Date: Thu, 11 Dec 2008 15:18:05 +0800 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <20081210221023.GD356@phare.normalesup.org> References: <20081210221023.GD356@phare.normalesup.org> Message-ID: On Thu, Dec 11, 2008 at 06:10, Gael Varoquaux wrote: > Hi all, > > Looks like I am following the long line of people failing to build numpy > :). I must admit I am clueless with building problems. > > Numpy builds alright, but I get: > > ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: > _gfortran_st_write_done > > On import. > > This used to work a while ago. I am not sure what I changed, but it sure > does fail. I really don't understand where the gfortran comes in. I tried > building numpy with or without gfortran. From what I gather it is the > numpy is being built by a different compiler than the atlas libraries > (hurray for ABI compatibility), but I don't really understand how this is > possible. > > How can I debug this? > > Cheers, > > Ga?l > I have encountered with such problem before. My solution is recompile the problem package(maybe atlas in your case) with -ff2c option passed to gfortran. -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Dec 11 02:13:15 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 11 Dec 2008 16:13:15 +0900 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: References: <20081210221023.GD356@phare.normalesup.org> Message-ID: <4940BD8B.2000001@ar.media.kyoto-u.ac.jp> Grissiom wrote: > On Thu, Dec 11, 2008 at 06:10, Gael Varoquaux > > > wrote: > > Hi all, > > Looks like I am following the long line of people failing to build > numpy > :). I must admit I am clueless with building problems. > > Numpy builds alright, but I get: > > ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: > _gfortran_st_write_done > > On import. > > This used to work a while ago. I am not sure what I changed, but > it sure > does fail. I really don't understand where the gfortran comes in. > I tried > building numpy with or without gfortran. From what I gather it is the > numpy is being built by a different compiler than the atlas libraries > (hurray for ABI compatibility), but I don't really understand how > this is > possible. > > How can I debug this? > > Cheers, > > Ga?l > > > I have encountered with such problem before. My solution is recompile > the problem package(maybe atlas in your case) with -ff2c option passed > to gfortran. This is a bad idea: it won't work with libraries which are not built with this option, and the error won't always be easy to detect (one key difference is that wo ff2c, complex variables are passed by value by gfortran, whereas they are passed by reference with the ff2c option - which means crash and/or corruption). 
http://wiki.debian.org/GfortranTransition The only viable solution is to avoid mixing g77-built and gfortran-built libraries (there is now a simple test which tries to detect those mix in both numpy and scipy), cheers, David From chaos.proton at gmail.com Thu Dec 11 02:56:23 2008 From: chaos.proton at gmail.com (Grissiom) Date: Thu, 11 Dec 2008 15:56:23 +0800 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <4940BD8B.2000001@ar.media.kyoto-u.ac.jp> References: <20081210221023.GD356@phare.normalesup.org> <4940BD8B.2000001@ar.media.kyoto-u.ac.jp> Message-ID: On Thu, Dec 11, 2008 at 15:13, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Grissiom wrote: > > I have encountered with such problem before. My solution is recompile > > the problem package(maybe atlas in your case) with -ff2c option passed > > to gfortran. > > This is a bad idea: it won't work with libraries which are not built > with this option, and the error won't always be easy to detect (one key > difference is that wo ff2c, complex variables are passed by value by > gfortran, whereas they are passed by reference with the ff2c option - > which means crash and/or corruption). > > http://wiki.debian.org/GfortranTransition > > The only viable solution is to avoid mixing g77-built and gfortran-built > libraries (there is now a simple test which tries to detect those mix in > both numpy and scipy), > > cheers, > > David > Thanks for pointing out my mistake ;) -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Dec 11 10:20:49 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 16:20:49 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing Message-ID: <20081211152049.GB1440@phare.normalesup.org> Hi there, I have been using the multiprocessing module a lot to do statistical tests such as Monte Carlo or resampling, and I have just discovered something that makes me wonder if I haven't been accumulating false results. Given two files: === test.py === from test_helper import task from multiprocessing import Pool p = Pool(4) jobs = list() for i in range(4): jobs.append(p.apply_async(task, (4, ))) print [j.get() for j in jobs] p.close() p.join() === test_helper.py === import numpy as np def task(x): return np.random.random(x) ======= If I run test.py, I get: [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, 0.02203999, 0.7591353 ])] In other words, the 4 processes give me the same exact results. Now I understand why this is the case: the different instances of the random number generator where created by forking from the same process, so they are exactly the very same object. This is howver a fairly bad trap. I guess other people will fall into it. The take home message is: **call 'numpy.random.seed()' when you are using multiprocessing** I wonder if we can find a way to make this more user friendly? Would be easy, in the C code, to check if the PID has changed, and if so reseed the random number generator? I can open up a ticket for this if people think this is desirable (I think so). On a side note, there are a score of functions in numpy.random with __module__ to None. It makes it inconvenient to use it with multiprocessing (for instance it forced the creation of the 'test_helper' file here). 
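A minimal sketch of the take-home workaround, assuming Unix fork-based multiprocessing as in the report above: reseed inside the task so that each forked worker draws fresh entropy instead of replaying the generator state inherited from the parent. Later messages in this thread discuss why explicit per-task seeds or per-task generator objects are preferable when reproducibility and stream independence matter:

----
import numpy as np
from multiprocessing import Pool

def task(x):
    np.random.seed()              # reseed in the child from fresh OS entropy / clock
    return np.random.random(x)

if __name__ == '__main__':
    p = Pool(4)
    jobs = [p.apply_async(task, (4,)) for i in range(4)]
    print([j.get() for j in jobs])   # the four draws now differ
    p.close()
    p.join()
----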
Ga?l From cournape at gmail.com Thu Dec 11 10:57:26 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 00:57:26 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > > === test.py === > from test_helper import task > from multiprocessing import Pool > > p = Pool(4) > > jobs = list() > for i in range(4): > jobs.append(p.apply_async(task, (4, ))) > > print [j.get() for j in jobs] > > p.close() > p.join() > > === test_helper.py === > import numpy as np > > def task(x): > return np.random.random(x) > > ======= > > If I run test.py, I get: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. Why do you say the results are the same ? They don't look the same to me - only the first three are the same. > Now I understand why this is the case: the different instances of the > random number generator where created by forking from the same process, > so they are exactly the very same object. This is howver a fairly bad > trap. I guess other people will fall into it. I am not sure I am following: the objects in python are not the same if you fork a process, or I don't understand what you mean by same. They may be initialized the same way, though. Isn't the problem simply due to seeding from the same value ? For such a tiny problem (4 tasks whose processing time is negligeable), the seed will be the same since the intervals between the sampling will be small. Taking a look at the mtrand code in numpy, if the seed is not given, it is taken from /dev/random if available, or the time clock if not; I don't know what the semantics are for concurrent access to /dev/random (is it gauranteed that two process will get different values from it ?). To confirm this, you could try to use your toy example with 500 jobs instead of 4: in that case, it is unlikely they use the same underlying value as a starting point, even if there is no gurantee on concurrent access of /dev/random. > I wonder if we can find a way to make this more user friendly? Would be > easy, in the C code, to check if the PID has changed, and if so reseed > the random number generator? I can open up a ticket for this if people > think this is desirable (I think so). This sounds like too much magic for a very particular use: there may be cases where you want the same seed in multiple processes (what if you processes are not created from multiprocess, and you want to make sure you have the same seed ?). 
David From cournape at gmail.com Thu Dec 11 11:04:29 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 01:04:29 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: <5b8d13220812110804v5c06b538l36dd6c626f45dec8@mail.gmail.com> On Fri, Dec 12, 2008 at 12:57 AM, David Cournapeau wrote: > Taking a look at the mtrand code in numpy, if the seed is not given, > it is taken from /dev/random if available, or the time clock if not; I > don't know what the semantics are for concurrent access to /dev/random > (is it gauranteed that two process will get different values from it > ?). > Sorry, the mtrand code use /dev/urandom, not /dev/random, if available. David From michael.s.gilbert at gmail.com Thu Dec 11 11:09:58 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Thu, 11 Dec 2008 11:09:58 -0500 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution In-Reply-To: References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: <8e2a98be0812110809i307f4333y43f7e7f9a560a36b@mail.gmail.com> > Bruce Carneal did some tests of robustness and speed for various normal > generators. I don't know what his final tests showed for Box-Muller. IIRC, > it had some failures but nothing spectacular. The tests were pretty > stringent and based on using the erf to turn the normal distribution into a > uniform distribution and using the crush tests on the latter.. You could > send him a note and ask: bcarneal at gmail.com. Here are the timings he got: Thanks for all the insightful replies. This gives me some better confidence in numpy's normal distribution. I will contact Bruce Carneal to get more details. Thanks again, Mike Gilbert From pav at iki.fi Thu Dec 11 11:16:04 2008 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 11 Dec 2008 16:16:04 +0000 (UTC) Subject: [Numpy-discussion] numpy.random and multiprocessing References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: Fri, 12 Dec 2008 00:57:26 +0900, David Cournapeau wrote: [clip] > On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux wrote: > [clip] >> Now I understand why this is the case: the different instances of the >> random number generator where created by forking from the same process, >> so they are exactly the very same object. This is howver a fairly bad >> trap. I guess other people will fall into it. > > I am not sure I am following: the objects in python are not the same if > you fork a process, or I don't understand what you mean by same. They > may be initialized the same way, though. The RandomState object handling numpy.random.random is created (and seeded) at import time. So, an identical generator should be shared by all processes after that. 
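A small check illustrating this point, assuming the Unix fork start method: each worker reports a digest of the Mersenne Twister state it inherited, and all digests come out identical because the module-level generator was seeded once, at import time, in the parent:

----
import hashlib
import numpy as np
from multiprocessing import Pool

def state_digest(_):
    keys = np.random.get_state()[1]              # the MT19937 key array
    return hashlib.md5(keys.tostring()).hexdigest()

if __name__ == '__main__':
    p = Pool(4)
    print(p.map(state_digest, range(4)))          # four identical digests
    p.close()
    p.join()
----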
-- Pauli Virtanen From bsouthey at gmail.com Thu Dec 11 11:20:48 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 11 Dec 2008 10:20:48 -0600 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <49413DE0.4030103@gmail.com> Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > > === test.py === > from test_helper import task > from multiprocessing import Pool > > p = Pool(4) > > jobs = list() > for i in range(4): > jobs.append(p.apply_async(task, (4, ))) > > print [j.get() for j in jobs] > > p.close() > p.join() > > === test_helper.py === > import numpy as np > > def task(x): > return np.random.random(x) > > ======= > > If I run test.py, I get: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. > > Now I understand why this is the case: the different instances of the > random number generator where created by forking from the same process, > so they are exactly the very same object. This is howver a fairly bad > trap. I guess other people will fall into it. > > The take home message is: > **call 'numpy.random.seed()' when you are using multiprocessing** > > I wonder if we can find a way to make this more user friendly? Would be > easy, in the C code, to check if the PID has changed, and if so reseed > the random number generator? I can open up a ticket for this if people > think this is desirable (I think so). > > On a side note, there are a score of functions in numpy.random with > __module__ to None. It makes it inconvenient to use it with > multiprocessing (for instance it forced the creation of the 'test_helper' > file here). > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Part of this is one of the gotcha's of simulation that is not specific to multiprocessing and Python. Just highly likely to occur in your case with multiprocessing but does occur in single processing. As David indicated, many applications use a single source (often computer time) to initialize the pseudo-random generators if an actual seed is not supplied. Depending on the resolution as most require an integer so minor changes may not be sufficient to change the seed. So the same seed will get used if the source has not sufficiently 'advanced' before the next initialization. If you really care about reproducing the streams, you should specify the seed anyhow. 
Bruce From sturla at molden.no Thu Dec 11 11:23:12 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 17:23:12 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: <49413E70.2090809@molden.no> On 12/11/2008 4:57 PM, David Cournapeau wrote: > Why do you say the results are the same ? They don't look the same to > me - only the first three are the same. He used the multiprocessing.Pool object. There is a possible race condition here: one or more of the forked processes may be doing nothing. They are all competing for tasks on a queue. It could be avoided by using multiprocessing.Process instead. > I am not sure I am following: the objects in python are not the same > if you fork a process, or I don't understand what you mean by same. > They may be initialized the same way, though. When are they initialized? On import numpy or the first call to numpy.random.random? If they are initialized on the import numpy statement, they are initalized prior to forking and sharing state. This is because his statement 'from test_helper import task' actually triggers the import of numpy, and it occurs prior to any fork. This is also system dependent by the way. On Windows multiprocessing does not fork() and does not produce this problem. Sturla Molden From gael.varoquaux at normalesup.org Thu Dec 11 11:36:47 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:36:47 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: <20081211163647.GC1440@phare.normalesup.org> On Fri, Dec 12, 2008 at 12:57:26AM +0900, David Cournapeau wrote: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. > Why do you say the results are the same ? They don't look the same to > me - only the first three are the same. Correct. I wonder why. When I try on my box currently I almost always get the same four. But not all the time. More on that below. > > Now I understand why this is the case: the different instances of the > > random number generator where created by forking from the same process, > > so they are exactly the very same object. This is howver a fairly bad > > trap. I guess other people will fall into it. > I am not sure I am following: the objects in python are not the same > if you fork a process, or I don't understand what you mean by same. > They may be initialized the same way, though. Yes, they are initiate with the same seed value. I call them the same because right after the fork they are. The can evolve separately, though. However our PRNG is completely defined by its seed, AFAIK. > Isn't the problem simply due to seeding from the same value ? For such > a tiny problem (4 tasks whose processing time is negligeable), the > seed will be the same since the intervals between the sampling will be > small. Right, but I found the problem in real code, that was not tiny at all. 
> Taking a look at the mtrand code in numpy, if the seed is not given, > it is taken from /dev/random if available, or the time clock if not; I > don't know what the semantics are for concurrent access to /dev/random > (is it gauranteed that two process will get different values from it > ?). > To confirm this, you could try to use your toy example with 500 jobs > instead of 4: in that case, it is unlikely they use the same > underlying value as a starting point, even if there is no gurantee on > concurrent access of /dev/random. I found the problem on way bigger code. I have only 8 cpus, so I run 8 jobs, and each job loops on the tasks. I noticed that the variance was much smaller than expected. The jobs take 10 minutes, so you can't call them tiny or fast. The problem indeed appears in production code. The way I interpret this is that the seed is created only at module-import time (this is how I read the code in mtrand.pyx). For all my processes, the seed was created when numpy was imported in the mother process. After the fork, the seed is the same in each process. As a result the entropy of the whole system is clearly not the entropy of 4 independant systems. As you point out the fourth value in my toy example differs from the others, so somehow my picture is not exact. But it remains that the entropy is way too low in my production code. I don't understand why, once in a while, there is a value that is different. That could be because numpy is reimported in the child processes. If I insert a 'time.sleep' in my for loop that spawns the processes, I get significantly higher entropy only if the sleep is around 1 second. Looking at the seed code (rk_randomseed in randomkit.c), it seems that /dev/urandom is not used, contrary to what the random.seed docstring pretends, and what is really used is gettimeofday under windows, and _ftime under Unix. It does seem, though that the milliseconds are used. I must admit I don't fully understand why this happens. I thought that: a) Modules where not reimported with multiprocess, thanks to the fork. If this where true, reading mtrand.pyx, all subprocesses should have the same seed. b) /dev/urandom was used to seed. This seems wrong. Reading the code shows no dev/urandom in the seeding parts. c) milliseconds where used, so we should be rather safe from these race-condition. The code does seem to hint toward that, but if I add a sleep(0.01) to my loop, I don't get enough entropy. I did check that sleep(0.01) was sleeping at least 0.01 seconds. > > I wonder if we can find a way to make this more user friendly? Would be > > easy, in the C code, to check if the PID has changed, and if so reseed > > the random number generator? I can open up a ticket for this if people > > think this is desirable (I think so). > This sounds like too much magic for a very particular use: there may > be cases where you want the same seed in multiple processes (what if > you processes are not created from multiprocess, and you want to make > sure you have the same seed ?). Well, yes, for code that wants to explicitely control the seed, ressed automaticaly would be a problem, and we need to figure out a way to make this deterministic (eg for testing purposes). However, this is a small usecase, and when testing people need to be aware of seeding problems (although they might not understand fork semantics). More and more people are going to be using multiprocessing: it comes with the standard library, and standard boxes nowadays have many cores, and will soon have much more. 
Resampling and brute-force Monte Carlo techniques are embarrassingly parallel, so people will want to use parallel computing on them. I fear many others are going to fall in this trap. Ga?l From gael.varoquaux at normalesup.org Thu Dec 11 11:39:14 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:39:14 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49413E70.2090809@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> Message-ID: <20081211163914.GD1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 05:23:12PM +0100, Sturla Molden wrote: > On 12/11/2008 4:57 PM, David Cournapeau wrote: > > Why do you say the results are the same ? They don't look the same to > > me - only the first three are the same. > He used the multiprocessing.Pool object. There is a possible race > condition here: one or more of the forked processes may be doing > nothing. They are all competing for tasks on a queue. It could be > avoided by using multiprocessing.Process instead. No, Pool is what I want, because in my production code I am submitting jobs to that pool. > > I am not sure I am following: the objects in python are not the same > > if you fork a process, or I don't understand what you mean by same. > > They may be initialized the same way, though. > When are they initialized? On import numpy or the first call to > numpy.random.random? mtrand.pyx seems pretty clear about that: on import. > If they are initialized on the import numpy statement, they are > initalized prior to forking and sharing state. This is because his > statement 'from test_helper import task' actually triggers the import > of numpy, and it occurs prior to any fork. This is what I thought too. However, inserting a sleep statement long-enough in my spawning loop recovers entropy. I am confused. Ga?l From gael.varoquaux at normalesup.org Thu Dec 11 11:45:01 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:45:01 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49413DE0.4030103@gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <49413DE0.4030103@gmail.com> Message-ID: <20081211164501.GE1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 10:20:48AM -0600, Bruce Southey wrote: > Part of this is one of the gotcha's of simulation that is not specific > to multiprocessing and Python. Just highly likely to occur in your case > with multiprocessing but does occur in single processing. As David > indicated, many applications use a single source (often computer time) > to initialize the pseudo-random generators if an actual seed is not > supplied. Depending on the resolution as most require an integer so > minor changes may not be sufficient to change the seed. So the same seed > will get used if the source has not sufficiently 'advanced' before the > next initialization. > If you really care about reproducing the streams, you should specify the > seed anyhow. Well, its not about me. I have found this out, now, so I will know. Its about many other people who are going to stumble upon this. I don't think it is a good idea to count on the fact that people will understand-enough these problems not to be fooled by them. We should try to reduce that, as much as possible without adding magic that renders the behavior incomprehensible. 
Ga?l From gael.varoquaux at normalesup.org Thu Dec 11 11:46:20 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:46:20 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211163647.GC1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <20081211163647.GC1440@phare.normalesup.org> Message-ID: <20081211164620.GF1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 05:36:47PM +0100, Gael Varoquaux wrote: > b) /dev/urandom was used to seed. This seems wrong. Reading the code > shows no dev/urandom in the seeding parts. Actually, I am wrong here. dev/urandom is indeed used in 'rk_devfill', used in the seeding routine. It seems this is not enough. Ga?l From sturla at molden.no Thu Dec 11 11:55:58 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 17:55:58 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211163914.GD1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> Message-ID: <4941461E.5000907@molden.no> On 12/11/2008 5:39 PM, Gael Varoquaux wrote: >>> Why do you say the results are the same ? They don't look the same to >>> me - only the first three are the same. > >> He used the multiprocessing.Pool object. There is a possible race >> condition here: one or more of the forked processes may be doing >> nothing. They are all competing for tasks on a queue. It could be >> avoided by using multiprocessing.Process instead. > > No, Pool is what I want, because in my production code I am submitting > jobs to that pool. Sure, a pool is fine. I was just speculating that one of the four processes in your pool was idle all the time; i.e. that one of the other three got to do the task twice. Therefore you only got three identical results and not four. It depends on how the OS schedules the processes, the number of logical CPUs, etc. You have no control over that. But if you had used N instances of multiprocessing.Pool instead, all N results should have been identical (if the 'random' generator is completely deterministic) - because each process would do the task once. I.e. you only got three indentical results due to a race condition in the task queue. But you don't want similar results do you? So if you remember to seed the random number generators after forking, this race condition should be of no significance. > mtrand.pyx seems pretty clear about that: on import. In which case they are initialized prior to forking. Sturla Molden From gael.varoquaux at normalesup.org Thu Dec 11 11:59:14 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:59:14 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <4941461E.5000907@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> Message-ID: <20081211165914.GG1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 05:55:58PM +0100, Sturla Molden wrote: > > No, Pool is what I want, because in my production code I am submitting > > jobs to that pool. > Sure, a pool is fine. 
I was just speculating that one of the four > processes in your pool was idle all the time; i.e. that one of the other > three got to do the task twice. Therefore you only got three identical > results and not four. It depends on how the OS schedules the processes, > the number of logical CPUs, etc. You have no control over that. But if > you had used N instances of multiprocessing.Pool instead, all N results > should have been identical (if the 'random' generator is completely > deterministic) - because each process would do the task once. > I.e. you only got three indentical results due to a race condition in > the task queue. Gotcha! Good explanation. Now I understand better my previous investigation. I think you are completely right. So indeed, as I initialy thought, using multiprocessing without reseeding is going to get you in big trouble (and this is what I experienced in my code). Thanks for the explanation, Ga?l From pav at iki.fi Thu Dec 11 12:03:41 2008 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 11 Dec 2008 17:03:41 +0000 (UTC) Subject: [Numpy-discussion] numpy.random and multiprocessing References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> Message-ID: Thu, 11 Dec 2008 17:55:58 +0100, Sturla Molden wrote: [clip] > Sure, a pool is fine. I was just speculating that one of the four > processes in your pool was idle all the time; i.e. that one of the other > three got to do the task twice. Therefore you only got three identical > results and not four. It depends on how the OS schedules the processes, > the number of logical CPUs, etc. You have no control over that. But if > you had used N instances of multiprocessing.Pool instead, all N results > should have been identical (if the 'random' generator is completely > deterministic) - because each process would do the task once. > > I.e. you only got three indentical results due to a race condition in > the task queue. 
Exactly, change task_helper.py to ---- import numpy as np def task(x): import os print "Hi, I'm", os.getpid() return np.random.random(x) ---- and note the output ---- Hi, I'm 16197 Hi, I'm 16198 Hi, I'm 16199 Hi, I'm 16199 [ 0.58175647 0.16293922 0.30488182 0.67367263] [ 0.58175647 0.16293922 0.30488182 0.67367263] [ 0.58175647 0.16293922 0.30488182 0.67367263] [ 0.59574921 0.61554857 0.06155764 0.75352295] ---- -- Pauli Virtanen From michael.s.gilbert at gmail.com Thu Dec 11 12:10:03 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Thu, 11 Dec 2008 12:10:03 -0500 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> Message-ID: <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> > Exactly, change task_helper.py to > > ---- > import numpy as np > > def task(x): > import os > print "Hi, I'm", os.getpid() > return np.random.random(x) > ---- > > and note the output > > ---- > Hi, I'm 16197 > Hi, I'm 16198 > Hi, I'm 16199 > Hi, I'm 16199 > [ 0.58175647 0.16293922 0.30488182 0.67367263] > [ 0.58175647 0.16293922 0.30488182 0.67367263] > [ 0.58175647 0.16293922 0.30488182 0.67367263] > [ 0.59574921 0.61554857 0.06155764 0.75352295] Shouldn't numpy (and/or multiprocessing) be smart enough to prevent this kind of error? A simple enough solution would be to also include the process id as part of the seed since it appears that the problem only occurs when you have different processes/threads accessing the random number generator at the same time. Regards, Mike From david at ar.media.kyoto-u.ac.jp Thu Dec 11 12:04:30 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 12 Dec 2008 02:04:30 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> Message-ID: <4941481E.4020800@ar.media.kyoto-u.ac.jp> Michael Gilbert wrote: >> Exactly, change task_helper.py to >> >> ---- >> import numpy as np >> >> def task(x): >> import os >> print "Hi, I'm", os.getpid() >> return np.random.random(x) >> ---- >> >> and note the output >> >> ---- >> Hi, I'm 16197 >> Hi, I'm 16198 >> Hi, I'm 16199 >> Hi, I'm 16199 >> [ 0.58175647 0.16293922 0.30488182 0.67367263] >> [ 0.58175647 0.16293922 0.30488182 0.67367263] >> [ 0.58175647 0.16293922 0.30488182 0.67367263] >> [ 0.59574921 0.61554857 0.06155764 0.75352295] >> > > Shouldn't numpy (and/or multiprocessing) be smart enough to prevent > this kind of error? A simple enough solution would be to also include > the process id as part of the seed since it appears that the problem > only occurs when you have different processes/threads accessing the > random number generator at the same time. > But the seed is set only once in the above code. So the problem has nothing to do with numpy. I don't think using the pid as a seed is a good idea either - for each task, it should be set to a true random source. 
David From sturla at molden.no Thu Dec 11 12:21:34 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 18:21:34 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> Message-ID: <49414C1E.4020701@molden.no> On 12/11/2008 6:10 PM, Michael Gilbert wrote: > Shouldn't numpy (and/or multiprocessing) be smart enough to prevent > this kind of error? A simple enough solution would be to also include > the process id as part of the seed It would not help, as the seeding is done prior to forking. I am mostly familiar with Windows programming. But what is needed is a fork handler (similar to a system hook in Windows jargon) that sets a new seed in the child process. Could pthread_atfork be used? Sturla Molden From sturla at molden.no Thu Dec 11 12:36:11 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 18:36:11 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414C1E.4020701@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> Message-ID: <49414F8B.6080104@molden.no> On 12/11/2008 6:21 PM, Sturla Molden wrote: > It would not help, as the seeding is done prior to forking. > > I am mostly familiar with Windows programming. But what is needed is a > fork handler (similar to a system hook in Windows jargon) that sets a > new seed in the child process. Actually I am not sure this should be done, as this issue technically speaking is not an error. A warning in the documentation would be better. Perhaps we should we should write a proper numpy + multiprocessing tutorial? Sturla Molden From david at ar.media.kyoto-u.ac.jp Thu Dec 11 12:29:55 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 12 Dec 2008 02:29:55 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414C1E.4020701@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> Message-ID: <49414E13.50103@ar.media.kyoto-u.ac.jp> Sturla Molden wrote: > On 12/11/2008 6:10 PM, Michael Gilbert wrote: > > >> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >> this kind of error? A simple enough solution would be to also include >> the process id as part of the seed >> > > It would not help, as the seeding is done prior to forking. > > I am mostly familiar with Windows programming. But what is needed is a > fork handler (similar to a system hook in Windows jargon) that sets a > new seed in the child process. > > Could pthread_atfork be used? > The seed could be explicitly set in each task, no ? def task(x): np.random.seed() return np.random.random(x) But does this really make sense ? 
Is the goal to parallelize a big sampler into N tasks of M trials, to produce the same result as a sequential set of M*N trials ? Then it does sound like a trivial task at all. I know there exists libraries explicitly designed for parallel random number generation - maybe this is where we should look, instead of using heuristics which are likely to be bogus, and generate wrong results. cheers, David From gael.varoquaux at normalesup.org Thu Dec 11 12:49:03 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 18:49:03 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414E13.50103@ar.media.kyoto-u.ac.jp> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> Message-ID: <20081211174903.GH1440@phare.normalesup.org> On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote: > The seed could be explicitly set in each task, no ? > def task(x): > np.random.seed() > return np.random.random(x) Yes. The problem is trivial to solve, once you are aware of it. Just like the integer division problems we used to have back in the days where zeros, ones, ... returned ineger arrays. The point is that people will run into that problem and loose a lot of time. So we must make it so that they don't by mistake land in this situation, but purposely. One solution is to check the PID of the process when the PRNG is called, and reseed if it has changed. As pointed out, the danger of this is that this is magic, so there needs to be an option to turn this off. Ga?l From bsouthey at gmail.com Thu Dec 11 13:00:23 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 11 Dec 2008 12:00:23 -0600 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414E13.50103@ar.media.kyoto-u.ac.jp> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> Message-ID: <49415537.1020503@gmail.com> David Cournapeau wrote: > Sturla Molden wrote: > >> On 12/11/2008 6:10 PM, Michael Gilbert wrote: >> >> >> >>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >>> this kind of error? A simple enough solution would be to also include >>> the process id as part of the seed >>> >>> >> It would not help, as the seeding is done prior to forking. >> >> I am mostly familiar with Windows programming. But what is needed is a >> fork handler (similar to a system hook in Windows jargon) that sets a >> new seed in the child process. >> >> Could pthread_atfork be used? >> >> > > The seed could be explicitly set in each task, no ? > > def task(x): > np.random.seed() > return np.random.random(x) > > But does this really make sense ? > > Is the goal to parallelize a big sampler into N tasks of M trials, to > produce the same result as a sequential set of M*N trials ? Then it does > sound like a trivial task at all. 
I know there exists libraries > explicitly designed for parallel random number generation - maybe this > is where we should look, instead of using heuristics which are likely to > be bogus, and generate wrong results. > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > This is not sufficient because you can not ensure that the seed will be different every time task() is called. A major part of the problem here is treating a parallel computing problem as a serial computing problem. The streams must be independent across threads especially avoiding cross-correlation of streams (another gotcha) between threads. It is up to the user to implement a thread-safe solution such as using a single stream that is used by all threads or force the different threads to start at different states. The only thing that Numpy could do is provide a parallel pseudo-random number generator. Bruce From sturla at molden.no Thu Dec 11 13:04:21 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 19:04:21 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414E13.50103@ar.media.kyoto-u.ac.jp> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> Message-ID: <49415625.3000201@molden.no> On 12/11/2008 6:29 PM, David Cournapeau wrote: > def task(x): > np.random.seed() > return np.random.random(x) > > But does this really make sense ? Hard to say... There is a chance of this producing indentical or overlapping sequences, albeit unlikely. I would not do this. I'd make one process responsible for making the random numbers and write those to a queue. It would scale if generating the deviates is the least costly part of the algorithm. Sturla Molden === test.py === from test_helper import task, generator from multiprocessing import Pool, Process, Queue q = Queue(maxsize=32) # or whatever g = Process(args=(4,q)) # preferably a number much larger than 4!!! g.start() p = Pool(4) jobs = list() for i in range(4): jobs.append(p.apply_async(task, (q,))) print [j.get() for j in jobs] p.close() p.join() g.terminate() === test_helper.py === import numpy as np def generator(x, q): while 1: item = np.random.random(x) q.put(item) def task(q): return q.get() > Is the goal to parallelize a big sampler into N tasks of M trials, to > produce the same result as a sequential set of M*N trials ? Then it does > sound like a trivial task at all. I know there exists libraries > explicitly designed for parallel random number generation - maybe this > is where we should look, instead of using heuristics which are likely to > be bogus, and generate wrong results. 
> > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Thu Dec 11 13:31:47 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 03:31:47 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49415537.1020503@gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> <49415537.1020503@gmail.com> Message-ID: <5b8d13220812111031n6de1241we7c8f1c90e1caf83@mail.gmail.com> On Fri, Dec 12, 2008 at 3:00 AM, Bruce Southey wrote: > David Cournapeau wrote: >> Sturla Molden wrote: >> >>> On 12/11/2008 6:10 PM, Michael Gilbert wrote: >>> >>> >>> >>>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >>>> this kind of error? A simple enough solution would be to also include >>>> the process id as part of the seed >>>> >>>> >>> It would not help, as the seeding is done prior to forking. >>> >>> I am mostly familiar with Windows programming. But what is needed is a >>> fork handler (similar to a system hook in Windows jargon) that sets a >>> new seed in the child process. >>> >>> Could pthread_atfork be used? >>> >>> >> >> The seed could be explicitly set in each task, no ? >> >> def task(x): >> np.random.seed() >> return np.random.random(x) >> >> But does this really make sense ? >> >> Is the goal to parallelize a big sampler into N tasks of M trials, to >> produce the same result as a sequential set of M*N trials ? Then it does >> sound like a trivial task at all. I know there exists libraries >> explicitly designed for parallel random number generation - maybe this >> is where we should look, instead of using heuristics which are likely to >> be bogus, and generate wrong results. >> >> cheers, >> >> David >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> > This is not sufficient because you can not ensure that the seed will be > different every time task() is called. Yes, right. I was assuming that each seed call would result in a /dev/urandom read - but the problem is the same whether it is done in task or in a pthread_atfork method anyway. > The > only thing that Numpy could do is provide a parallel pseudo-random > number generator. Yes, exactly - hence my question whether this makes sense at all. Even having different, "truely" random seeds does not guarantee that the whole method makes sense - at least, I don't see why it should. In particular, if the process should give the same result independently of the number of parallels tasks, the problem becomes difficult. Intrigued by the problem, I briefly looked into the literature for parallel RNG; it certainly does not look like an easy task, and the chance of getting it right without knowing about the topic does not look high. 
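Returning to the producer/consumer sketch posted above: as written it appears to omit the target=generator argument to Process, and a plain multiprocessing.Queue cannot be handed to Pool workers as a call argument. The variant below is an assumption-laden sketch rather than a drop-in fix; it uses a Manager queue, which can be pickled and shipped to the workers:

----
import numpy as np
from multiprocessing import Manager, Pool, Process

def generator(x, q):
    # Single producer: keep the queue topped up with fresh random vectors.
    while True:
        q.put(np.random.random(x))

def task(q):
    # Consumers simply pull the next pre-drawn vector off the shared queue.
    return q.get()

if __name__ == '__main__':
    m = Manager()
    q = m.Queue(maxsize=32)
    g = Process(target=generator, args=(4, q))
    g.start()
    p = Pool(4)
    jobs = [p.apply_async(task, (q,)) for i in range(4)]
    print([j.get() for j in jobs])
    p.close()
    p.join()
    g.terminate()
----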
cheers, David From cournape at gmail.com Thu Dec 11 13:34:01 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 03:34:01 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211174903.GH1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> <20081211174903.GH1440@phare.normalesup.org> Message-ID: <5b8d13220812111034g1edfe325n15b264c91213eed9@mail.gmail.com> On Fri, Dec 12, 2008 at 2:49 AM, Gael Varoquaux wrote: > On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote: >> The seed could be explicitly set in each task, no ? > >> def task(x): >> np.random.seed() >> return np.random.random(x) > > Yes. The problem is trivial to solve, once you are aware of it. Just like > the integer division problems we used to have back in the days where > zeros, ones, ... returned ineger arrays. The point is that people will > run into that problem and loose a lot of time. So we must make it so that > they don't by mistake land in this situation, but purposely. > > One solution is to check the PID of the process when the PRNG is called, > and reseed if it has changed. As pointed out, the danger of this is that > this is magic, so there needs to be an option to turn this off. The biggest danger is that the whole method may not make sense at all, and lose all the properties of a good random number generator. I don't understand your comparison with integer division: this is not an API or expected behavior problem, but an algorithmic one. David From josef.pktd at gmail.com Thu Dec 11 13:39:32 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 11 Dec 2008 13:39:32 -0500 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49415625.3000201@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> <49415625.3000201@molden.no> Message-ID: <1cd32cbb0812111039k3446a859lb5e9dd43f0e5c33@mail.gmail.com> > >> Is the goal to parallelize a big sampler into N tasks of M trials, to >> produce the same result as a sequential set of M*N trials ? Then it does >> sound like a trivial task at all. I know there exists libraries >> explicitly designed for parallel random number generation - maybe this >> is where we should look, instead of using heuristics which are likely to >> be bogus, and generate wrong results. >> Another heuristic using pseudo random seed for each process Generate random integers (large) in the main process, and send it as seeds to each task. This makes it replicable if the initial seed is set, and should have independent "pseudo" random numbers in each stream. This works in probability theory, but I don't know about the quality of RNGs. 
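A sketch of the scheme Josef outlines: the parent draws one large integer per task and passes it along, and each task builds its own RandomState from it. Fixing the master seed then makes the whole run replicable; as he notes, the mutual independence of the resulting streams is a heuristic rather than a guarantee:

----
import numpy as np
from multiprocessing import Pool

def task(args):
    n, seed = args
    rng = np.random.RandomState(seed)     # per-task generator, decoupled from numpy.random
    return rng.random_sample(n)

if __name__ == '__main__':
    master = np.random.RandomState(42)             # fix this to replicate the whole run
    seeds = master.randint(0, 2**31 - 1, size=4)
    p = Pool(4)
    print(p.map(task, [(4, s) for s in seeds]))
    p.close()
    p.join()
----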
Josef From sturla at molden.no Thu Dec 11 14:16:40 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 20:16:40 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <49416718.3070206@molden.no> I'd just like to add that yet another option would be to use the manager/proxy object in multiprocessing. In this case numpy.random.random will be called in the parent process. I have not used this and I am not sure how efficient it is. But the possibility is there. Sturla Molden === test.py === from test_helper import task, RandomManager from multiprocessing import Pool rm = RandomManager() rm.start() random = rm.Random() p = Pool(4) jobs = list() for i in range(4): jobs.append(p.apply_async(task, (4,random))) print [j.get() for j in jobs] p.close() p.join() rm.shutdown() === test_helper.py === import numpy as np import multiprocessing as mp from mp.managers import BaseManager, CreatorMethod class RandomClass(object): def random(self, x): return np.random.random(x) class RandomManager(BaseManager): Random = CreatorMethod(RandomClass) def task(x, random): return random.random(x) On 12/11/2008 4:20 PM, Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > > === test.py === > from test_helper import task > from multiprocessing import Pool > > p = Pool(4) > > jobs = list() > for i in range(4): > jobs.append(p.apply_async(task, (4, ))) > > print [j.get() for j in jobs] > > p.close() > p.join() > > === test_helper.py === > import numpy as np > > def task(x): > return np.random.random(x) > > ======= > > If I run test.py, I get: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. > > Now I understand why this is the case: the different instances of the > random number generator where created by forking from the same process, > so they are exactly the very same object. This is howver a fairly bad > trap. I guess other people will fall into it. > > The take home message is: > **call 'numpy.random.seed()' when you are using multiprocessing** > > I wonder if we can find a way to make this more user friendly? Would be > easy, in the C code, to check if the PID has changed, and if so reseed > the random number generator? I can open up a ticket for this if people > think this is desirable (I think so). > > On a side note, there are a score of functions in numpy.random with > __module__ to None. It makes it inconvenient to use it with > multiprocessing (for instance it forced the creation of the 'test_helper' > file here). 
> > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Thu Dec 11 14:33:26 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 11 Dec 2008 14:33:26 -0500 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49416718.3070206@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <49416718.3070206@molden.no> Message-ID: <1cd32cbb0812111133l4921d211t2f125a91c4a16328@mail.gmail.com> Here is the c program and the description how to implement independent Mersenne Twister PRNGs by the inventor(s) of Mersenne Twister: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html I didn't see a license statement. Josef From robert.kern at gmail.com Thu Dec 11 15:49:50 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Dec 2008 12:49:50 -0800 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> On Thu, Dec 11, 2008 at 07:20, Gael Varoquaux wrote: > The take home message is: > **call 'numpy.random.seed()' when you are using multiprocessing** Create RandomState objects and use those. This is a best practice whether you are using multiprocessing or not. The module-level functions really should only be used for noodling around in IPython. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla at molden.no Thu Dec 11 15:57:03 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 21:57:03 +0100 (CET) Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <1cd32cbb0812111133l4921d211t2f125a91c4a16328@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <49416718.3070206@molden.no> <1cd32cbb0812111133l4921d211t2f125a91c4a16328@mail.gmail.com> Message-ID: <2c57766b7f128ee6e59538eb00c28b95.squirrel@webmail.uio.no> In the docs I found this: "We used a hypothesis that a set of PRNGs based on linear recurrences is mutually 'independent' if the characteristic polynomials are relatively prime to each other. There is no rigorous proof of this hypothesis..." S.M. > Here is the c program and the description how to implement independent > Mersenne Twister PRNGs by the inventor(s) of Mersenne Twister: > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html > > I didn't see a license statement. > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Thu Dec 11 16:06:41 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 22:06:41 +0100 (CET) Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> Message-ID: > Create RandomState objects and use those. This is a best practice > whether you are using multiprocessing or not. The module-level > functions really should only be used for noodling around in IPython. 
Are we guaranteed that two RandomStates will produce two independent sequences? If not, RandomState cannot be used for this particular purpose. Cf. what the creators of MT wrote about dynamically creating MT generators at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html Sturla Molden From robert.kern at gmail.com Thu Dec 11 16:11:17 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Dec 2008 13:11:17 -0800 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: References: <20081211152049.GB1440@phare.normalesup.org> <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> Message-ID: <3d375d730812111311k11c50966lb0c9a6fd4a592797@mail.gmail.com> On Thu, Dec 11, 2008 at 13:06, Sturla Molden wrote: > >> Create RandomState objects and use those. This is a best practice >> whether you are using multiprocessing or not. The module-level >> functions really should only be used for noodling around in IPython. > > Are we guaranteed that two RandomStates will produce two independent > sequences? No. > If not, RandomState cannot be used for this particular purpose. For small numbers of processes and not-huge runs, I think it's reasonable. You can also implement skipping fairly straightforwardly. If you're in Python, the wasted time is probably a small part of the inefficiencies. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Thu Dec 11 22:05:56 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 11 Dec 2008 22:05:56 -0500 Subject: [Numpy-discussion] building numpy trunk on WindowsXP Message-ID: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> I just tried to build numpy for the first time, on Windows XP SP2, sse2, single CPU with MingW 3.45, Python25 I used `setup.py bdist` and copied extracted archive into sitepackages, (so I can delete it again) Are the errors and failures below expected, or did my build not work correctly? Josef Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.test() Running unit tests for numpy NumPy version 1.3.0.dev6139 NumPy is installed in C:\Programs\Python25\lib\site-packages\numpy Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Int el)] nose version 0.10.4 ................................................................................ ................................................................................ ................................................................................ .......FF....................................................................... ................................................................................ ...........................................K.................................... ...............................................................Ignoring "Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py." (one s hould fix me in fcompiler/compaq.py) ................................................................................ 
................................................................................ ................................................................................ ................................................................................ ..................E..F.....E.................................................... ................................................................................ ................................................................................ ................................................................................ ................................................................................ ......S......................................................................... ................................................................................ ................................................................................ ................................................................................ ............................................ ====================================================================== ERROR: test_mmap (test_io.TestSaveLoad) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 64, in test_mmap self.roundtrip(a, file_on_disk=True, load_kwds={'mmap_mode': 'r'}) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 72, in roundtrip RoundtripTest.roundtrip(self, np.save, *args, **kwargs) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 40, in roundtrip arr_reloaded = np.load(load_file, **load_kwds) File "C:\Programs\Python25\lib\site-packages\numpy\lib\io.py", line 137, in lo ad fid = _file(file,"rb") IOError: [Errno 13] Permission denied: 'c:\\docume~1\\carrasco\\locals~1\\temp\\ tmp2zygyo' ====================================================================== ERROR: test_mmap (test_io.TestSavezLoad) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 64, in test_mmap self.roundtrip(a, file_on_disk=True, load_kwds={'mmap_mode': 'r'}) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 77, in roundtrip RoundtripTest.roundtrip(self, np.savez, *args, **kwargs) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 40, in roundtrip arr_reloaded = np.load(load_file, **load_kwds) File "C:\Programs\Python25\lib\site-packages\numpy\lib\io.py", line 137, in lo ad fid = _file(file,"rb") IOError: [Errno 13] Permission denied: 'c:\\docume~1\\carrasco\\locals~1\\temp\\ tmpedpthb' ====================================================================== FAIL: Check formatting. ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\core\tests\test_print.py", line 28, in test_complex_types assert_equal(str(t(x)), str(complex(x))) File "C:\Programs\Python25\lib\site-packages\numpy\testing\utils.py", line 183 , in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: '(0+5.9287877500949585e-323j)' DESIRED: '(1+0j)' ====================================================================== FAIL: Check formatting. 
---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\core\tests\test_print.py", line 16, in test_float_types assert_equal(str(t(x)), str(float(x))) File "C:\Programs\Python25\lib\site-packages\numpy\testing\utils.py", line 183 , in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: '0.0' DESIRED: '1.0' ====================================================================== FAIL: test_array (test_io.TestSaveTxt) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 105, in test_array '3.000000000000000000e+00 4.000000000000000000e+00\n']) AssertionError ---------------------------------------------------------------------- Ran 1627 tests in 10.640s FAILED (KNOWNFAIL=1, SKIP=1, errors=2, failures=3) From david at ar.media.kyoto-u.ac.jp Thu Dec 11 23:26:06 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 12 Dec 2008 13:26:06 +0900 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> Message-ID: <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: > I just tried to build numpy for the first time, on Windows XP SP2, > sse2, single CPU with MingW 3.45, Python25 > > I used `setup.py bdist` and copied extracted archive into > sitepackages, (so I can delete it again) > You can also use bdist_wininst, which will create a binary installer, with uninstall feature. > Are the errors and failures below expected, or did my build not work correctly? > I have not seen the io ones, but I have not tested numpy on windows recently, so they may be regressions or new tests which do not pass on windows. David From charlesr.harris at gmail.com Fri Dec 12 00:12:44 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Dec 2008 22:12:44 -0700 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> Message-ID: On Thu, Dec 11, 2008 at 9:26 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > josef.pktd at gmail.com wrote: > > I just tried to build numpy for the first time, on Windows XP SP2, > > sse2, single CPU with MingW 3.45, Python25 > > > > I used `setup.py bdist` and copied extracted archive into > > sitepackages, (so I can delete it again) > > > > You can also use bdist_wininst, which will create a binary installer, > with uninstall feature. > > > Are the errors and failures below expected, or did my build not work > correctly? > > > > I have not seen the io ones, but I have not tested numpy on windows > recently, so they may be regressions or new tests which do not pass on > windows. > I think the io errors used to show up on the windows buildbots, something to do with temp files and permissions on windows. Some of the formatting errors look like missing values, i.e., it's just stuff that happened to be in some memory location. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Dec 12 01:01:41 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 12 Dec 2008 01:01:41 -0500 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> Message-ID: <1cd32cbb0812112201w78698ac8v28521d1a984321d9@mail.gmail.com> On Fri, Dec 12, 2008 at 12:12 AM, Charles R Harris wrote: > > > On Thu, Dec 11, 2008 at 9:26 PM, David Cournapeau > wrote: >> >> josef.pktd at gmail.com wrote: >> > I just tried to build numpy for the first time, on Windows XP SP2, >> > sse2, single CPU with MingW 3.45, Python25 >> > >> > I used `setup.py bdist` and copied extracted archive into >> > sitepackages, (so I can delete it again) >> > >> >> You can also use bdist_wininst, which will create a binary installer, >> with uninstall feature. >> >> > Are the errors and failures below expected, or did my build not work >> > correctly? >> > >> >> I have not seen the io ones, but I have not tested numpy on windows >> recently, so they may be regressions or new tests which do not pass on >> windows. > > I think the io errors used to show up on the windows buildbots, something to > do with temp files and permissions on windows. Some of the formatting errors > look like missing values, i.e., it's just stuff that happened to be in some > memory location. > > Chuck > > Thanks, I didn't expect that building numpy would work this easily. Josef From charlesr.harris at gmail.com Fri Dec 12 02:11:24 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Dec 2008 00:11:24 -0700 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: <1cd32cbb0812112201w78698ac8v28521d1a984321d9@mail.gmail.com> References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> <1cd32cbb0812112201w78698ac8v28521d1a984321d9@mail.gmail.com> Message-ID: On Thu, Dec 11, 2008 at 11:01 PM, wrote: > On Fri, Dec 12, 2008 at 12:12 AM, Charles R Harris > wrote: > > > > > > On Thu, Dec 11, 2008 at 9:26 PM, David Cournapeau > > wrote: > >> > >> josef.pktd at gmail.com wrote: > >> > I just tried to build numpy for the first time, on Windows XP SP2, > >> > sse2, single CPU with MingW 3.45, Python25 > >> > > >> > I used `setup.py bdist` and copied extracted archive into > >> > sitepackages, (so I can delete it again) > >> > > >> > >> You can also use bdist_wininst, which will create a binary installer, > >> with uninstall feature. > >> > >> > Are the errors and failures below expected, or did my build not work > >> > correctly? > >> > > >> > >> I have not seen the io ones, but I have not tested numpy on windows > >> recently, so they may be regressions or new tests which do not pass on > >> windows. > > > > I think the io errors used to show up on the windows buildbots, something > to > > do with temp files and permissions on windows. Some of the formatting > errors > > look like missing values, i.e., it's just stuff that happened to be in > some > > memory location. > > > > Chuck > > > > > > > Thanks, I didn't expect that building numpy would work this easily. > Now that the windows buildbot is back after being offline for a while, I see the same io errors. I wonder if something changed? The formatting errors aren't there, however. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
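
For what it's worth, those io failures look consistent with a Windows-specific limitation around temporary files: a file created with tempfile.NamedTemporaryFile cannot be reopened by name while it is still open, which is what a save/load-by-filename roundtrip would do. A minimal illustration -- this is an assumption about what the failing tests do, not a confirmed diagnosis:

import tempfile

f = tempfile.NamedTemporaryFile()
f.write('some data')
f.flush()

# On Unix this second open succeeds; on Windows it fails with
# IOError: [Errno 13] Permission denied, because the temporary file is
# still held open with exclusive access.
g = open(f.name, 'rb')
print g.read()
g.close()
f.close()
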
URL: From gwg at emss.co.za Fri Dec 12 02:28:30 2008 From: gwg at emss.co.za (George Goussard) Date: Fri, 12 Dec 2008 09:28:30 +0200 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) In-Reply-To: References: Message-ID: <15B34CD0955E484689D667626E6456D5011CA54868@london.emss.co.za> Hello David. I am using the Intel MKL BLAS/LAPACK. I have replaced this with AMD's ACML library. Now there is no exception raised due to a "Singular matrix" while trying to move the legend(wiggling the graph). So, the graph is updated and the interaction is fine(you can wiggle the graph and it updates, minimize, maximeie etc.). But ... the legend is now only drawn sometimes and the graphs are drawn with an intermittent line, as if the - - - pattern was specified. Something is still not right. I just can't seem to put my finger on it since there are some many parties involved(numpy,matplotlib,python, ctypes etc.) I also ran the numpy.test() with NUmpy that I compiled with AMD's ACML. The results are included: Running unit tests for numpy NumPy version 1.2.1 Results of numpy.test() NumPy is installed in C:\Development\Python\2_5_2\lib\site-packages\numpy Python version 2.5.2 (r252:60911, Dec 12 2008, 08:38:07) [MSC v.1400 64 bit (AMD64)] nose version 0.10.4 Forcing DISTUTILS_USE_SDK=1 ............................................................................................................................................ .........................................................................................................K..K............................... ............................................................................................................................................ .......................K.................................................................................Ignoring "MSVCCompiler instance has no attribute '_MSVCCompiler__root'" (I think it is msvccompiler.py bug) ...........................S................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ....................................................................................... ---------------------------------------------------------------------- Ran 1592 tests in 10.704s OK (KNOWNFAIL=3, SKIP=1) Thanks. George. 
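
One quick sanity check when swapping BLAS/LAPACK implementations, in case it helps: numpy records the libraries it was built against, so printing the build configuration shows whether ACML (or MKL) was actually picked up. A minimal check, assuming the information was recorded at build time:

import numpy as np

# Prints the blas/lapack sections recorded when numpy was built; the
# library names listed there show which implementation was linked in.
np.show_config()
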
Message: 3 Date: Tue, 9 Dec 2008 02:43:25 +0900 From: "David Cournapeau" Subject: Re: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) To: "Discussion of Numerical Python" Message-ID: <5b8d13220812080943g69d4c670jabd6aef66d336e29 at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Tue, Dec 9, 2008 at 12:50 AM, George Goussard wrote: > Hello. > > > > I have been battling with the following error for the past week. The output > from the terminal is: > What does numpy.test() says ? Did you use an external blas/lapack when you built numpy for AMD64 David ------------------------------ _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion End of Numpy-discussion Digest, Vol 27, Issue 22 ************************************************ From gael.varoquaux at normalesup.org Fri Dec 12 08:20:50 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 12 Dec 2008 14:20:50 +0100 Subject: [Numpy-discussion] Plot directive in numpy docs Message-ID: <20081212132050.GA7822@phare.normalesup.org> Hi, What is the guideline on using the plot directive in the numpy docs? It can make some examples much easier to understand, but on the other hand, it can clutter the docstrings. In addition, I am not sure how our documentation pipeline deals with it. Cheers, Ga?l From pav at iki.fi Fri Dec 12 09:09:14 2008 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 12 Dec 2008 14:09:14 +0000 (UTC) Subject: [Numpy-discussion] Plot directive in numpy docs References: <20081212132050.GA7822@phare.normalesup.org> Message-ID: Fri, 12 Dec 2008 14:20:50 +0100, Gael Varoquaux wrote: > What is the guideline on using the plot directive in the numpy docs? It > can make some examples much easier to understand, but on the other hand, > it can clutter the docstrings. In addition, I am not sure how our > documentation pipeline deals with it. No guideline yet, I'd suggest not to use it in docstrings yet, before we are sure it works as we want it to work. It does not (and probably will not, due to security reasons) work in the wiki. But it already works in the built documentation. ** How it works now .. plot:: import matplotlib.pyplot as plt x = np.linspace(0, 2*pi, 200) plt.plot(x, np.sin(x)) plt.show() assuming "import numpy as np" is pre-defined in a Sphinx conf.py directive. The code can either be in doctest format or not, this is automatically detected. Each matplotlib figure gets a separate image, for examples see the Scipy Tutorial. The code is executed and images are automatically captured into files when the documentation is built. (No need to use eg. savefig.) The code inside the plot:: directive cannot, due to technical reasons, access any variables etc. used in preceding doctests. ** What to think about - Should docstrings be assumed by default to lie inside plot:: directive, unless a plot:: directive is explicitly used? This way the plot could use stuff defined in earlier examples. - Or, maybe only the examples section should assumed to be in a plot:: by default, if it contains doctests. I'd think this would be a good idea. This way, we could continue using the current docstring standard, and be able to generate figures from the examples that use matplotlib. - I don't think the plot:: directive itself adds much line noise to the docstrings, but maybe some differ. - Where to allow plots in docstrings. I'd guess only in the examples section. 
- Whether to use doctest notation inside plot:: or not. - Our security model should continue to be as previously: the person who checks in docstrings from the wiki to SVN checks that the doctests do not contain malicious code. - We have technical leeway to do many things, because Sphinx allows us to preprocess docstrings in any way we want before processing them. Also: There was the unresolved question about should the example codes be run when numpy.test() is run, and what to do with matplotlib code in this case. The main problem was that if the plot codes are picked up as doctests, then the matplotlib objects returned by pyplot functions cause unnecessary line noise. Definitely, any doctest markup should be avoided in the examples. So the options were either to implement some magic to skip offending doctest lines, or to not use doctest markup for plots. -- Pauli Virtanen From rw247 at astro.columbia.edu Fri Dec 12 12:47:04 2008 From: rw247 at astro.columbia.edu (Ross Williamson) Date: Fri, 12 Dec 2008 12:47:04 -0500 Subject: [Numpy-discussion] Save a class Message-ID: <6D1EC63E-700D-427C-9A56-C714C819787C@astro.columbia.edu> Dear all I have a class that contains various data arrays and constants Is there a way of using numpy.save() to save the class so that when I reload it back in I have access to all the member arrays? Thanks Ross From josef.pktd at gmail.com Fri Dec 12 12:48:24 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 12 Dec 2008 12:48:24 -0500 Subject: [Numpy-discussion] bugfix for np.random logseries and hypergeometric are verified Message-ID: <1cd32cbb0812120948g6b177563p9ddb89bf04c2600@mail.gmail.com> Hi, Now that I managed to compile numpy, I tried out the bugfixes in ticket:921 and ticket:923. For both, checking the results gives now correct results. I attached the scripts and results to the tickets. Both distributions will be tested again in scipy.stats, once I remove the skip and add the arguments that previously resulted in wrong numbers. Can someone committ these, or can I commit them myself? (This will be the maximum that I touch C, changing one word and one inequality) Josef From sturla at molden.no Fri Dec 12 13:51:56 2008 From: sturla at molden.no (Sturla Molden) Date: Fri, 12 Dec 2008 19:51:56 +0100 (CET) Subject: [Numpy-discussion] Save a class In-Reply-To: <6D1EC63E-700D-427C-9A56-C714C819787C@astro.columbia.edu> References: <6D1EC63E-700D-427C-9A56-C714C819787C@astro.columbia.edu> Message-ID: See the module docs for pickle and cPickle. Sturla Molden > Dear all > > I have a class that contains various data arrays and constants > > Is there a way of using numpy.save() to save the class so that when I > reload it back in I have access to all the member arrays? 
> > Thanks > > Ross > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Fri Dec 12 14:47:23 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Dec 2008 12:47:23 -0700 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: On Thu, Dec 11, 2008 at 8:20 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > You might also want to contact Bruce Carneal bcarneal at gmail.com, as he did some work on this. He is interested in clustering/multiprocessing simulations and is currently working on a clustering package. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpyle at post.harvard.edu Fri Dec 12 12:27:28 2008 From: rpyle at post.harvard.edu (Robert Pyle) Date: Fri, 12 Dec 2008 12:27:28 -0500 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: <493FFC4D.60500@enthought.com> References: <493FFC4D.60500@enthought.com> Message-ID: <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> Hi, I'm on a Mac G5, with EPD as my python: ------ ~ $ python EPD Py25 (4.1.30001_beta1) -- http://www.enthought.com/epd Python 2.5.2 |EPD Py25 4.1.30001_beta1| (r252:60911, Nov 23 2008, 15:11:42) [GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> ------ I tried 'easy_install -U ETS==3.1.0' and ended up with: ------ The package setup script has attempted to modify files on your system that are not within the EasyInstall build area, and has been aborted. This package cannot be safely installed by EasyInstall, and may not support alternate installation locations even if you run its setup script by hand. Please inform the package's author and the EasyInstall maintainers to find out if a fix or workaround is available. ------ Is my easy_install broken or out-of-date, or is there some small thing wrong in ETS 3.1.0? Thanks for making ETS and EPD available! Bob On Dec 10, 2008, at 12:28 PM, Dave Peterson wrote: > I'm pleased to announce that the Enthought Tool Suite (ETS) 3.1.0 has > been tagged, released, and uploaded to PyPi[1]! > > Both source distributions (.tar.gz) and binary (.egg) for Windows have > been built and uploaded to PyPi. > > You can update an existing ETS install to v3.1.0 like so: > easy_install -U ETS==3.1.0 From charlesr.harris at gmail.com Fri Dec 12 21:39:27 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Dec 2008 19:39:27 -0700 Subject: [Numpy-discussion] bugfix for np.random logseries and hypergeometric are verified In-Reply-To: <1cd32cbb0812120948g6b177563p9ddb89bf04c2600@mail.gmail.com> References: <1cd32cbb0812120948g6b177563p9ddb89bf04c2600@mail.gmail.com> Message-ID: On Fri, Dec 12, 2008 at 10:48 AM, wrote: > Hi, > > Now that I managed to compile numpy, I tried out the bugfixes in > ticket:921 and ticket:923. > > For both, checking the results gives now correct results. I attached > the scripts and results to the tickets. 
> > Both distributions will be tested again in scipy.stats, once I remove > the skip and add the arguments that previously resulted in wrong > numbers. > > Can someone committ these, or can I commit them myself? (This will be > the maximum that I touch C, changing one word and one inequality) > I'll take a look at them tomorrow. I don't know if you can commit to numpy with your scipy permissions, add your name to TEST_COMMIT in the numpy top directory and give it a shot. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Sat Dec 13 07:14:23 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Sat, 13 Dec 2008 14:14:23 +0200 Subject: [Numpy-discussion] Plot directive in numpy docs In-Reply-To: References: <20081212132050.GA7822@phare.normalesup.org> Message-ID: <6a17e9ee0812130414v4cb3c0dfw642d163976c1cbbe@mail.gmail.com> > 2008/12/12 Pauli Virtanen : > Fri, 12 Dec 2008 14:20:50 +0100, Gael Varoquaux wrote: >> What is the guideline on using the plot directive in the numpy docs? > > No guideline yet, I'd suggest not to use it in docstrings yet, before we > are sure it works as we want it to work. > > ** What to think about > > - Should docstrings be assumed by default to lie inside plot:: directive, > unless a plot:: directive is explicitly used? > > This way the plot could use stuff defined in earlier examples. > > - Or, maybe only the examples section should assumed to be in a plot:: > by default, if it contains doctests. > I'd prefer this approach, with the examples section assumed to be wrapped in a plot directive, rather than having the markup in the docstring itself. I'm not clear on what you mean by earlier examples. Do you mean earlier examples in the same docstring, or earlier examples in other docstrings? It makes most sense to me if each docstring has self contained examples and doesn't rely on anything defined elsewhere (except the obvious 'import numpy as np'). > Also: > > There was the unresolved question about should the example codes be run > when numpy.test() is run, and what to do with matplotlib code in this > case. The main problem was that if the plot codes are picked up as > doctests, then the matplotlib objects returned by pyplot functions cause > unnecessary line noise. Definitely, any doctest markup should be avoided > in the examples. So the options were either to implement some magic to > skip offending doctest lines, or to not use doctest markup for plots. However this is resolved, I wouldn't like to see any doctest markup in the finished documentation, whether this is viewed in the terminal, as html or as a pdf. Cheers, Scott From gael.varoquaux at normalesup.org Sat Dec 13 08:10:24 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 13 Dec 2008 14:10:24 +0100 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> Message-ID: <20081213131024.GA19210@phare.normalesup.org> On Fri, Dec 12, 2008 at 12:27:28PM -0500, Robert Pyle wrote: > ------ > I tried 'easy_install -U ETS==3.1.0' > and ended up with: > ------ > The package setup script has attempted to modify files on your system > that are not within the EasyInstall build area, and has been aborted. 
> This package cannot be safely installed by EasyInstall, and may not > support alternate installation locations even if you run its setup > script by hand. Please inform the package's author and the EasyInstall > maintainers to find out if a fix or workaround is available. > ------ AFAIK this is a bug/conflict in numpy.distutils or setuptools (hard to say which one is wrong, but they do raise this problem). I think the issue goes away if you rerun the easy_install command several times. Each times it seems to go further. Ga?l From rpyle at post.harvard.edu Sat Dec 13 16:01:41 2008 From: rpyle at post.harvard.edu (Robert Pyle) Date: Sat, 13 Dec 2008 16:01:41 -0500 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: <20081213131024.GA19210@phare.normalesup.org> References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> <20081213131024.GA19210@phare.normalesup.org> Message-ID: Ga?l, Rerunning easy_install a couple of times did the trick. Thanks for the help. Bob On Dec 13, 2008, at 8:10 AM, Gael Varoquaux wrote: > On Fri, Dec 12, 2008 at 12:27:28PM -0500, Robert Pyle wrote: >> ------ > >> I tried 'easy_install -U ETS==3.1.0' > >> and ended up with: >> ------ >> The package setup script has attempted to modify files on your system >> that are not within the EasyInstall build area, and has been aborted. > >> This package cannot be safely installed by EasyInstall, and may not >> support alternate installation locations even if you run its setup >> script by hand. Please inform the package's author and the >> EasyInstall >> maintainers to find out if a fix or workaround is available. >> ------ > > AFAIK this is a bug/conflict in numpy.distutils or setuptools (hard to > say which one is wrong, but they do raise this problem). > > I think the issue goes away if you rerun the easy_install command > several > times. Each times it seems to go further. > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From gael.varoquaux at normalesup.org Sat Dec 13 17:04:02 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 13 Dec 2008 23:04:02 +0100 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> <20081213131024.GA19210@phare.normalesup.org> Message-ID: <20081213220402.GL19210@phare.normalesup.org> On Sat, Dec 13, 2008 at 04:01:41PM -0500, Robert Pyle wrote: > Rerunning easy_install a couple of times did the trick. I just whish such workaround weren't necessary. Packaging is hard. Ga?l From Klaus.Noekel at gmx.de Sun Dec 14 13:45:40 2008 From: Klaus.Noekel at gmx.de (Klaus Noekel) Date: Sun, 14 Dec 2008 19:45:40 +0100 Subject: [Numpy-discussion] Win64 build? Message-ID: <49455454.1070303@gmx.de> Dear all, I would like to use numpy under Windows Vista 64-bit, but I am scared a bit by compiling a 64bit build myself. Is there an installer for Win64 somewhere, or are there any plans for one? Thanks for any advice, including directions for building myself (sigh), if that should be the only way. Klaus Noekel From arokem at berkeley.edu Sun Dec 14 15:26:42 2008 From: arokem at berkeley.edu (Ariel Rokem) Date: Sun, 14 Dec 2008 12:26:42 -0800 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! 
In-Reply-To: <20081213220402.GL19210@phare.normalesup.org> References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> <20081213131024.GA19210@phare.normalesup.org> <20081213220402.GL19210@phare.normalesup.org> Message-ID: <43958ee60812141226x7d9db88bt886fc417bdccc933@mail.gmail.com> Hello - I tried doing this and keep getting the error attached below. It seems like it is a completely different issue, having to do with the architecture of my mac, but maybe someone here knows how to resolve it. I am doing all this on an Intel mac with OS 10.5.5 Thanks a bunch -- Ariel -------------------------------------------------------------------------------------------------------------------------------- ASR:~ arokem$ easy_install -U ETS==3.1.0 Searching for ETS==3.1.0 Reading http://pypi.python.org/simple/ETS/ Reading http://code.enthought.com/projects/tool-suite.php Best match: ETS 3.1.0 Downloading http://pypi.python.org/packages/2.5/E/ETS/ETS-3.1.0-py2.5.egg#md5=e0cde23026f5f0538dda271a6e08a175 Processing ETS-3.1.0-py2.5.egg Removing /Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages/ETS-3.1.0-py2.5.egg Moving ETS-3.1.0-py2.5.egg to /Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages ETS 3.1.0 is already the active version in easy-install.pth Installed /Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages/ETS-3.1.0-py2.5.egg Reading http://code.enthought.com/enstaller/eggs/source Processing dependencies for ETS==3.1.0 Searching for Enable>=3.0.2.dev,<=3.0.2 Reading http://pypi.python.org/simple/Enable/ Reading http://code.enthought.com/projects/enable Best match: Enable 3.0.2 Downloading http://pypi.python.org/packages/source/E/Enable/Enable-3.0.2.tar.gz#md5=6c9ef42edd442ba8ef56f2371031fca7 Processing Enable-3.0.2.tar.gz Running Enable-3.0.2/setup.py -q bdist_egg --dist-dir /var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/egg-dist-tmp-NK8Tgm Found executable /Library/Frameworks/Python.framework/Versions/Current/bin/wx-config non-existing path in '/private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/mac': '/Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages/wxPython-2.8.7.1.0001_s-py2.5-macosx-10.3-fat.egg/wx/wx/include/mac-unicode-release-2.8' zip_safe flag not set; analyzing archive contents... setupdocs.setupdocs: module references __file__ Installed /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/setupdocs-1.0.1-py2.5.egg Sphinx v0.4.2, building html trying to load pickled env... not found building [html]: targets for 1 source files that are out of date updating environment: 1 added, 0 changed, 0 removed reading... enable_concepts WARNING: GLOBAL:: master file /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/docs/source/index.rst not found pickling the env... done checking consistency... WARNING: /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/docs/source/enable_concepts.rst:: document isn't included in any toctree writing output... 
enable_concepts index [Errno 2] No such file or directory: '/private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/build/docs/html/.doctrees/index.doctree' /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/src/gl/plat_support.i:95: Warning(121): %name is deprecated. Use %rename instead. /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c: In function 'FT_GetFile_From_Mac_Name': /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:770: warning: initialization makes integer from pointer without a cast /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:771: warning: initialization makes integer from pointer without a cast /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c: In function 'FT_GetFile_From_Mac_Name': /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:770: warning: initialization makes integer from pointer without a cast /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:771: warning: initialization makes integer from pointer without a cast ld: in /Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libPng.dylib, file is not of required architecture for architecture ppc collect2: ld returned 1 exit status lipo: can't open input file: /var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-//ccdSAL3V.out (No such file or directory) ld: in /Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libPng.dylib, file is not of required architecture for architecture ppc collect2: ld returned 1 exit status lipo: can't open input file: /var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-//ccdSAL3V.out (No such file or directory) error: Setup script exited with error: Command "g++ -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/build/src.macosx-10.3-i386-2.5/enthought/kiva/agg/agg_wrap.o -Lbuild/temp.macosx-10.3-i386-2.5 -lkiva_src -lagg24_src -lfreetype2_src -o build/lib.macosx-10.3-i386-2.5/enthought/kiva/agg/_agg.so -framework Carbon -framework ApplicationServices -framework OpenGL -framework Carbon" failed with exit status 1 On Sat, Dec 13, 2008 at 2:04 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sat, Dec 13, 2008 at 04:01:41PM -0500, Robert Pyle wrote: > > Rerunning easy_install a couple of times did the trick. > > I just whish such workaround weren't necessary. Packaging is hard. > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Sun Dec 14 21:42:10 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 15 Dec 2008 11:42:10 +0900 Subject: [Numpy-discussion] Win64 build? 
In-Reply-To: <49455454.1070303@gmx.de> References: <49455454.1070303@gmx.de> Message-ID: <4945C402.4080209@ar.media.kyoto-u.ac.jp> Klaus Noekel wrote: > Dear all, > > I would like to use numpy under Windows Vista 64-bit, but I am scared a > bit by compiling a 64bit build myself. Is there an installer for Win64 > somewhere, or are there any plans for one? > No. > Thanks for any advice, including directions for building myself (sigh), > if that should be the only way. > Do you only need numpy or also scipy ? If you only need numpy, it is relatively straightforward because you don't need BLAS/LAPACK nor any fortran compiler. You should use the Visual Studio compiler, though: VS 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well yet for 64 bits. Of course, you can also install numpy 32 bits, which should work perfectly on windows 64 bits, cheers, David From lists at cheimes.de Sun Dec 14 23:13:56 2008 From: lists at cheimes.de (Christian Heimes) Date: Mon, 15 Dec 2008 05:13:56 +0100 Subject: [Numpy-discussion] Win64 build? In-Reply-To: <4945C402.4080209@ar.media.kyoto-u.ac.jp> References: <49455454.1070303@gmx.de> <4945C402.4080209@ar.media.kyoto-u.ac.jp> Message-ID: David Cournapeau schrieb: > Do you only need numpy or also scipy ? If you only need numpy, it is > relatively straightforward because you don't need BLAS/LAPACK nor any > fortran compiler. You should use the Visual Studio compiler, though: VS > 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well > yet for 64 bits. The offical Windows builds of Python 2.5 are created with Visual C 7.1 (also known as VS2003). You can compile an extension with VS 2005 but that will cause trouble. Christian From david at ar.media.kyoto-u.ac.jp Sun Dec 14 23:26:03 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 15 Dec 2008 13:26:03 +0900 Subject: [Numpy-discussion] Win64 build? In-Reply-To: References: <49455454.1070303@gmx.de> <4945C402.4080209@ar.media.kyoto-u.ac.jp> Message-ID: <4945DC5B.8040904@ar.media.kyoto-u.ac.jp> Christian Heimes wrote: > David Cournapeau schrieb: > >> Do you only need numpy or also scipy ? If you only need numpy, it is >> relatively straightforward because you don't need BLAS/LAPACK nor any >> fortran compiler. You should use the Visual Studio compiler, though: VS >> 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well >> yet for 64 bits. >> > > The offical Windows builds of Python 2.5 are created with Visual C 7.1 > (also known as VS2003). You can compile an extension with VS 2005 but > that will cause trouble. > Hm, I may have got confused between the IDE and the compiler version. VS2003 cannot build 64 bits binaries, right ? So you need the Platform/Windows SDK - which corresponds to the compiler version 14 (VS 2005) and not 13 (VS 2003), right ? cheers, David From klemm at phys.ethz.ch Mon Dec 15 13:27:51 2008 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Mon, 15 Dec 2008 19:27:51 +0100 Subject: [Numpy-discussion] Efficient removal of duplicates Message-ID: Hi, I the following problem: I have a relatively long array of points [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which prevents the Delaunay triangulation algorithm from completing its task. Question, is there an efficent way, of getting rid of the duplicate entries? All I can think of involves loops. 
Thanks and regards, Hanno -- Hanno Klemm klemm at phys.ethz.ch From bhaynor at hotmail.com Mon Dec 15 14:39:41 2008 From: bhaynor at hotmail.com (Benjamin Haynor) Date: Mon, 15 Dec 2008 11:39:41 -0800 Subject: [Numpy-discussion] Concatenating Arrays to make Views Message-ID: Hi, I was wondering if I can concatenate 3 arrays, where the result will be a view of the original three arrays, instead of a copy of the data. For example, suppose I write the following import numpy as n a = n.array([[1,2],[3,4]]) b = n.array([[5,6],[7,8]]) c = n.array([[9,10],[11,12]]) c = n.r_[a,b] Now c = : [[1,2], [3,4], [5,6], [7,8], [9,10], [11,12]] I was hoping to get an array, such that, when I change d, a, b, and c will also change appropriately. Any ideas? - Ben bhaynor at hotmail.com _________________________________________________________________ Send e-mail anywhere. No map, no compass. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_anywhere_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Mon Dec 15 14:47:06 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 14:47:06 -0500 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: Message-ID: <4946B43A.9000709@american.edu> Hanno Klemm wrote: > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which > prevents the Delaunay triangulation algorithm from completing its task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? `set`? Alan Isaac PS I think a couple such inquiries have perhaps suggested that it would be nice if `unique` took an axis argument. From strawman at astraw.com Mon Dec 15 14:57:15 2008 From: strawman at astraw.com (Andrew Straw) Date: Mon, 15 Dec 2008 11:57:15 -0800 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: Message-ID: <4946B69B.7000709@astraw.com> Hanno Klemm wrote: > Hi, > > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which > prevents the Delaunay triangulation algorithm from completing its task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? > All I can think of involves loops. > > Thanks and regards, > Hanno > > One idea is to create a view of the original array with a shape of (N,) and elements with a dtype that encompases both xn, yn. Then use numpy.unique() to find the unique entries, and create a view of that array with your original dtype. -Andrew From robert.kern at gmail.com Mon Dec 15 14:58:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Dec 2008 11:58:20 -0800 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: Message-ID: <3d375d730812151158t58b7949fr5fa970c0e6e479a5@mail.gmail.com> On Mon, Dec 15, 2008 at 10:27, Hanno Klemm wrote: > > Hi, > > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which > prevents the Delaunay triangulation algorithm from completing its task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? > All I can think of involves loops. Besides transforming to tuples and using sets as Alan suggests, you can also cast your [N,2] array to a [N] structured array and use unique1d(). 
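
For concreteness, a minimal sketch of that structured-array route, assuming a C-contiguous (N, 2) float array of points (the array names are only illustrative):

import numpy as np

xy = np.array([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0], [3.0, 4.0]])

# View each (x, y) row as a single structured element so that whole
# rows are compared at once.
pair_dtype = np.dtype([('x', xy.dtype), ('y', xy.dtype)])
pairs = xy.view(pair_dtype).ravel()

# unique1d sorts the structured elements and drops the duplicate rows.
unique_pairs = np.unique1d(pairs)

# Back to a plain (M, 2) float array for the triangulation.
xy_unique = unique_pairs.view(xy.dtype).reshape(-1, 2)
print xy_unique
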
If you are doing interpolation and need to deal with the associated Z values, use the return_indices=True argument to get the indices of the unique values, too. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Mon Dec 15 14:59:52 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Dec 2008 11:59:52 -0800 Subject: [Numpy-discussion] Concatenating Arrays to make Views In-Reply-To: References: Message-ID: <3d375d730812151159v32d22d86nb5fc91d9c2d302bf@mail.gmail.com> On Mon, Dec 15, 2008 at 11:39, Benjamin Haynor wrote: > Hi, > > I was wondering if I can concatenate 3 arrays, where the result will be a > view of the original three arrays, instead of a copy of the data. No, this is not possible in general with numpy's memory model. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From aarchiba at physics.mcgill.ca Mon Dec 15 15:24:24 2008 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Mon, 15 Dec 2008 15:24:24 -0500 Subject: [Numpy-discussion] Concatenating Arrays to make Views In-Reply-To: References: Message-ID: 2008/12/15 Benjamin Haynor : > I was wondering if I can concatenate 3 arrays, where the result will be a > view of the original three arrays, instead of a copy of the data. For > example, suppose I write the following > import numpy as n > a = n.array([[1,2],[3,4]]) > b = n.array([[5,6],[7,8]]) > c = n.array([[9,10],[11,12]]) > c = n.r_[a,b] > Now c = : > [[1,2], > [3,4], > [5,6], > [7,8], > [9,10], > [11,12]] > I was hoping to get an array, such that, when I change d, a, b, and c will > also change appropriately. > Any ideas? An array must be a contiguous piece of memory, so this is impossible unless you allocate d first and make a b and c views of it. Anne From michael.s.gilbert at gmail.com Mon Dec 15 18:01:41 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Mon, 15 Dec 2008 18:01:41 -0500 Subject: [Numpy-discussion] Mersenne twister seeds Message-ID: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> According to wikipedia [1], some common Mersenne twister algorithms use a linear congruential gradient (LCG) to generate seeds. LCGs have been known to produce poor random numbers. Does numpy's Mersenne twister do this? And if so, is this potentially a problem? http://en.wikipedia.org/wiki/Linear_congruential_generator From bioinformed at gmail.com Mon Dec 15 18:10:52 2008 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Mon, 15 Dec 2008 18:10:52 -0500 Subject: [Numpy-discussion] Mersenne twister seeds In-Reply-To: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> References: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> Message-ID: <2e1434c10812151510o2f621055t2320ee93357f6b43@mail.gmail.com> On Mon, Dec 15, 2008 at 6:01 PM, Michael Gilbert < michael.s.gilbert at gmail.com> wrote: > According to wikipedia [1], some common Mersenne twister algorithms > use a linear congruential gradient (LCG) to generate seeds. LCGs have > been known to produce poor random numbers. Does numpy's Mersenne > twister do this? And if so, is this potentially a problem? > It is certainly no worse than using sequential pids or timestamps and very likely considerably better. 
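
Coming back to the array-concatenation question earlier in this digest: a minimal sketch of the approach Anne Archibald describes, i.e. allocate the combined array first and make a, b and c views into it (the numbers follow Benjamin's original example):

import numpy as np

# Allocate the combined array up front ...
d = np.empty((6, 2), dtype=int)

# ... and make a, b, c views of consecutive row blocks.  No data is
# copied, so writing through d is seen by a, b and c, and vice versa.
a, b, c = d[0:2], d[2:4], d[4:6]

a[:] = [[1, 2], [3, 4]]
b[:] = [[5, 6], [7, 8]]
c[:] = [[9, 10], [11, 12]]

d[0, 0] = 99
print a[0, 0]    # -> 99, because a is a view into d
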
-Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists_ravi at lavabit.com Mon Dec 15 18:57:54 2008 From: lists_ravi at lavabit.com (Ravi) Date: Mon, 15 Dec 2008 18:57:54 -0500 Subject: [Numpy-discussion] Mersenne twister seeds In-Reply-To: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> References: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> Message-ID: <200812151857.56744.lists_ravi@lavabit.com> On Monday 15 December 2008 18:01:41 Michael Gilbert wrote: > According to wikipedia [1], some common Mersenne twister algorithms > use a linear congruential gradient (LCG) to generate seeds. ?LCGs have > been known to produce poor random numbers. ?Does numpy's Mersenne > twister do this? ?And if so, is this potentially a problem? No. Once the seeding is done, the Mersenne twister generates the random numbers. So long as you are using those, you are fine (except for cryptographic applications). If you don't trust the seed, you could always seed it yourself as well. Regards, Ravi From aisaac at american.edu Mon Dec 15 19:04:02 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 19:04:02 -0500 Subject: [Numpy-discussion] Mersenne twister seeds In-Reply-To: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> References: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> Message-ID: <4946F072.6050104@american.edu> On 12/15/2008 6:01 PM Michael Gilbert apparently wrote: > According to wikipedia [1], some common Mersenne twister algorithms > use a linear congruential gradient (LCG) to generate seeds. LCGs have > been known to produce poor random numbers. Does numpy's Mersenne > twister do this? And if so, is this potentially a problem? > > http://en.wikipedia.org/wiki/Linear_congruential_generator See the discussion of `seed` at http://www.python.org/doc/2.5/lib/module-random.html If you are just looking for a truly random seed: http://www.random.org/ Alan Isaac From drife at ucar.edu Mon Dec 15 19:24:05 2008 From: drife at ucar.edu (Daran Rife) Date: Mon, 15 Dec 2008 17:24:05 -0700 Subject: [Numpy-discussion] Efficient removal of duplicates Message-ID: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> How about a solution inspired by recipe 18.1 in the Python Cookbook, 2nd Ed: import numpy as np a = [(x0,y0), (x1,y1), ...] l = a.tolist() l.sort() unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] a_unique = np.asarray(unique) Performance of this approach should be highly scalable. Daran -- Hi, I the following problem: I have a relatively long array of points [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which prevents the Delaunay triangulation algorithm from completing its task. Question, is there an efficent way, of getting rid of the duplicate entries? All I can think of involves loops. Thanks and regards, Hanno From drife at ucar.edu Mon Dec 15 19:27:16 2008 From: drife at ucar.edu (Daran Rife) Date: Mon, 15 Dec 2008 17:27:16 -0700 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> Message-ID: Whoops! A hasty cut-and-paste from my IDLE session. This should read: import numpy as np a = [(x0,y0), (x1,y1), ...] 
# A numpy array, but could be a list l = a.tolist() l.sort() unique = [x for i, x in enumerate(l) if not i or x != l[i-1]] # <---- a_unique = np.asarray(unique) Daran -- On Dec 15, 2008, at 5:24 PM, Daran Rife wrote: > How about a solution inspired by recipe 18.1 in the Python Cookbook, > 2nd Ed: > > import numpy as np > > a = [(x0,y0), (x1,y1), ...] > l = a.tolist() > l.sort() > unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] > a_unique = np.asarray(unique) > > Performance of this approach should be highly scalable. > > Daran > > -- > > > Hi, > > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, > which > prevents the Delaunay triangulation algorithm from completing its > task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? > All I can think of involves loops. > > Thanks and regards, > Hanno From robert.kern at gmail.com Mon Dec 15 19:53:02 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Dec 2008 18:53:02 -0600 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> Message-ID: <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> On Mon, Dec 15, 2008 at 18:24, Daran Rife wrote: > How about a solution inspired by recipe 18.1 in the Python Cookbook, > 2nd Ed: > > import numpy as np > > a = [(x0,y0), (x1,y1), ...] > l = a.tolist() > l.sort() > unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] > a_unique = np.asarray(unique) > > Performance of this approach should be highly scalable. That basic idea is what unique1d() does; however, it uses numpy primitives to keep the heavy lifting in C instead of Python. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From aisaac at american.edu Mon Dec 15 21:21:56 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 21:21:56 -0500 Subject: [Numpy-discussion] unique1d docs (was: Efficient removal of duplicates) In-Reply-To: <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> Message-ID: <494710C4.9080808@american.edu> On 12/15/2008 7:53 PM Robert Kern apparently wrote: > That basic idea is what unique1d() does; however, it uses numpy > primitives to keep the heavy lifting in C instead of Python. I noticed that unique1d is not documented on the Numpy Example List http://www.scipy.org/Numpy_Example_List but is documented on the Numpy Example List with Doc http://www.scipy.org/Numpy_Example_List_With_Doc I thought the latter was auot-generated from the former, as stated at the top of the latter? 
Alan Isaac From josef.pktd at gmail.com Mon Dec 15 21:38:30 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Dec 2008 21:38:30 -0500 Subject: [Numpy-discussion] unique1d docs (was: Efficient removal of duplicates) In-Reply-To: <494710C4.9080808@american.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> Message-ID: <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> On Mon, Dec 15, 2008 at 9:21 PM, Alan G Isaac wrote: > On 12/15/2008 7:53 PM Robert Kern apparently wrote: >> That basic idea is what unique1d() does; however, it uses numpy >> primitives to keep the heavy lifting in C instead of Python. > > > > I noticed that unique1d is not documented on the > Numpy Example List http://www.scipy.org/Numpy_Example_List > but is documented on the Numpy Example List with Doc > http://www.scipy.org/Numpy_Example_List_With_Doc > > I thought the latter was auot-generated from the former, > as stated at the top of the latter? > > Alan Isaac > > > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > I checked the changelog of example list with docs, and it seems that there were several edits directly on the example list with docs page. I guess the warning on top is not enough to prevent edits. Josef From aisaac at american.edu Mon Dec 15 22:18:01 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 22:18:01 -0500 Subject: [Numpy-discussion] unique1d docs In-Reply-To: <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> Message-ID: <49471DE9.9000809@american.edu> > On Mon, Dec 15, 2008 at 9:21 PM, Alan G Isaac wrote: >> I noticed that unique1d is not documented on the >> Numpy Example List http://www.scipy.org/Numpy_Example_List >> but is documented on the Numpy Example List with Doc >> http://www.scipy.org/Numpy_Example_List_With_Doc >> I thought the latter was auto-generated from the former, >> as stated at the top of the latter? On 12/15/2008 9:38 PM josef.pktd at gmail.com apparently wrote: > I checked the changelog of example list with docs, and it seems that > there were several edits directly on the example list with docs page. > I guess the warning on top is not enough to prevent edits. Well I added the unique1d example to the Numpy Example List http://www.scipy.org/Numpy_Example_List. I hope that was the correct response. 
Alan From josef.pktd at gmail.com Mon Dec 15 23:37:33 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Dec 2008 23:37:33 -0500 Subject: [Numpy-discussion] unique1d docs In-Reply-To: <49471DE9.9000809@american.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> <49471DE9.9000809@american.edu> Message-ID: <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> On Mon, Dec 15, 2008 at 10:18 PM, Alan G Isaac wrote: >> On Mon, Dec 15, 2008 at 9:21 PM, Alan G Isaac wrote: >>> I noticed that unique1d is not documented on the >>> Numpy Example List http://www.scipy.org/Numpy_Example_List >>> but is documented on the Numpy Example List with Doc >>> http://www.scipy.org/Numpy_Example_List_With_Doc >>> I thought the latter was auto-generated from the former, >>> as stated at the top of the latter? > > > On 12/15/2008 9:38 PM josef.pktd at gmail.com apparently wrote: >> I checked the changelog of example list with docs, and it seems that >> there were several edits directly on the example list with docs page. >> I guess the warning on top is not enough to prevent edits. > > > Well I added the unique1d example to the Numpy Example List > http://www.scipy.org/Numpy_Example_List. I hope that was > the correct response. > > Alan > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > What's the future of the example list, on the example list with docs it says Numpy 1.0.4. It hasn't been updated in a while. When I started out with numpy, I used it as a main reference, but now, some examples, that I wanted to look at, had outdated function signature. For me, the new docs are now more usable than the example list. I was thinking of starting an example list for scipy.stats, but I guess the effort is better placed in improving the new docs. Josef From millman at berkeley.edu Tue Dec 16 01:29:40 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Mon, 15 Dec 2008 22:29:40 -0800 Subject: [Numpy-discussion] unique1d docs In-Reply-To: <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> <49471DE9.9000809@american.edu> <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> Message-ID: On Mon, Dec 15, 2008 at 8:37 PM, wrote: > What's the future of the example list, on the example list with docs > it says Numpy 1.0.4. It hasn't been updated in a while. When I started > out with numpy, I used it as a main reference, but now, some examples, > that I wanted to look at, had outdated function signature. At some point, we should make sure everything is in the new docs. Maybe we should lock down the pages for editing, point everyone to the new docs.scipy.org webpage, and then eventually make sure everything is the new docs and remove the old pages. > For me, the new docs are now more usable than the example list. I was > thinking of starting an example list for scipy.stats, but I guess the > effort is better placed in improving the new docs. Yes. Please don't start new moin wiki documentation. 
We have a good solution for documentation that didn't exist when the moin documentation was started. Either put new docs in the docstrings or in the scipy tutorial. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From klemm at phys.ethz.ch Tue Dec 16 04:09:33 2008 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Tue, 16 Dec 2008 10:09:33 +0100 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu>, <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> Message-ID: Thanks Daran, that works like a charm! Hanno On Tue, Dec 16, 2008, Daran Rife said: > Whoops! A hasty cut-and-paste from my IDLE session. > This should read: > > import numpy as np > > a = [(x0,y0), (x1,y1), ...] # A numpy array, but could be a list > l = a.tolist() > l.sort() > unique = [x for i, x in enumerate(l) if not i or x != l[i-1]] # <---- > a_unique = np.asarray(unique) > > > Daran > > -- > > On Dec 15, 2008, at 5:24 PM, Daran Rife wrote: > >> How about a solution inspired by recipe 18.1 in the Python Cookbook, >> 2nd Ed: >> >> import numpy as np >> >> a = [(x0,y0), (x1,y1), ...] >> l = a.tolist() >> l.sort() >> unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] >> a_unique = np.asarray(unique) >> >> Performance of this approach should be highly scalable. >> >> Daran >> >> -- >> >> >> Hi, >> >> I the following problem: I have a relatively long array of points >> [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, >> which >> prevents the Delaunay triangulation algorithm from completing its >> task. >> >> Question, is there an efficent way, of getting rid of the duplicate >> entries? >> All I can think of involves loops. >> >> Thanks and regards, >> Hanno > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Hanno Klemm klemm at phys.ethz.ch From sturla at molden.no Tue Dec 16 07:24:52 2008 From: sturla at molden.no (Sturla Molden) Date: Tue, 16 Dec 2008 13:24:52 +0100 (CET) Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> Message-ID: There was an discussion about this on the c.l.p a while ago. Using a sort will scale like O(n log n) or worse, whereas using a set (hash table) will scale like amortized O(n). How to use a Python set to get a unique collection of objects I'll leave to your imagination. Sturla Molden > On Mon, Dec 15, 2008 at 18:24, Daran Rife wrote: >> How about a solution inspired by recipe 18.1 in the Python Cookbook, >> 2nd Ed: >> >> import numpy as np >> >> a = [(x0,y0), (x1,y1), ...] >> l = a.tolist() >> l.sort() >> unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] >> a_unique = np.asarray(unique) >> >> Performance of this approach should be highly scalable. > > That basic idea is what unique1d() does; however, it uses numpy > primitives to keep the heavy lifting in C instead of Python. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
> -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From aisaac at american.edu Tue Dec 16 10:01:12 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 16 Dec 2008 10:01:12 -0500 Subject: [Numpy-discussion] finding docs (was: unique1d docs) In-Reply-To: References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> <49471DE9.9000809@american.edu> <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> Message-ID: <4947C2B8.3040903@american.edu> On 12/16/2008 1:29 AM Jarrod Millman apparently wrote: > Yes. Please don't start new moin wiki documentation. We have a good > solution for documentation that didn't exist when the moin > documentation was started. Either put new docs in the docstrings or > in the scipy tutorial. OK, in this case I think the main NumPy needs a change: an explicit link to the new docs, and a section titled "Documentation" (linked in the contents), and an explict link to the new Numpy Reference Guide. As far as I can tell, I have no way to edit this page: http://numpy.scipy.org/ Imagine a new user looking for docs. This is what I think they would do. 1. Use `numpy` as a browser search term, and get directed to http://numpy.scipy.org/ 2. Notice no "Docmentation" link in contents. *Maybe* notice that "Download the Guide" means get some documentation, but probably that is more detailed and encyclopedic than many are first seeking. 3. Perhaps they will read the text and get pointed to the Numeric docs. Nothing will point them to the new docs. They may notice "Other Documentation is available at the scipy website" but if they follow that, will they guess that they should try a "snapshot" of a "work in progress"? Alan Isaac From rmay31 at gmail.com Tue Dec 16 13:57:40 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 16 Dec 2008 12:57:40 -0600 Subject: [Numpy-discussion] Unexpected MaskedArray behavior Message-ID: <4947FA24.3030904@gmail.com> Hi, I just noticed the following and I was kind of surprised: >>>a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>b = a*5 >>>b masked_array(data = [5 -- -- 20 25], mask = [False True True False False], fill_value=999999) >>>b.data array([ 5, 10, 15, 20, 25]) I was expecting that the underlying data wouldn't get modified while masked. Is this actual behavior expected? Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From shao at msg.ucsf.edu Tue Dec 16 15:09:45 2008 From: shao at msg.ucsf.edu (Lin Shao) Date: Tue, 16 Dec 2008 12:09:45 -0800 Subject: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp In-Reply-To: References: Message-ID: Hi, I found this earlier dialog about refactoring umathmodule.c (see bottom) where David mentioned it wasn't tested on 64-bit Windows. I tried compiling numpy 1.3.0.dev6118 on both a 32-bit and 64-bit Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked on 32-bit but not on 64-bit: the compiler returned a non-specific "Internal Compiler Error" when working on umathmodule.c: ...... 
building 'numpy.core.umath' extension compiling C sources creating build\temp.win-amd64-2.6\Release\build creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6 creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src D:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include -Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -ID:\Python26\include -ID:\Python26\PC /Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c /Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj umathmodule.c numpy\core\src\umath_funcs_c99.inc.src(140) : warning C4273: '_hypot' : inconsistent dll linkage D:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(139) : see previous definition of '_hypot' numpy\core\src\umath_funcs_c99.inc.src(341) : warning C4273: 'sinf' : inconsistent dll linkage Internal Compiler Error in D:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe. You will be prompted to send an error report to Microsoft later. Any idea what's going on? I'd like to volunteer to test compiling numpy on 64-bit Windows system since I have a VS 2008 professional edition installed. Thanks! --lin 2008/10/5 David Cournapeau : >> ...... >> >> > >> > #ifndef HAVE_FREXPF >> > static float frexpf(float x, int * i) >> > { >> > return (float)frexp((double)(x), i); >> > } >> > #endif >> > #ifndef HAVE_LDEXPF >> > static float ldexpf(float x, int i) >> > { >> > return (float)ldexp((double)(x), i); >> > } >> > #endif >> >> At the time I had tried to send further output following a checkout, >> but couldn't get it to post to the list, I think the message was too >> big or something. I will probably be having a go with 1.2.0, when I >> get some time. I'll let you know how it goes. > > I did some heavy refactoring for the above problems, and it should be > now easier to handle (in the trunk). I could build 1.2.0 with VS 2008 > express on 32 bits (wo blas/lapack), and there are some test errors - > albeit relatively minor at first sight. I have not tried on 64 bits. > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From rmay31 at gmail.com Tue Dec 16 18:07:14 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 16 Dec 2008 17:07:14 -0600 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <494834A2.5030303@gmail.com> Pierre GM wrote: > All, > Here's the latest version of genloadtxt, with some recent corrections. > With just a couple of tweaking, we end up with some decent speed: it's > still slower than np.loadtxt, but only 15% so according to the test at > the end of the package. I have one more use issue that you may or may not want to fix. My problem is that missing "values" are specified by their string representation, so that a string representing a missing value, while having the same actual numeric value, may not compare equal when represented as a string. 
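A quick illustration of the mismatch, in plain Python rather than anything genloadtxt-specific:

>>> '-999.00' == '-999.0'
False
>>> float('-999.00') == float('-999.0')
True
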
For instance, if you specify that -999.0 represents a missing value, but the value written to the file is -999.00, you won't end up masking the -999.00 data point. I'm sure a test case will help here: def test_withmissing_float(self): data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00') test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0', names=True) control = ma.array([(0, 1.5), (2, -1.)], mask=[(False, False), (False, True)], dtype=[('A', np.int), ('B', np.float)]) print control print test assert_equal(test, control) assert_equal(test.mask, control.mask) Right now this fails with the latest version of genloadtxt. I've worked around this by specifying a whole bunch of string representations of the values, but I wasn't sure if you knew of a better way that this could be handled within genloadtxt. I can only think of two ways, though I'm not thrilled with either: 1) Call the converter on the string form of the missing value and compare against the converted value from the file to determine if missing. (Probably very slow) 2) Add a list of objects (ints, floats, etc.) to compare against after conversion to determine if they're missing. This might needlessly complicate the function, which I know you've already taken pains to optimize. If there's no good way to do it, I'm content to live with a workaround. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From pgmdevlist at gmail.com Tue Dec 16 18:34:13 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 16 Dec 2008 18:34:13 -0500 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <494834A2.5030303@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <494834A2.5030303@gmail.com> Message-ID: <85E86994-558E-443E-B3C0-E3FDE6BFB940@gmail.com> Ryan, OK, I'll look into that. I won't have time to address it before this next week, however. Option #2 looks like the best. In other news, I was considering renaming genloadtxt to genfromtxt, and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the function names. That way, loadtxt is untouched. On Dec 16, 2008, at 6:07 PM, Ryan May wrote: > Pierre GM wrote: >> All, >> Here's the latest version of genloadtxt, with some recent >> corrections. >> With just a couple of tweaking, we end up with some decent speed: >> it's >> still slower than np.loadtxt, but only 15% so according to the test >> at >> the end of the package. > > I have one more use issue that you may or may not want to fix. My > problem is that > missing "values" are specified by their string representation, so > that a string > representing a missing value, while having the same actual numeric > value, may not > compare equal when represented as a string. For instance, if you > specify that > -999.0 represents a missing value, but the value written to the file > is -999.00, > you won't end up masking the -999.00 data point. I'm sure a test > case will help > here: > > def test_withmissing_float(self): > data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00') > test = mloadtxt(data, dtype=None, delimiter=',', > missing='-999.0', > names=True) > control = ma.array([(0, 1.5), (2, -1.)], > mask=[(False, False), (False, True)], > dtype=[('A', np.int), ('B', np.float)]) > print control > print test > assert_equal(test, control) > assert_equal(test.mask, control.mask) > > Right now this fails with the latest version of genloadtxt. 
I've > worked around > this by specifying a whole bunch of string representations of the > values, but I > wasn't sure if you knew of a better way that this could be handled > within > genloadtxt. I can only think of two ways, though I'm not thrilled > with either: > > 1) Call the converter on the string form of the missing value and > compare against > the converted value from the file to determine if missing. (Probably > very slow) > > 2) Add a list of objects (ints, floats, etc.) to compare against > after conversion > to determine if they're missing. This might needlessly complicate > the function, > which I know you've already taken pains to optimize. > > If there's no good way to do it, I'm content to live with a > workaround. > > Ryan > > -- > Ryan May > Graduate Research Assistant > School of Meteorology > University of Oklahoma > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Tue Dec 16 18:48:52 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 16 Dec 2008 18:48:52 -0500 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <4947FA24.3030904@gmail.com> References: <4947FA24.3030904@gmail.com> Message-ID: <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> On Dec 16, 2008, at 1:57 PM, Ryan May wrote: > I just noticed the following and I was kind of surprised: > >>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>> b = a*5 >>>> b > masked_array(data = [5 -- -- 20 25], > mask = [False True True False False], > fill_value=999999) >>>> b.data > array([ 5, 10, 15, 20, 25]) > > I was expecting that the underlying data wouldn't get modified while > masked. Is > this actual behavior expected? Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't really matter one way or the other. But I tend to agree, it'd make more sense leave masked data untouched (or at least, reset them to their original value after the operation), which would mimic the behavior of gimp/photoshop. Looks like there's a relatively easy fix. I need time to check whether it doesn't break anything elsewhere, nor that it slows things down too much. I won't have time to test all that before next week, though. In any case, that would be for 1.3.x, not for 1.2.x. In the meantime, if you need the functionality, use something like ma.where(a.mask,a,a*5) From cournape at gmail.com Tue Dec 16 21:17:26 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 17 Dec 2008 11:17:26 +0900 Subject: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp In-Reply-To: References: Message-ID: <5b8d13220812161817j2aae55cfleff57bd874589b95@mail.gmail.com> On Wed, Dec 17, 2008 at 5:09 AM, Lin Shao wrote: > Hi, > > I found this earlier dialog about refactoring umathmodule.c (see > bottom) where David mentioned it wasn't tested on 64-bit Windows. > > I tried compiling numpy 1.3.0.dev6118 on both a 32-bit and 64-bit > Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked > on 32-bit but not on 64-bit: the compiler returned a non-specific > "Internal Compiler Error" when working on umathmodule.c: It is a bug in VS, but the problem is caused by buggy code in numpy, so this can be avoided. 
Incidentally, I was working on it yesterday, but went to bed before having fixed everything :) David From david at ar.media.kyoto-u.ac.jp Tue Dec 16 22:59:18 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 17 Dec 2008 12:59:18 +0900 Subject: [Numpy-discussion] Recent umath changes Message-ID: <49487916.3010804@ar.media.kyoto-u.ac.jp> Hi, There have been some changes recently in the umath code, which breaks windows 64 compilation - and I don't understand their rationale either. I have myself spent quite a good deal of time to make sure this works on many platforms/toolchains, by fixing the config distutils command and that platform specificities are contained in a very localized part of the code. It may not be very well documented (see below), but may I ask that next time someone wants to change file file, people ask for review before putting it directly in the trunk ? thanks, David How to deal with platform oddities: ----------------------------------- Basically, the code to replace missing C99 math funcs is, for an hypothetical double foo(double) function: #ifndef HAVE_FOO #udnef foo static double npy_foo(double a) { // define a npy_foo function with the same requirements as C99 foo } #define npy_foo foo #else double foo(double); #endif I think this code is wrong on several accounts: - we should not undef foo if foo is available: if foo is available at that point, it is a bug in the configuration, and should not be dealt in the code. Some cases may be complicated (IEEE754-related macro which are sometimes macro, something functions, etc...), but that should be dealt in very narrow cases. - we should not declare our own function: function declaration is not portable, and varies among OS/toolchains. Some toolchains use intrinsic, some non standard inline mechanism, etc... which can crash the resulting binary because there is a discrepency between our code calling conventions and the library convention. The reported problem with VS compiler on amd64 is caused by this exact problem. Unless there is a strong rationale otherwise, I would like that we follow how "autoconfed" projects do. They have long experience on dealing with platforms idiosyncrasies, and the above method is not the one they follow. They follow the simple: #ifnfdef HAVE_FOO //define foo #endif And deal with platform oddities in the *configuration* code instead of directly in the code. That really makes my life easier when I deal with windows compilers, which are already painful enough to deal with as it is. From charlesr.harris at gmail.com Wed Dec 17 00:43:05 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Dec 2008 22:43:05 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <49487916.3010804@ar.media.kyoto-u.ac.jp> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Hi, > > There have been some changes recently in the umath code, which > breaks windows 64 compilation - and I don't understand their rationale > either. I have myself spent quite a good deal of time to make sure this > works on many platforms/toolchains, by fixing the config distutils > command and that platform specificities are contained in a very > localized part of the code. It may not be very well documented (see > below), but may I ask that next time someone wants to change file file, > people ask for review before putting it directly in the trunk ? 
> > thanks, > > David > > > How to deal with platform oddities: > ----------------------------------- > > Basically, the code to replace missing C99 math funcs is, for an > hypothetical double foo(double) function: > > #ifndef HAVE_FOO > #udnef foo > static double npy_foo(double a) > { > // define a npy_foo function with the same requirements as C99 foo > } > > #define npy_foo foo > #else > double foo(double); > #endif > > I think this code is wrong on several accounts: > - we should not undef foo if foo is available: if foo is available at > that point, it is a bug in the configuration, and should not be dealt in > the code. Some cases may be complicated (IEEE754-related macro which are > sometimes macro, something functions, etc...), but that should be dealt > in very narrow cases. > - we should not declare our own function: function declaration is not > portable, and varies among OS/toolchains. Some toolchains use intrinsic, > some non standard inline mechanism, etc... which can crash the resulting > binary because there is a discrepency between our code calling > conventions and the library convention. The reported problem with VS > compiler on amd64 is caused by this exact problem. > > Unless there is a strong rationale otherwise, I would like that we > follow how "autoconfed" projects do. They have long experience on > dealing with platforms idiosyncrasies, and the above method is not the > one they follow. They follow the simple: > Yes, the rational was to fix compilation on windows 64 with msvc and etch on SPARC, both of which were working after the changes. You are, of course, free to break these builds again. However, I designated space at the top of the file for compiler/distro specific defines, I think you should use them, there is a reason other folks do. The macro undef could be moved but I preferred to generate an error if there was a conflict with the the standard c function prototypes. We can't use inline code for these functions as they are passed to the generic loops as function pointers. I assume compilers have some way of recognizing this case and perhaps generating function code on the fly. If so, we need to figure out how to detect that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Wed Dec 17 00:56:43 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 17 Dec 2008 14:56:43 +0900 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: References: <49487916.3010804@ar.media.kyoto-u.ac.jp> Message-ID: <4948949B.9000504@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau > > > wrote: > > Hi, > > There have been some changes recently in the umath code, which > breaks windows 64 compilation - and I don't understand their rationale > either. I have myself spent quite a good deal of time to make sure > this > works on many platforms/toolchains, by fixing the config distutils > command and that platform specificities are contained in a very > localized part of the code. It may not be very well documented (see > below), but may I ask that next time someone wants to change file > file, > people ask for review before putting it directly in the trunk ? 
> > thanks, > > David > > > How to deal with platform oddities: > ----------------------------------- > > Basically, the code to replace missing C99 math funcs is, for an > hypothetical double foo(double) function: > > #ifndef HAVE_FOO > #udnef foo > static double npy_foo(double a) > { > // define a npy_foo function with the same requirements as C99 foo > } > > #define npy_foo foo > #else > double foo(double); > #endif > > I think this code is wrong on several accounts: > - we should not undef foo if foo is available: if foo is available at > that point, it is a bug in the configuration, and should not be > dealt in > the code. Some cases may be complicated (IEEE754-related macro > which are > sometimes macro, something functions, etc...), but that should be > dealt > in very narrow cases. > - we should not declare our own function: function declaration is not > portable, and varies among OS/toolchains. Some toolchains use > intrinsic, > some non standard inline mechanism, etc... which can crash the > resulting > binary because there is a discrepency between our code calling > conventions and the library convention. The reported problem with VS > compiler on amd64 is caused by this exact problem. > > Unless there is a strong rationale otherwise, I would like that we > follow how "autoconfed" projects do. They have long experience on > dealing with platforms idiosyncrasies, and the above method is not the > one they follow. They follow the simple: > > > Yes, the rational was to fix compilation on windows 64 with msvc and > etch on SPARC, both of which were working after the changes. It does not work at the moment on windows at least :) But more essentially, I don't see why you declared those functions: can you explain me what was your intention, because I don't understand the rationale. > You are, of course, free to break these builds again. However, I > designated space at the top of the file for compiler/distro specific > defines, I think you should use them, there is a reason other folks do. The problem is two folds: - by declaring functions everywhere in the code, you are effectively spreading toolchain specific oddities in the whole source file. This is not good, IMHO: those should be detected at configuration stage, and dealt in the source code using those informations. That's how every autoconf project does it. If a function is actually a macro, this should be detected at configuration. - declarations are toolchain specific; it is actually worse, it even depends on the compiler flags. It is at least the case with MS compilers. So there is no way to guarantee that your declaration matches the math runtime one (the compiler crash reported is exactly caused by this). > The macro undef could be moved but I preferred to generate an error if > there was a conflict with the the standard c function prototypes. > > We can't use inline code for these functions as they are passed to the > generic loops as function pointers. 
Yes, I believe this is another problem when declaring function: if we use say cosl, and cosl is an inline function in the runtime, by re-declaring it, you are telling the compiler that it is not inline anymore, so the compiler does not know anymore you can't take the address of cosl, unless it detects the mismatch between the runtime declaration and ours, and considers it as an error (I am not sure whether this is always an error with MS compilers; it may only be a warning on some versions - it is certainly not dealt in a gracious manner every time, since the linker crashes in some cases). David > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Wed Dec 17 03:40:44 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 17 Dec 2008 17:40:44 +0900 Subject: [Numpy-discussion] Win64 build? In-Reply-To: <4945DC5B.8040904@ar.media.kyoto-u.ac.jp> References: <49455454.1070303@gmx.de> <4945C402.4080209@ar.media.kyoto-u.ac.jp> <4945DC5B.8040904@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812170040r66d08ee6q847b48148328a422@mail.gmail.com> On Mon, Dec 15, 2008 at 1:26 PM, David Cournapeau wrote: > Christian Heimes wrote: >> David Cournapeau schrieb: >> >>> Do you only need numpy or also scipy ? If you only need numpy, it is >>> relatively straightforward because you don't need BLAS/LAPACK nor any >>> fortran compiler. You should use the Visual Studio compiler, though: VS >>> 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well >>> yet for 64 bits. >>> >> >> The offical Windows builds of Python 2.5 are created with Visual C 7.1 >> (also known as VS2003). You can compile an extension with VS 2005 but >> that will cause trouble. >> > > Hm, I may have got confused between the IDE and the compiler version. > VS2003 cannot build 64 bits binaries, right ? So you need the > Platform/Windows SDK - which corresponds to the compiler version 14 (VS > 2005) and not 13 (VS 2003), right ? For the record, if anyone (including me) needs this info: I checked, and python 2.5.2 on amd64 is indeed build by a compiler reporting MSC 1400 (VS 2005 serie). I don't think VS 2003 compiler is used at all, actually - maybe the VS 2003 IDE can be set to use the SDK compilers, though. cheers, David From gwg at emss.co.za Wed Dec 17 04:52:36 2008 From: gwg at emss.co.za (George) Date: Wed, 17 Dec 2008 09:52:36 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Singular_Matrix_problem_with_Matplit?= =?utf-8?q?lib_in=09Numpy_=28Windows_-_AMD64=29?= References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> Message-ID: David Cournapeau gmail.com> writes: > > On Tue, Dec 9, 2008 at 12:50 AM, George Goussard emss.co.za> wrote: > > Hello. > > > > > > > > I have been battling with the following error for the past week. The output > > from the terminal is: > > > > What does numpy.test() says ? Did you use an external blas/lapack when > you built numpy for AMD64 > > David > Hi David. I accidentally created a new posting previously. I have spent the last month trying to track down this bug. I am trying to compile Numpy and Matplotlib on Windows XP 64-bit. I am using the Visual Studio 2005 compiler. Everything compiles without a problem. However running matplotlib etc. gave me a lot of problems: 1. 
The interaction was terrible. It didn't draw anything and the console had a lot trace output with regard to singular matrices etc. Like you said, I am using an external library called Intel MKL and I decided to swap this with AMD ACML. Then the interaction was a lot better and no trace on the console of singular matrices etc. 2. Using both libraries there are problems with the plotting. In both cases the graphs are broken. It starts plotting the curve and then it stops with a section of white space and then some more of the curve etc. The same with the grid lines etc. In other words there is just something broken. I have decided to pursue this bug. I would really like to get Numpy working on AMD64. I ran the test you advised and the tests passed. However I have traced the problem to the file lines.py of matplotlib. There in a function set_xdata and set_ydata(also set_data) there is a line like x = np.asarray or y = np.asarray. My data before that line is fine, but straight after the line is executed the data is broken and garbage. I have debugged some more but I am in deep (murky) waters, but I have also ran out of ideas. If anybody has some more suggestions, please post them. From michael.abshoff at googlemail.com Wed Dec 17 05:15:21 2008 From: michael.abshoff at googlemail.com (Michael Abshoff) Date: Wed, 17 Dec 2008 02:15:21 -0800 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) In-Reply-To: References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> Message-ID: <4948D139.4020703@gmail.com> George wrote: > David Cournapeau gmail.com> writes: Hi George, > I have debugged some more but I am in deep (murky) waters, but I have also ran > out of ideas. If anybody has some more suggestions, please post them. Could you post a full example with additional version info that you are using? Ever since Sage upgraded to Matplotlib 0.98.3 I have been seeing issues with uninitilized values being used in certain code paths. This could be the source of potential trouble even though it doesn't seem to cause any observable trouble with gcc for example. Cheers, Michael > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From rmay31 at gmail.com Wed Dec 17 11:51:57 2008 From: rmay31 at gmail.com (Ryan May) Date: Wed, 17 Dec 2008 10:51:57 -0600 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <85E86994-558E-443E-B3C0-E3FDE6BFB940@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <494834A2.5030303@gmail.com> <85E86994-558E-443E-B3C0-E3FDE6BFB940@gmail.com> Message-ID: <49492E2D.4080609@gmail.com> Pierre GM wrote: > Ryan, > OK, I'll look into that. I won't have time to address it before this > next week, however. Option #2 looks like the best. No hurries, I just want to make sure I raise any issues I see while the design is still up for change. > In other news, I was considering renaming genloadtxt to genfromtxt, > and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the > function names. That way, loadtxt is untouched. +1 I know I've changed my tune here, but at this point it seems like there's so much more functionality here that calling it loadtxt would be a disservice to how much the new function can do (and how much work you've done). 
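Purely as a hypothetical sketch of how the split might read in user code (the function and keyword names below are assumptions based on the current genloadtxt prototype, not a settled API):

import numpy as np
from StringIO import StringIO

data = StringIO("A,B\n0,1.5\n2,-999.0")
# Hypothetical masked-array flavoured wrapper around the new parser.
tbl = np.mafromtxt(data, delimiter=',', names=True, missing='-999.0')
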
Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From rmay31 at gmail.com Wed Dec 17 11:57:03 2008 From: rmay31 at gmail.com (Ryan May) Date: Wed, 17 Dec 2008 10:57:03 -0600 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> References: <4947FA24.3030904@gmail.com> <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> Message-ID: <49492F5F.5050001@gmail.com> Pierre GM wrote: > On Dec 16, 2008, at 1:57 PM, Ryan May wrote: >> I just noticed the following and I was kind of surprised: >> >>>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>>> b = a*5 >>>>> b >> masked_array(data = [5 -- -- 20 25], >> mask = [False True True False False], >> fill_value=999999) >>>>> b.data >> array([ 5, 10, 15, 20, 25]) >> >> I was expecting that the underlying data wouldn't get modified while >> masked. Is >> this actual behavior expected? > > Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't > really matter one way or the other. > But I tend to agree, it'd make more sense leave masked data untouched > (or at least, reset them to their original value after the operation), > which would mimic the behavior of gimp/photoshop. > Looks like there's a relatively easy fix. I need time to check whether > it doesn't break anything elsewhere, nor that it slows things down too > much. I won't have time to test all that before next week, though. In > any case, that would be for 1.3.x, not for 1.2.x. > In the meantime, if you need the functionality, use something like > ma.where(a.mask,a,a*5) I agree that masked values probably shouldn't be trusted, I was just surprised to see the behavior. I just assumed that no operations were taking place on masked values. Just to clarify what I was doing here: I had a masked array of data, where the mask was set by a variety of different masked values. Later on in the code, after doing some unit conversions, I went back to look at the raw data to find points that had one particular masked value set. Instead, I was surprised to see all of the masked values had changed and I could no longer find any of the special values in the data. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From Jim.Vickroy at noaa.gov Wed Dec 17 12:13:45 2008 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Wed, 17 Dec 2008 10:13:45 -0700 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <49492F5F.5050001@gmail.com> References: <4947FA24.3030904@gmail.com> <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> <49492F5F.5050001@gmail.com> Message-ID: <49493349.3090506@noaa.gov> Ryan May wrote: > Pierre GM wrote: > >> On Dec 16, 2008, at 1:57 PM, Ryan May wrote: >> >>> I just noticed the following and I was kind of surprised: >>> >>> >>>>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>>>> b = a*5 >>>>>> b >>>>>> >>> masked_array(data = [5 -- -- 20 25], >>> mask = [False True True False False], >>> fill_value=999999) >>> >>>>>> b.data >>>>>> >>> array([ 5, 10, 15, 20, 25]) >>> >>> I was expecting that the underlying data wouldn't get modified while >>> masked. Is >>> this actual behavior expected? >>> >> Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't >> really matter one way or the other. 
>> But I tend to agree, it'd make more sense leave masked data untouched >> (or at least, reset them to their original value after the operation), >> which would mimic the behavior of gimp/photoshop. >> Looks like there's a relatively easy fix. I need time to check whether >> it doesn't break anything elsewhere, nor that it slows things down too >> much. I won't have time to test all that before next week, though. In >> any case, that would be for 1.3.x, not for 1.2.x. >> In the meantime, if you need the functionality, use something like >> ma.where(a.mask,a,a*5) >> > > I agree that masked values probably shouldn't be trusted, I was just surprised to > see the behavior. I just assumed that no operations were taking place on masked > values. > > Just to clarify what I was doing here: I had a masked array of data, where the > mask was set by a variety of different masked values. Later on in the code, > after doing some unit conversions, I went back to look at the raw data to find > points that had one particular masked value set. Instead, I was surprised to see > all of the masked values had changed and I could no longer find any of the > special values in the data. > > Ryan > > Sorry for being dense about this, but I really do not understand why masked values should not be trusted. If I apply a procedure to an array with elements designated as untouchable, I would expect that contract to be honored. What am I missing here? Thanks for your patience! -- jv -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Dec 17 12:45:14 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 17 Dec 2008 12:45:14 -0500 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <49493349.3090506@noaa.gov> References: <4947FA24.3030904@gmail.com> <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> <49492F5F.5050001@gmail.com> <49493349.3090506@noaa.gov> Message-ID: <2BF975C5-74E0-45A8-A0E3-D3E4ADCDA4A3@gmail.com> On Dec 17, 2008, at 12:13 PM, Jim Vickroy wrote: >> > Sorry for being dense about this, but I really do not understand why > masked values should not be trusted. If I apply a procedure to an > array with elements designated as untouchable, I would expect that > contract to be honored. What am I missing here? > > Thanks for your patience! > -- jv Everything depends on your interpretation of masked data. Traditionally, masked data indicate invalid data, whatever the cause of the invalidity. Operations involving invalid data yield invalid data, hence the presence of a mask on the result. However, the value underneath the mask is still invalid, hence the statement "don't trust masked values". Interpreting a mask as a way to prevent some elements of an array to be processed (designating them as untouchable) is a bit of a stretch. Nevertheless, I agree that this behavior is not intuitive, so I'll check what I can do. 
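In the meantime, a minimal sketch of the ma.where workaround suggested earlier in the thread, which leaves the data under the mask alone:

import numpy.ma as ma

a = ma.MaskedArray([1, 2, 3, 4, 5], mask=[False, True, True, False, False])
# a * 5 still operates on the underlying data, so pick the original
# values back in wherever the mask is set.
b = ma.where(a.mask, a, a * 5)
# b.data should be [5, 2, 3, 20, 25]: the masked slots keep 2 and 3,
# and b.mask stays [False, True, True, False, False].
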
From charlesr.harris at gmail.com Wed Dec 17 13:40:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 17 Dec 2008 11:40:37 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <4948949B.9000504@ar.media.kyoto-u.ac.jp> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Dec 16, 2008 at 10:56 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > > > On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau > > > > > wrote: > > > It does not work at the moment on windows at least :) But more > essentially, I don't see why you declared those functions: can you > explain me what was your intention, because I don't understand the > rationale. > The declarations were for the SPARC. Originally I had them up in an ifdef up top, but I got curious what different machines would do. They shouldn't cause a problem unless something is pretty strange. The undefs I put where they are for similar reasons, but there was a strong temptation to move them into the if statement where they used to be. Let's say curiousity got the best of me there. They shouldn't affect anything but macros and I didn't want the function declarations do be interpreted as macros. > > > You are, of course, free to break these builds again. However, I > > designated space at the top of the file for compiler/distro specific > > defines, I think you should use them, there is a reason other folks do. > > The problem is two folds: > - by declaring functions everywhere in the code, you are effectively > spreading toolchain specific oddities in the whole source file. This is > not good, IMHO: those should be detected at configuration stage, and > dealt in the source code using those informations. That's how every > autoconf project does it. If a function is actually a macro, this should > be detected at configuration. > - declarations are toolchain specific; it is actually worse, it even > depends on the compiler flags. It is at least the case with MS > compilers. So there is no way to guarantee that your declaration matches > the math runtime one (the compiler crash reported is exactly caused by > this). > Worth knowing ;) It works on the windows buildbot but that is running python 2.4. Speaking of which, the BSD buildbot needs nose (I don't know what happened to it), the windows box is showing the same old permissions problem, and one of the SPARC buildbots just times out unless you build during the right time of day. We are just hobbling along at the moment. > > The macro undef could be moved but I preferred to generate an error if > > there was a conflict with the the standard c function prototypes. > > > > We can't use inline code for these functions as they are passed to the > > generic loops as function pointers. > > Yes, I believe this is another problem when declaring function: if we > use say cosl, and cosl is an inline function in the runtime, by > re-declaring it, you are telling the compiler that it is not inline > anymore, so the compiler does not know anymore you can't take the > address of cosl, unless it detects the mismatch between the runtime > declaration and ours, and considers it as an error (I am not sure > whether this is always an error with MS compilers; it may only be a > warning on some versions - it is certainly not dealt in a gracious > manner every time, since the linker crashes in some cases). > Sorry for the late reply, the network was down. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gwg at emss.co.za Wed Dec 17 14:11:22 2008 From: gwg at emss.co.za (George Goussard) Date: Wed, 17 Dec 2008 19:11:22 +0000 (UTC) Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> <4948D139.4020703@gmail.com> Message-ID: Hi Micheal. I am going on vacation tomorrow. An example will have to wait until I am back, but I can give some version information now: Numpy is version 1.2.1, Matplotlib is version 0.98.5 and I am using stock-standard(not the Enthought version or other distribution) Python 2.5. My Python 2.5 I also compiled with MSVC 2005(AMD/em64t setting) because I can't have dependecies on older crt libraries/dll's. The Intel MKL version is 10.1.0.018 and the AMD ACML version is 4.2.0. I am not using both at the same time. I have first tried the Intel one and now I am using AMD's ACML as explained in previous email. Everything is compiled using the AMD64/em64t "settings". I am not compiling on IA32 and IA64 etc. All the other necessary version information(MSVC), I'll have to check when I am back at the office. I also use other libraries like PyQt(3.17.??) and SIP(version 4.7.7). Both of them are the commercial versions and I also have commercial version of Qt 4.3.3. Needless to say that my backend agg is Qt4Agg. I have tested on Linux 64 bit (SuSe 10.?? commercial version not openSuSe) and it worked. I also tested on Windows XP 32-bit and Linux 32-bit(SuSe 10.??) and everything worked. Constructing and example won't be trivial but I'll try. The reason being thatI embedded Python in an application and then I display graphs using SIP and PyQt. The application was done using Qt4. Anyway, but I'll get back to you on that as soon as I am finished with it. Another interesting aspect is that in my application where I initially construct the array using PyArray_SimpleNew, if I change this to, for example PyArray_SimpleNewFromData then I get a completely different graph which is a solid line(not the effect I described in the previous email) but it is completely the wrong graph, with very small numbers(E-16 numbers). One thing that also bothers me is that on Windows 32-bit. The default was not FORTRAN arrays, but on Windows XP 64 bit the order and everything is FORTRAN default. I ain't even using anything remotely to FORTRAN. I think the AMD ACML is compiled using the Intel FORTRAN compiler, but will that effect it?? Anyway, I'll put an effort into constructing an example, but it will have to be when I am back at the office from my vacation. Cheers. Thanks. George. From gwg at emss.co.za Wed Dec 17 14:28:27 2008 From: gwg at emss.co.za (George Goussard) Date: Wed, 17 Dec 2008 19:28:27 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Singular_Matrix_problem_with_Matplit?= =?utf-8?q?lib_in=09Numpy_=28Windows_-_AMD64=29?= References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> Message-ID: Hello David. I am using the Intel MKL BLAS/LAPACK. I have replaced this with AMD's ACML library. Now there is no exception raised due to a "Singular matrix" while trying to move the legend(wiggling the graph). So, the graph is updated and the interaction is fine(you can wiggle the graph and it updates, minimize, maximeie etc.). But ... 
the legend is now only drawn sometimes and the graphs are drawn with an intermittent line, as if the - - - pattern was specified. Something is still not right. I just can't seem to put my finger on it since there are some many parties involved(numpy,matplotlib,python, ctypes etc.) I also ran the numpy.test() with NUmpy that I compiled with AMD's ACML. The results are included: Running unit tests for numpy NumPy version 1.2.1 Results of numpy.test() NumPy is installed in C:\Development\Python\2_5_2\lib\site-packages\numpy Python version 2.5.2 (r252:60911, Dec 12 2008, 08:38:07) [MSC v.1400 64 bit (AMD64)] nose version 0.10.4 Forcing DISTUTILS_USE_SDK=1 ............................................................................... ............................................................. ............................................................................... ..........................K..K............................... ............................................................................... ............................................................. .......................K....................................................... ..........................Ignoring "MSVCCompiler instance has no attribute '_MSVCCompiler__root'" (I think it is msvccompiler.py bug) ...........................S................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ........ ---------------------------------------------------------------------- Ran 1592 tests in 10.704s OK (KNOWNFAIL=3, SKIP=1) Thanks. George. From irving at naml.us Wed Dec 17 16:52:02 2008 From: irving at naml.us (Geoffrey Irving) Date: Wed, 17 Dec 2008 13:52:02 -0800 Subject: [Numpy-discussion] immutable numpy arrays Message-ID: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> Currently numpy arrays are either writable or unwritable, but unwritable arrays can still be changed through other copies. This means that when a numpy array is passed into an interface that requires immutability for safety reasons, a copy always has to be made. One way around this would be to add a NPY_IMMUTABLE flag signifying that the contents of the array will never change through other copies. This flag would be in addition to the current NPY_WRITEABLE flag, so it would be fully backwards compatible. The flag would be propagated along slices and views. For example, a numpy array created from C code that guarantees immutability would have the flag set. 
If the array was passed back into a function that required an immutable array, the code could check the immutable flag and skip the copy. This behavior could also be used to implement safe copy-on-write semantics. Making this more generally useful would probably require additional flags to document whether a writeable copy of the array might exists (something like NPY_LEAKED) in order to avoid copies for newly created writeable arrays. Has the issue of immutability been considered before? It seems like a basic NPY_IMMUTABLE flag would be fairly easy to add without backwards compatibility issues, but the secondary features such as NPY_LEAKED would be more complicated. Thanks, Geoffrey From robert.kern at gmail.com Wed Dec 17 17:24:06 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Dec 2008 16:24:06 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> Message-ID: <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: > Currently numpy arrays are either writable or unwritable, but > unwritable arrays can still be changed through other copies. This > means that when a numpy array is passed into an interface that > requires immutability for safety reasons, a copy always has to be > made. > > One way around this would be to add a NPY_IMMUTABLE flag signifying > that the contents of the array will never change through other copies. This is not possible to guarantee. With the __array_interface__, I can make a numpy array point at any addressable memory without its knowledge. We can even mutate "immutable" str objects, too. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From irving at naml.us Wed Dec 17 17:51:52 2008 From: irving at naml.us (Geoffrey Irving) Date: Wed, 17 Dec 2008 14:51:52 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> Message-ID: <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: > On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >> Currently numpy arrays are either writable or unwritable, but >> unwritable arrays can still be changed through other copies. This >> means that when a numpy array is passed into an interface that >> requires immutability for safety reasons, a copy always has to be >> made. >> >> One way around this would be to add a NPY_IMMUTABLE flag signifying >> that the contents of the array will never change through other copies. > > This is not possible to guarantee. With the __array_interface__, I can > make a numpy array point at any addressable memory without its > knowledge. We can even mutate "immutable" str objects, too. In python __array_interface__ just returns a big integer representing a pointer which can't be used for anything. Well-behaved C code has to be trusted to care when __array_interface__ marks its data as unwriteable, so that shouldn't be a problem either. Is there some other way that arbitrary python code could bypass the NPY_WRITEABLE flag? 
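For concreteness, here is a minimal sketch of the existing behaviour that motivates this -- plain numpy, no C involved, and the variable names are only for illustration. An array can be flagged unwriteable, but if a writeable view of the same buffer exists elsewhere, the data underneath it can still change:

import numpy as np

a = np.arange(5)            # writeable array owning the buffer
b = a[:]                    # a view sharing the same memory
b.flags.writeable = False   # b is now "unwriteable"

try:
    b[0] = 99               # refused: assignment destination is read-only
except (ValueError, RuntimeError):
    pass                    # the exact exception type differs between versions

a[0] = 99                   # ...but writing through the base still works,
assert b[0] == 99           # and b sees the change

A separate immutable flag would be a promise that the second half can never happen through any other numpy-level reference.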
Geoffrey From shao at msg.ucsf.edu Wed Dec 17 18:26:44 2008 From: shao at msg.ucsf.edu (Lin Shao) Date: Wed, 17 Dec 2008 15:26:44 -0800 Subject: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp In-Reply-To: <5b8d13220812161817j2aae55cfleff57bd874589b95@mail.gmail.com> References: <5b8d13220812161817j2aae55cfleff57bd874589b95@mail.gmail.com> Message-ID: > It is a bug in VS, but the problem is caused by buggy code in numpy, > so this can be avoided. Incidentally, I was working on it yesterday, > but went to bed before having fixed everything :) > That's good to know. Thank you for fixing it and let us know when it's ready for test. -lin From robert.kern at gmail.com Wed Dec 17 18:34:59 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Dec 2008 17:34:59 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> Message-ID: <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: > On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>> Currently numpy arrays are either writable or unwritable, but >>> unwritable arrays can still be changed through other copies. This >>> means that when a numpy array is passed into an interface that >>> requires immutability for safety reasons, a copy always has to be >>> made. >>> >>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>> that the contents of the array will never change through other copies. >> >> This is not possible to guarantee. With the __array_interface__, I can >> make a numpy array point at any addressable memory without its >> knowledge. We can even mutate "immutable" str objects, too. > > In python __array_interface__ just returns a big integer representing > a pointer which can't be used for anything. I can (and do) *make* an array from Python given an __array_interface__ with that pointer. See numpy/lib/stride_trick.py in numpy 1.2 for an example. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From irving at naml.us Wed Dec 17 18:45:46 2008 From: irving at naml.us (Geoffrey Irving) Date: Wed, 17 Dec 2008 15:45:46 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> Message-ID: <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> On Wed, Dec 17, 2008 at 3:34 PM, Robert Kern wrote: > On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: >> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >>> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>>> Currently numpy arrays are either writable or unwritable, but >>>> unwritable arrays can still be changed through other copies. 
This >>>> means that when a numpy array is passed into an interface that >>>> requires immutability for safety reasons, a copy always has to be >>>> made. >>>> >>>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>>> that the contents of the array will never change through other copies. >>> >>> This is not possible to guarantee. With the __array_interface__, I can >>> make a numpy array point at any addressable memory without its >>> knowledge. We can even mutate "immutable" str objects, too. >> >> In python __array_interface__ just returns a big integer representing >> a pointer which can't be used for anything. > > I can (and do) *make* an array from Python given an > __array_interface__ with that pointer. See numpy/lib/stride_trick.py > in numpy 1.2 for an example. Ah. Yes, that certainly precludes complete safety. I don't think it precludes the usefulness of an immutable flag though, just like it doesn't preclude the usefulness of the writeable flag. The stride_tricks.py code is already well-behaved: it doesn't turn unwriteable arrays into writeable arrays. It certainly could, but this is analogous to ctypes or untrusted C code. Geoffrey From robert.kern at gmail.com Wed Dec 17 19:28:08 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Dec 2008 18:28:08 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> Message-ID: <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> On Wed, Dec 17, 2008 at 17:45, Geoffrey Irving wrote: > On Wed, Dec 17, 2008 at 3:34 PM, Robert Kern wrote: >> On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: >>> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >>>> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>>>> Currently numpy arrays are either writable or unwritable, but >>>>> unwritable arrays can still be changed through other copies. This >>>>> means that when a numpy array is passed into an interface that >>>>> requires immutability for safety reasons, a copy always has to be >>>>> made. >>>>> >>>>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>>>> that the contents of the array will never change through other copies. >>>> >>>> This is not possible to guarantee. With the __array_interface__, I can >>>> make a numpy array point at any addressable memory without its >>>> knowledge. We can even mutate "immutable" str objects, too. >>> >>> In python __array_interface__ just returns a big integer representing >>> a pointer which can't be used for anything. >> >> I can (and do) *make* an array from Python given an >> __array_interface__ with that pointer. See numpy/lib/stride_trick.py >> in numpy 1.2 for an example. > > Ah. Yes, that certainly precludes complete safety. > > I don't think it precludes the usefulness of an immutable flag though, > just like it doesn't preclude the usefulness of the writeable flag. > The stride_tricks.py code is already well-behaved: it doesn't turn > unwriteable arrays into writeable arrays. It certainly could, but > this is analogous to ctypes or untrusted C code. It just seems to me to be another complication that does not provide any guarantees. 
You say "Currently numpy arrays are either writable or unwritable, but unwritable arrays can still be changed through other copies." Adding an immutable flag would just change that to "Currently numpy arrays are either mutable or immutable, but immutable arrays can still be changed through other copies." Basically, the writable flag is intended to indicate your use case. It can be circumvented, but the same methods of circumvention can be applied to any set of flags. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Wed Dec 17 22:58:07 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 18 Dec 2008 12:58:07 +0900 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> Message-ID: <4949CA4F.10404@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > The declarations were for the SPARC. Originally I had them up in an > ifdef up top, but I got curious what different machines would do. I still don't understand what exact problem they solve. Since the declarations are put when HAVE_FOO is defined, the only problems I can see are problem in the detection code or a platform bug (I seem to remember for SPARC, this was a platform error, right ?). In either case, it should be solved elsewhere (at worst, for platform specific, this should be done within #if PLATFORM/#endif). > They shouldn't cause a problem unless something is pretty strange. They do; the default rule should be not to put any external declaration, because they are heavily toolchain/platform specific. I removed a lot of them from the old code when I refactored this code, and putting them back almost totally alleviate my effort :) To quote python code itself (pyport.h): /************************************************************************** Prototypes that are missing from the standard include files on some systems (and possibly only some versions of such systems.) Please be conservative with adding new ones, document them and enclose them in platform-specific #ifdefs. **************************************************************************/ > The undefs I put where they are for similar reasons, but there was a > strong temptation to move them into the if statement where they used > to be. Could you be more specific ? I want to know the actual error they were solving. > Let's say curiousity got the best of me there. They shouldn't affect > anything but macros and I didn't want the function declarations do be > interpreted as macros. "Shouldn't affect" is not good enough :) The default rule should be to avoid relying at all on those distinctions, and only care when they matter. Doing the other way around does not work, there alway be some strange platform which will break most assumptions, as rationale as they can be. > > Worth knowing ;) It works on the windows buildbot but that is running > python 2.4. Ah, it is 2.4 ! I was wondering the exact combination. It does not work with the platform SDK 6.1 (which includes 64 bits compiler), and this results in a compiler segfault. The problem is particularly pernicious, since the segfaults is not seen directly, but put in a temp file which itself causes problem because two processes try to access it... 
One of the nicest build failure I have ever seen :) > Speaking of which, the BSD buildbot needs nose (I don't know what > happened to it), the windows box is showing the same old permissions > problem, and one of the SPARC buildbots just times out unless you > build during the right time of day. We are just hobbling along at the > moment. Windows problems at least are not specific to the buildbot. > > Sorry for the late reply, the network was down. No problem, David From dpeterson at enthought.com Thu Dec 18 04:14:44 2008 From: dpeterson at enthought.com (Dave Peterson) Date: Thu, 18 Dec 2008 03:14:44 -0600 Subject: [Numpy-discussion] ANNOUNCE: EPD Py25 v4.1.30101_beta2 available for testing Message-ID: <494A1484.5090705@enthought.com> Hello, The Enthought Python Distribution's (EPD) early access program website is now hosting the beta 2 build of the upcoming EPD Py25 v4.1.301 release. We would very much appreciate your assistance in making EPD as stable and reliable as possible! Please join us in our efforts by downloading an installer for Windows, Mac OS X, or RedHat EL versions 3, 4, and 5 from the following website: http://www.enthought.com/products/epdearlyaccess.php The release notes for the beta2 build are available here: https://svn.enthought.com/epd/wiki/Py25/4.1.301/Beta2 Please provide any comments, concerns, or bug reports via the EPD Trac instance at https://svn.enthought.com/epd or via e-mail to epd-support at enthought.com. -- Dave About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python? Programming Language, including over 60 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, database adapters, and a lot of other tools right out of the box. http://www.enthought.com/products/epd.php It is currently available as a single-click installer for Windows XP (x86), Mac OS X (a universal binary for OS X 10.4 and above), and RedHat 3, 4, and 5 (x86 and amd64). EPD is free for academic use. An annual subscription and installation support are available for individual commercial use. Various workgroup, departmental, and enterprise level subscription options with support and training are also available. Contact us for more information! From animator333 at yahoo.com Thu Dec 18 05:22:23 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 18 Dec 2008 15:52:23 +0530 (IST) Subject: [Numpy-discussion] array not appending Message-ID: <390327.28787.qm@web94911.mail.in2.yahoo.com> Hi, This is copied from ipython console. In [42]: import numpy as np In [43]: ST = np.empty([], dtype=np.float32) In [44]: np..append(ST, 10.0) Out[44]: array([ 3.83333603e-38, 1.00000000e+01]) In [45]: np.append(ST, 10.0) Out[45]: array([ 3.83333603e-38, 1.00000000e+01]) In [46]: print ST 3.83333602707e-038 What's wrong here? win XP 32 numpy 1.2.1 python 2.5.2 Prashant Bollywood news, movie reviews, film trailers and more! Go to http://in.movies.yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.varoquaux at normalesup.org Thu Dec 18 05:33:34 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 18 Dec 2008 11:33:34 +0100 Subject: [Numpy-discussion] array not appending In-Reply-To: <390327.28787.qm@web94911.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> Message-ID: <20081218103333.GA28061@phare.normalesup.org> On Thu, Dec 18, 2008 at 03:52:23PM +0530, Prashant Saxena wrote: > In [43]: ST = np.empty([], dtype=np.float32) > In [44]: np.append(ST, 10.0) > Out[44]: array([ 3.83333603e-38, 1.00000000e+01]) > In [45]: np.append(ST, 10.0) > Out[45]: array([ 3.83333603e-38, 1.00000000e+01]) > In [46]: print ST > 3.83333602707e-038 > What's wrong here? Nothing. If you look at the documentation, np.append does not modify in place the array.: ''' Returns ------- out : ndarray A copy of `arr` with `values` appended to `axis`. Note that `append` does not occur in-place: a new array is allocated and filled. ''' Modification in place is not possible with the numpy model of an array. Ga?l From animator333 at yahoo.com Thu Dec 18 05:49:20 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 18 Dec 2008 16:19:20 +0530 (IST) Subject: [Numpy-discussion] array not appending References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> Message-ID: <836267.80553.qm@web94907.mail.in2.yahoo.com> How do I solve this? Thanks Prashant ________________________________ From: Gael Varoquaux To: Discussion of Numerical Python Sent: Thursday, 18 December, 2008 4:03:34 PM Subject: Re: [Numpy-discussion] array not appending On Thu, Dec 18, 2008 at 03:52:23PM +0530, Prashant Saxena wrote: > In [43]: ST = np.empty([], dtype=np.float32) > In [44]: np.append(ST, 10.0) > Out[44]: array([ 3.83333603e-38, 1.00000000e+01]) > In [45]: np.append(ST, 10.0) > Out[45]: array([ 3.83333603e-38, 1.00000000e+01]) > In [46]: print ST > 3.83333602707e-038 > What's wrong here? Nothing. If you look at the documentation, np.append does not modify in place the array.: ''' Returns ------- out : ndarray A copy of `arr` with `values` appended to `axis`. Note that `append` does not occur in-place: a new array is allocated and filled. ''' Modification in place is not possible with the numpy model of an array. Ga?l _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Dec 18 05:50:49 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 18 Dec 2008 11:50:49 +0100 Subject: [Numpy-discussion] array not appending In-Reply-To: <836267.80553.qm@web94907.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> Message-ID: <20081218105049.GB28061@phare.normalesup.org> On Thu, Dec 18, 2008 at 04:19:20PM +0530, Prashant Saxena wrote: > How do I solve this? If you want appending in place you have to use a python list. If you don't need modification in place, np.append returns an array with the appended number. 
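To make the two options concrete, a small sketch (the names are only for illustration):

import numpy as np

# grow a plain Python list and convert once at the end: appends are cheap
values = []
for i in range(5):
    values.append(float(i))
a = np.array(values, dtype=np.float32)

# or keep rebinding the result of np.append: each call allocates a new array
b = np.zeros((0,), dtype=np.float32)   # a genuinely 0-element array, unlike np.empty(())
b = np.append(b, 10.0)
b = np.append(b, 20.0)                 # b is now [ 10.  20.]

The first form is the one to prefer inside a loop, since np.append has to copy the whole array on every call.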
Ga?l From animator333 at yahoo.com Thu Dec 18 05:56:05 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 18 Dec 2008 16:26:05 +0530 (IST) Subject: [Numpy-discussion] array not appending References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> Message-ID: <225225.78739.qm@web94916.mail.in2.yahoo.com> ST = np.empty((), dtype=np.float32) ST = np.append(ST, 10.0) This works, is it proper way to do so? One more prob ST.size returns 2. Why? I have added only one element. Prashant ________________________________ From: Gael Varoquaux To: Discussion of Numerical Python Sent: Thursday, 18 December, 2008 4:20:49 PM Subject: Re: [Numpy-discussion] array not appending On Thu, Dec 18, 2008 at 04:19:20PM +0530, Prashant Saxena wrote: > How do I solve this? If you want appending in place you have to use a python list. If you don't need modification in place, np.append returns an array with the appended number. Ga?l _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy..org/mailman/listinfo/numpy-discussion Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Dec 18 05:44:39 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 18 Dec 2008 19:44:39 +0900 Subject: [Numpy-discussion] array not appending In-Reply-To: <225225.78739.qm@web94916.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> Message-ID: <494A2997.1030504@ar.media.kyoto-u.ac.jp> Prashant Saxena wrote: > > ST = np.empty((), dtype=np.float32) > ST = np.append(ST, 10.0) > > This works, is it proper way to do so? > > One more prob > > ST.size returns 2. > > Why? I have added only one element. You added one element to an array which as already one element. Empty does not mean that the array has no items (which is not possible AFAIK), but that the values are 'empty' (more exactly, they are undefined values, since the memory emplacement has not been initialized). David From haase at msg.ucsf.edu Thu Dec 18 08:00:09 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu, 18 Dec 2008 14:00:09 +0100 Subject: [Numpy-discussion] array not appending In-Reply-To: <494A2997.1030504@ar.media.kyoto-u.ac.jp> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> Message-ID: On Thu, Dec 18, 2008 at 11:44 AM, David Cournapeau wrote: > Prashant Saxena wrote: >> >> ST = np.empty((), dtype=np.float32) >> ST = np.append(ST, 10.0) >> >> This works, is it proper way to do so? >> >> One more prob >> >> ST.size returns 2. >> >> Why? I have added only one element. > > You added one element to an array which as already one element. 
Empty > does not mean that the array has no items (which is not possible AFAIK), > but that the values are 'empty' (more exactly, they are undefined > values, since the memory emplacement has not been initialized). > > David So the question remains: how to create an array of "empty" (i.e. 0) size ? I guess, setting the first argument in empty (i.e. shape) to () produces a scalar value - which is probably what the one the OP saw. I get: >>> np.empty(()).shape () >>> len(np.empty(())) Traceback (most recent call last): File "", line 1, in TypeError: len() of unsized object >>> np.empty((0,)).shape (0) >>> len(np.empty((0,))) 0 Seems all correct. Now, however, to which axis is "append" appending ? Especially if the array doesn't have any axis (i.e. shape = () ). I would argue that append should throw an exception here, rather than "implicitely" changing shape to (1,) before appending -- this is most likely not what the user intended, see OP. Cheers, Sebastian Haase From aisaac at american.edu Thu Dec 18 09:05:03 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 18 Dec 2008 09:05:03 -0500 Subject: [Numpy-discussion] array not appending In-Reply-To: <225225.78739.qm@web94916.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> Message-ID: <494A588F.7080009@american.edu> On 12/18/2008 5:56 AM Prashant Saxena apparently wrote: > ST = np.empty((), dtype=np.float32) > ST = np.append(ST, 10.0) If you really need to append elements, you probably want to use a list and then convert to an array afterwards. But if you know your array size, you can preallocate memory and then fill it. E.g., >>> ST = np.empty((10,),dtype=np.float32) >>> for i in range(10): ST[i]=i hth, Alan Isaac PS Some users suggest it is better practices to use `zeros` rather than `empty`. Note that `zeros` has the same property that surprised you. >>> np.zeros((),dtype=np.float32) array(0.0, dtype=float32) >>> np.append(np.zeros((),dtype=np.float32),99) array([ 0., 99.]) From david at ar.media.kyoto-u.ac.jp Thu Dec 18 08:58:00 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 18 Dec 2008 22:58:00 +0900 Subject: [Numpy-discussion] array not appending In-Reply-To: References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> Message-ID: <494A56E8.5090905@ar.media.kyoto-u.ac.jp> Sebastian Haase wrote: > On Thu, Dec 18, 2008 at 11:44 AM, David Cournapeau > wrote: > >> Prashant Saxena wrote: >> >>> ST = np.empty((), dtype=np.float32) >>> ST = np.append(ST, 10.0) >>> >>> This works, is it proper way to do so? >>> >>> One more prob >>> >>> ST.size returns 2. >>> >>> Why? I have added only one element. >>> >> You added one element to an array which as already one element. Empty >> does not mean that the array has no items (which is not possible AFAIK), >> but that the values are 'empty' (more exactly, they are undefined >> values, since the memory emplacement has not been initialized). >> >> David >> > > So the question remains: how to create an array of "empty" (i.e. 0) size ? 
> The thing is I am not sure it is possible at all - I just wanted to tell the OP that empty does not create an empty array (without any items in it). What would be the need for a 0 item array ? If the point is to append some data without knowing in advance the size, a list is most likely more adapted to the task. An array which cannot be indexed does not sound that useful, but I may just lack some imagination :) cheers, David From lists_ravi at lavabit.com Thu Dec 18 09:46:55 2008 From: lists_ravi at lavabit.com (Ravi) Date: Thu, 18 Dec 2008 09:46:55 -0500 Subject: [Numpy-discussion] array not appending In-Reply-To: References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> Message-ID: <200812180946.56495.lists_ravi@lavabit.com> On Thursday 18 December 2008 08:00:09 Sebastian Haase wrote: > So the question remains: how to create an array of "empty" (i.e. 0) size ? In [1]: from numpy import * In [2]: x = array( [] ) In [3]: x Out[3]: array([], dtype=float64) In [4]: x.size Out[4]: 0 In [5]: x.shape Out[5]: (0,) In [6]: y = zeros( (0,), dtype=int32 ) In [7]: y.shape Out[7]: (0,) Regards, Ravi From irving at naml.us Thu Dec 18 11:01:12 2008 From: irving at naml.us (Geoffrey Irving) Date: Thu, 18 Dec 2008 08:01:12 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> Message-ID: <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> On Wed, Dec 17, 2008 at 4:28 PM, Robert Kern wrote: > On Wed, Dec 17, 2008 at 17:45, Geoffrey Irving wrote: >> On Wed, Dec 17, 2008 at 3:34 PM, Robert Kern wrote: >>> On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: >>>> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >>>>> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>>>>> Currently numpy arrays are either writable or unwritable, but >>>>>> unwritable arrays can still be changed through other copies. This >>>>>> means that when a numpy array is passed into an interface that >>>>>> requires immutability for safety reasons, a copy always has to be >>>>>> made. >>>>>> >>>>>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>>>>> that the contents of the array will never change through other copies. >>>>> >>>>> This is not possible to guarantee. With the __array_interface__, I can >>>>> make a numpy array point at any addressable memory without its >>>>> knowledge. We can even mutate "immutable" str objects, too. >>>> >>>> In python __array_interface__ just returns a big integer representing >>>> a pointer which can't be used for anything. >>> >>> I can (and do) *make* an array from Python given an >>> __array_interface__ with that pointer. See numpy/lib/stride_trick.py >>> in numpy 1.2 for an example. >> >> Ah. Yes, that certainly precludes complete safety. >> >> I don't think it precludes the usefulness of an immutable flag though, >> just like it doesn't preclude the usefulness of the writeable flag. >> The stride_tricks.py code is already well-behaved: it doesn't turn >> unwriteable arrays into writeable arrays. 
It certainly could, but >> this is analogous to ctypes or untrusted C code. > > It just seems to me to be another complication that does not provide > any guarantees. You say "Currently numpy arrays are either writable or > unwritable, but unwritable arrays can still be changed through other > copies." Adding an immutable flag would just change that to "Currently > numpy arrays are either mutable or immutable, but immutable arrays can > still be changed through other copies." Basically, the writable flag > is intended to indicate your use case. It can be circumvented, but the > same methods of circumvention can be applied to any set of flags. The point of an immutable array would be that _can't_ be changed through other copies except through broken C code (or the ctypes / __array_interface__ equivalents), so it's not correct to say that it's the same as unwriteable. It's the same distinction as C++ const vs. Java final. Immutability is already a common notion in python, e.g., list vs. tuple and set vs. frozenset, and it's unfortunate that numpy doesn't have an equivalent. However, if you agree that even _with_ the guarantee it's not a useful concept, I'm happy to drop it. As far as the lack of guarantee, here's some python code that modifies 1. Just because I can write it doesn't mean that we should tell people not to trust the values of small integers. :) import sys from numpy import * class Evil: size = int(log(float(sys.maxint+1))/log(2)+1)/8 __array_interface__ = { 'shape' : (1,), 'typestr' : ' References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> Message-ID: Hi David, On Wed, Dec 17, 2008 at 8:58 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > The declarations were for the SPARC. Originally I had them up in an > > ifdef up top, but I got curious what different machines would do. > > I still don't understand what exact problem they solve. Since the > declarations are put when HAVE_FOO is defined, the only problems I can > see are problem in the detection code or a platform bug (I seem to > remember for SPARC, this was a platform error, right ?). In either case, > it should be solved elsewhere (at worst, for platform specific, this > should be done within #if PLATFORM/#endif). > > > They shouldn't cause a problem unless something is pretty strange. > > They do; the default rule should be not to put any external declaration, > because they are heavily toolchain/platform specific. I removed a lot of > them from the old code when I refactored this code, and putting them > back almost totally alleviate my effort :) To quote python code itself > (pyport.h): > > /************************************************************************** > Prototypes that are missing from the standard include files on some systems > (and possibly only some versions of such systems.) > > Please be conservative with adding new ones, document them and enclose them > in platform-specific #ifdefs. > **************************************************************************/ > That's how I did it originally, that's why that section is up top. But I got curious. So that can be fixed. > > > The undefs I put where they are for similar reasons, but there was a > > strong temptation to move them into the if statement where they used > > to be. > > Could you be more specific ? I want to know the actual error they were > solving. 
> The undefs need to be there when the functions are defined by numpy, so they only need to be in the same #if block as those definitions. I moved them out to cover the function declarations also, but if those are put in their own block for SPARC then they aren't needed. > > > Let's say curiousity got the best of me there. They shouldn't affect > > anything but macros and I didn't want the function declarations do be > > interpreted as macros. > > "Shouldn't affect" is not good enough :) The default rule should be to > avoid relying at all on those distinctions, and only care when they > matter. Doing the other way around does not work, there alway be some > strange platform which will break most assumptions, as rationale as they > can be. > > > > > Worth knowing ;) It works on the windows buildbot but that is running > > python 2.4. > > Ah, it is 2.4 ! I was wondering the exact combination. It does not work > with the platform SDK 6.1 (which includes 64 bits compiler), and this > results in a compiler segfault. The problem is particularly pernicious, > since the segfaults is not seen directly, but put in a temp file which > itself causes problem because two processes try to access it... One of > the nicest build failure I have ever seen :) > The window buildbot was working, went off line for a few weeks, and showed failures on return. It is a VMWare version, so maybe something was changed in between. > > > Speaking of which, the BSD buildbot needs nose (I don't know what > > happened to it), the windows box is showing the same old permissions > > problem, and one of the SPARC buildbots just times out unless you > > build during the right time of day. We are just hobbling along at the > > moment. > > Windows problems at least are not specific to the buildbot. > > > > > Sorry for the late reply, the network was down. > > No problem, > And I still have network problems... What will the world do if the networks collapse? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Dec 18 15:50:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 18 Dec 2008 14:50:00 -0600 Subject: [Numpy-discussion] array not appending In-Reply-To: <494A56E8.5090905@ar.media.kyoto-u.ac.jp> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> <494A56E8.5090905@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812181250t6a50128bsb27be03b28bc724b@mail.gmail.com> On Thu, Dec 18, 2008 at 07:58, David Cournapeau wrote: > What would be the need for a 0 item array ? If the point is to append > some data without knowing in advance the size, a list is most likely > more adapted to the task. An array which cannot be indexed does not > sound that useful, but I may just lack some imagination :) It's an edge case (literally) that can pop up when you are doing slicing on larger arrays. 0-item arrays do fit in numpy's memory model, so specifically disallowing them would mean gratuitously requiring try: excepts: in code that would otherwise be generic. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From robert.kern at gmail.com Thu Dec 18 16:00:35 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 18 Dec 2008 15:00:35 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> Message-ID: <3d375d730812181300t3c84c440v4000bdc9bb32b8b4@mail.gmail.com> On Thu, Dec 18, 2008 at 10:01, Geoffrey Irving wrote: > On Wed, Dec 17, 2008 at 4:28 PM, Robert Kern wrote: >> It just seems to me to be another complication that does not provide >> any guarantees. You say "Currently numpy arrays are either writable or >> unwritable, but unwritable arrays can still be changed through other >> copies." Adding an immutable flag would just change that to "Currently >> numpy arrays are either mutable or immutable, but immutable arrays can >> still be changed through other copies." Basically, the writable flag >> is intended to indicate your use case. It can be circumvented, but the >> same methods of circumvention can be applied to any set of flags. > > The point of an immutable array would be that _can't_ be changed > through other copies except through broken C code (or the ctypes / > __array_interface__ equivalents), so it's not correct to say that it's > the same as unwriteable. It's the same distinction as C++ const vs. > Java final. Immutability is already a common notion in python, e.g., > list vs. tuple and set vs. frozenset, and it's unfortunate that numpy > doesn't have an equivalent. > > However, if you agree that even _with_ the guarantee it's not a useful > concept, I'm happy to drop it. What I'm trying to suggest is that most code already treats the writeable flag like I think you want the immutable flag to be treated. I'm not sure what you think is missing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From bradford.n.cross at gmail.com Thu Dec 18 21:27:12 2008 From: bradford.n.cross at gmail.com (Bradford Cross) Date: Thu, 18 Dec 2008 18:27:12 -0800 Subject: [Numpy-discussion] new incremental statistics project Message-ID: This is a new project I just released. I know it is C#, but some of the design and idioms would be nice in numpy/scipy for working with discrete event simulators, time series, and event stream processing. http://code.google.com/p/incremental-statistics/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Thu Dec 18 23:12:38 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 19 Dec 2008 13:12:38 +0900 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> Hi Chuck, On Fri, Dec 19, 2008 at 2:15 AM, Charles R Harris wrote: > The undefs need to be there when the functions are defined by numpy, so they > only need to be in the same #if block as those definitions. I moved them out > to cover the function declarations also, but if those are put in their own > block for SPARC then they aren't needed. But then it just hides the problem instead of solving it. If we are in the #if bloc and the symbol is defined, it is a bug in the configuration stage, it should be dealt there - if it is a bug in the toolchain (say the symbol is in the library, but not declared in the header), then it should be dealt with for that exact platform only. It is not nit-picking, because the later way means it won't break any other platform :) It still should be used sparingly, though (the SPARC problem is a good example where it should be used). cheers, David From stefan at sun.ac.za Fri Dec 19 08:37:03 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 19 Dec 2008 15:37:03 +0200 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: References: Message-ID: <9457e7c80812190537p9566018h21be08c58e959fa@mail.gmail.com> Hi Bradford 2008/12/19 Bradford Cross : > This is a new project I just released. > > I know it is C#, but some of the design and idioms would be nice in > numpy/scipy for working with discrete event simulators, time series, and > event stream processing. Could you please send a slightly longer description of the idioms you describe, and how they would fit into scipy.stats, scikits.timeseries, etc.? Would you be interested in working on these enhancements? Thanks St?fan From jdh2358 at gmail.com Fri Dec 19 08:53:31 2008 From: jdh2358 at gmail.com (John Hunter) Date: Fri, 19 Dec 2008 07:53:31 -0600 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: References: Message-ID: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross wrote: > This is a new project I just released. > > I know it is C#, but some of the design and idioms would be nice in > numpy/scipy for working with discrete event simulators, time series, and > event stream processing. > > http://code.google.com/p/incremental-statistics/ I think an incremental stats module would be a boon to numpy or scipy. Eric Firing has a nice module wrtten in C with a pyrex wrapper (ringbuf) that does trailing incremental mean, median, std, min, max, and percentile. It maintains a sorted queue to do the last three efficiently, and handles NaN inputs. I would like to see this extended to include exponential or other weightings to do things like incremental trailing exponential moving averages and variances. I don't know what the licensing terms are of this module, but it might be a good starting point for an incremental numpy stats module, at least if you were thinking about supporting a finite lookback window. 
We have a copy of this in the py4science examples dir if you want to take a look: svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats cd trailstats/ make python movavg_ringbuf.py Other things that would be very useful are incremental covariance and regression. JDH From ndbecker2 at gmail.com Fri Dec 19 09:19:35 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 19 Dec 2008 09:19:35 -0500 Subject: [Numpy-discussion] new incremental statistics project References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: On a somewhat related note, I am looking for recursive calculation of variance for complex. For complex I want var as defined by E[|x^2|]. Is there an incremental (recursive) implementation in the complex case? From faltet at pytables.org Fri Dec 19 09:31:36 2008 From: faltet at pytables.org (Francesc Alted) Date: Fri, 19 Dec 2008 15:31:36 +0100 Subject: [Numpy-discussion] ANN: PyTables 2.1 (final) released Message-ID: <200812191531.36850.faltet@pytables.org> =========================== Announcing PyTables 2.1 =========================== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. PyTables 2.1 introduces important improvements, like much faster node opening, creation or navigation, a file-based way to fine-tune the different PyTables parameters (fully documented now in a new appendix of the manual) and support for multidimensional atoms in EArray/CArray objects. Regarding the Pro edition, four different kinds of indexes are supported so that the user can choose the best for her needs. Also, and due to the introduction of the concept of chunkmaps in OPSI, the responsiveness of complex queries with low selectivity has improved quite a lot. And last but not least, it is possible now to sort tables by a specific field with no practical limit in size (tables up to 2**48 rows). Also, a lot of work has gone in the reworking of the "Optimization tips" chapter of the manual where many benchmarks have been redone using newer software and machines and a few new sections have been added. In particular, see the new "Fine-tuning the chunksize" section where you will find an in-deep introduction to the subject of chunking and the "Indexing and Solid State Disks (SSD)" where the advantages of using low-latency SSD disks have been analysed in the context of indexation. In case you want to know more in detail what has changed in this version, have a look at ``RELEASE_NOTES.txt`` in the tarball. Find the HTML version for this document at: http://www.pytables.org/moin/ReleaseNotes/Release_2.1 You can download a source package of the version 2.1 with generated PDF and HTML docs and binaries for Windows from http://www.pytables.org/download/stable For an on-line version of the manual, visit: http://www.pytables.org/docs/manual-2.1 Finally, you can get an evaluation version for PyTables Pro in: http://www.pytables.org/download/evaluation Resources ========= Go to the PyTables web site for more details: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ Acknowledgments =============== Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. 
See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Many thanks also to SourceForge who have helped to make and distribute this package! And last, but not least thanks a lot to the HDF5 and NumPy (and numarray!) makers. Without them PyTables simply would not exist. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- **Enjoy data!** -- The PyTables Team From sturla at molden.no Fri Dec 19 11:10:51 2008 From: sturla at molden.no (Sturla Molden) Date: Fri, 19 Dec 2008 17:10:51 +0100 Subject: [Numpy-discussion] lfilter Message-ID: <494BC78B.5010405@molden.no> I am wondering if not scipy.signal.lfilter ought to be a part of the core NumPy. Note that it is similar to the filter function found in Matlab, and it makes a complement to numpy.convolve. May I suggest that it is renamed or aliased to numpy.filter? Sturla Molden From charlesr.harris at gmail.com Fri Dec 19 12:10:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Dec 2008 10:10:37 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> Message-ID: On Thu, Dec 18, 2008 at 9:12 PM, David Cournapeau wrote: > Hi Chuck, > > On Fri, Dec 19, 2008 at 2:15 AM, Charles R Harris > wrote: > > > The undefs need to be there when the functions are defined by numpy, so > they > > only need to be in the same #if block as those definitions. I moved them > out > > to cover the function declarations also, but if those are put in their > own > > block for SPARC then they aren't needed. > > But then it just hides the problem instead of solving it. If we are in > the #if bloc and the symbol is defined, it is a bug in the > configuration stage, it should be dealt there - if it is a bug in the > toolchain (say the symbol is in the library, but not declared in the > header), then it should be dealt with for that exact platform only. > > It is not nit-picking, because the later way means it won't break any > other platform :) It still should be used sparingly, though (the SPARC > problem is a good example where it should be used). > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Dec 19 12:15:49 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Dec 2008 10:15:49 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> Message-ID: On Thu, Dec 18, 2008 at 9:12 PM, David Cournapeau wrote: > Hi Chuck, > > On Fri, Dec 19, 2008 at 2:15 AM, Charles R Harris > wrote: > > > The undefs need to be there when the functions are defined by numpy, so > they > > only need to be in the same #if block as those definitions. 
I moved them > out > > to cover the function declarations also, but if those are put in their > own > > block for SPARC then they aren't needed. > > But then it just hides the problem instead of solving it. If we are in > the #if bloc and the symbol is defined, it is a bug in the > configuration stage, it should be dealt there - if it is a bug in the > toolchain (say the symbol is in the library, but not declared in the > header), then it should be dealt with for that exact platform only. > > It is not nit-picking, because the later way means it won't break any > other platform :) It still should be used sparingly, though (the SPARC > problem is a good example where it should be used). > True, it should be solved in the configuration stage, but what if it isn't? I suppose an error message might be the desired result. If you want to remove the undefs to see what happens, that's fine with me. They were inherited from the old code in any case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Fri Dec 19 13:59:55 2008 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 19 Dec 2008 08:59:55 -1000 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: <494BEF2B.6040801@hawaii.edu> John Hunter wrote: > On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross > wrote: >> This is a new project I just released. >> >> I know it is C#, but some of the design and idioms would be nice in >> numpy/scipy for working with discrete event simulators, time series, and >> event stream processing. >> >> http://code.google.com/p/incremental-statistics/ > > I think an incremental stats module would be a boon to numpy or scipy. > Eric Firing has a nice module wrtten in C with a pyrex wrapper > (ringbuf) that does trailing incremental mean, median, std, min, max, > and percentile. It maintains a sorted queue to do the last three > efficiently, and handles NaN inputs. I would like to see this > extended to include exponential or other weightings to do things like > incremental trailing exponential moving averages and variances. I > don't know what the licensing terms are of this module, but it might Licensing is no problem; I have never bothered with it, but I can tack on a BSD-type license if that would help. Eric > be a good starting point for an incremental numpy stats module, at > least if you were thinking about supporting a finite lookback window. > We have a copy of this in the py4science examples dir if you want to > take a look: > > svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats > cd trailstats/ > make > python movavg_ringbuf.py > > Other things that would be very useful are incremental covariance and > regression. > > JDH From jdh2358 at gmail.com Fri Dec 19 14:32:44 2008 From: jdh2358 at gmail.com (John Hunter) Date: Fri, 19 Dec 2008 13:32:44 -0600 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <494BEF2B.6040801@hawaii.edu> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> <494BEF2B.6040801@hawaii.edu> Message-ID: <88e473830812191132y1f337136yc764ef7f53300c5c@mail.gmail.com> On Fri, Dec 19, 2008 at 12:59 PM, Eric Firing wrote: > Licensing is no problem; I have never bothered with it, but I can tack on a > BSD-type license if that would help. 
Great -- if you are the copyright holder, would you commit a BSD license file to the py4science trailstats dir? I just committed the small bug fix we discussed yesterday there. Thanks! JDH From irving at naml.us Fri Dec 19 14:50:15 2008 From: irving at naml.us (Geoffrey Irving) Date: Fri, 19 Dec 2008 11:50:15 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812181300t3c84c440v4000bdc9bb32b8b4@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> <3d375d730812181300t3c84c440v4000bdc9bb32b8b4@mail.gmail.com> Message-ID: <7f9d599f0812191150g28fa4059i883ec61109eb5284@mail.gmail.com> On Thu, Dec 18, 2008 at 1:00 PM, Robert Kern wrote: > On Thu, Dec 18, 2008 at 10:01, Geoffrey Irving wrote: >> On Wed, Dec 17, 2008 at 4:28 PM, Robert Kern wrote: > >>> It just seems to me to be another complication that does not provide >>> any guarantees. You say "Currently numpy arrays are either writable or >>> unwritable, but unwritable arrays can still be changed through other >>> copies." Adding an immutable flag would just change that to "Currently >>> numpy arrays are either mutable or immutable, but immutable arrays can >>> still be changed through other copies." Basically, the writable flag >>> is intended to indicate your use case. It can be circumvented, but the >>> same methods of circumvention can be applied to any set of flags. >> >> The point of an immutable array would be that _can't_ be changed >> through other copies except through broken C code (or the ctypes / >> __array_interface__ equivalents), so it's not correct to say that it's >> the same as unwriteable. It's the same distinction as C++ const vs. >> Java final. Immutability is already a common notion in python, e.g., >> list vs. tuple and set vs. frozenset, and it's unfortunate that numpy >> doesn't have an equivalent. >> >> However, if you agree that even _with_ the guarantee it's not a useful >> concept, I'm happy to drop it. > > What I'm trying to suggest is that most code already treats the > writeable flag like I think you want the immutable flag to be treated. > I'm not sure what you think is missing. After further consideration, I'll withdraw the immutability flag request. I think most of what looking for can be implemented with inheritance, though not in a completely satisfactory manner. Here are details: My main use case is interacting with a system that deals with immutable arrays without having to introduce unnecessary copying. The system makes heavy use of dependency analysis internally to cache/save computation, and may segfault if an array it thinks is immutable changes (e.g. if the array describes the topology of a mesh). It should be impossible for normal python scripting to cause such a segfault. Say I have a function "get_array" which returns an array from this system which is guaranteed immutable, a function "set_array" which stores an array. 
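(To fix ideas, a throwaway sketch of such a pair -- the module-level storage and the exact signatures here are invented for the example and are not the real system:)

import numpy as np

# toy stand-in for the external system described above
_stored = np.arange(10)

def get_array():
    # hand out read-only views so normal scripting cannot mutate the store
    view = _stored.view()
    view.flags.writeable = False
    return view

def set_array(a):
    # naive copy-avoidance: trust arrays already marked non-writeable;
    # the examples below show why this check alone is not enough
    global _stored
    _stored = a if not a.flags.writeable else a.copy()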
It is safe to skip the copy if I do something like set_array(get_array()) However, set_array can't distinguish this from a = get_array().copy() b = a[:] a.flags.writeable = 0 set_array(a) b[0] = 3 The difference between writable and immutable is that it would be invalid to set the writable flag to False after creation, since the array may have already leaked. However, this is rather convoluted code, but it's the only example I can come up with that would be fixed with just an immutability flag. Therefore, the immutability flag is a bad idea. A more interesting and likely example is set_array(2 * get_array()) In this case, set_array() will receive an unwriteable array with reference count 1 (it owns the only reference). However, that is indistinguishable from a = 2 * get_array() set_array(a[:]) a[0] = 3 One way to solve this is to make a derived array class which is always immutable and propagates immutability and unwritability during arithmetic. This would safely avoid the overhead in all examples above, and is straightforward to implement. Unfortunately, it adds unnecessary copying in legitimate code that wants to modify results: a = 2 * get_array() a[0] = 2 # exception! set_array(a) Get rid of all unnecessary copies in that code would require tracking leaks and allowing set_array to either freeze "a" or change it to copy-on-write. That might end up too complicated or magical to be practical, though. In particular, it couldn't be implemented in a completely safe manner using inheritance. In any case, I think the benefit would be tiny enough that I should drop it and stick to copies unless someone else expresses interest. Thanks, Geoffrey From ondrej at certik.cz Fri Dec 19 17:30:05 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Fri, 19 Dec 2008 23:30:05 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball Message-ID: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> Hi, while packaging the new version of numpy, I realized that it is missing a documentation. I just checked with Stefan on Jabber and he thinks it should be rather a trivial fix. Do you Jarrod think you could please release a new tarball with the doc directory? The problem is that debian (and I guess other distros as well) has one source package (e.g. numpy tarball + debian files) and it creates python-numpy, python-numpy-dbg and python-numpy-doc binary packages from it. There should definitely be a doc package. So if the tarball is missing documentation, we need to repackage it. Since the doc is only in svn (right?), we would have to write some scripts to first svn checkout the doc, unpack the official tarball, include the doc, pack it and that would be our tarball. So we thought with Stefan that maybe a simpler solution is just to fix the ./setup sdist (or how you create the tarball in numpy) to include documentation and be done with it. What do you think? If you are busy, I can look at it how to fix the numpy tarball creation. 
Thanks, Ondrej From ondrej at certik.cz Fri Dec 19 17:35:07 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Fri, 19 Dec 2008 23:35:07 +0100 Subject: [Numpy-discussion] Missing numpy.i In-Reply-To: <001901c9496a$58b83cc0$e7ad810a@gnb.st.com> References: <9457e7c80810160620i2aeec4e3o4df1ae82906a1490@mail.gmail.com> <001901c9496a$58b83cc0$e7ad810a@gnb.st.com> Message-ID: <85b5c3130812191435h2be092d7of39f02cf9c0ed30b@mail.gmail.com> On Tue, Nov 18, 2008 at 11:42 AM, Nicolas ROUX wrote: > Hi, > > About the missing doc directory in the windows install in latest numpy > release, could you please add it ? > (please see below the previous thread) Well, this is a serious problem, so it should definitely be fixed, see here: http://projects.scipy.org/pipermail/numpy-discussion/2008-December/039309.html Ondrej From stefan at sun.ac.za Fri Dec 19 17:43:37 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 20 Dec 2008 00:43:37 +0200 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> Message-ID: <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> 2008/12/20 Ondrej Certik : > So we thought with Stefan that maybe a simpler solution is just to fix > the ./setup sdist (or how you create the tarball in numpy) to include > documentation and be done with it. I think releases should either include the Sphinx documentation or, alternatively, we should provide a separate tar-ball for the docs along with every release. St?fan From cournape at gmail.com Fri Dec 19 21:35:30 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 20 Dec 2008 11:35:30 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> Message-ID: <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> On Sat, Dec 20, 2008 at 7:43 AM, St?fan van der Walt wrote: > 2008/12/20 Ondrej Certik : >> So we thought with Stefan that maybe a simpler solution is just to fix >> the ./setup sdist (or how you create the tarball in numpy) to include >> documentation and be done with it. > > I think releases should either include the Sphinx documentation or, > alternatively, we should provide a separate tar-ball for the docs > along with every release. How difficult would it be to generate the doc ? I have not followed in detail what happened recently on that front. Is sphinx + sphinx ext the only necessary additional tools ? What I did for audiolab recently was to use paver to generate the source distribution for releases: the release sdist generates the usual sdist (which is kept the same as before - e.g. no need for tools to build the doc), as well as the doc (html + pdf), and put everything together. 
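Schematically it boils down to something like this -- a rough sketch only, assuming sphinx-build is on the path and the doc sources live under doc/source; it is not the actual audiolab pavement file:

import subprocess
import shutil

def release_sdist():
    # the plain source tarball, exactly as "python setup.py sdist" would make it
    subprocess.check_call(['python', 'setup.py', 'sdist'])
    # build the html doc separately (needs sphinx); a pdf step would be similar
    subprocess.check_call(['sphinx-build', '-b', 'html', 'doc/source', 'build/doc/html'])
    # put the built doc next to the tarball so both ship together
    shutil.copytree('build/doc/html', 'dist/html')

if __name__ == '__main__':
    release_sdist()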
Paver is a mere convenience, and this could be done with simple scripts, of course, David From ondrej at certik.cz Sat Dec 20 05:43:19 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Sat, 20 Dec 2008 11:43:19 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> Message-ID: <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> On Sat, Dec 20, 2008 at 3:35 AM, David Cournapeau wrote: > On Sat, Dec 20, 2008 at 7:43 AM, St?fan van der Walt wrote: >> 2008/12/20 Ondrej Certik : >>> So we thought with Stefan that maybe a simpler solution is just to fix >>> the ./setup sdist (or how you create the tarball in numpy) to include >>> documentation and be done with it. >> >> I think releases should either include the Sphinx documentation or, >> alternatively, we should provide a separate tar-ball for the docs >> along with every release. > > How difficult would it be to generate the doc ? I have not followed in > detail what happened recently on that front. Is sphinx + sphinx ext > the only necessary additional tools ? Just to make it clear -- I think the docs should not be generated in the tarball -- only the sources should be there. The html (and/or pdf) docs will be generated at the package build. Ondrej From cournape at gmail.com Sat Dec 20 06:15:43 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 20 Dec 2008 20:15:43 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> Message-ID: <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik wrote: > > Just to make it clear -- I think the docs should not be generated in > the tarball -- only the sources should be there. I agree this makes more sense for you, as a packager, but I am not sure it makes much sense to put the doc sources in the tarball for users (Building numpy should only require python + a C compiler; building the doc is more difficult -you need at least sphinx and all its dependencies). For audiolab, I put the generated doc, thinking if people want to mess with the doc, they are knowledgeable enough to deal with svn - but I did not think about the packagers :) I am not sure what's the best solution: maybe put both in the (released) source tarball ? 
David From gael.varoquaux at normalesup.org Sat Dec 20 06:26:18 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 20 Dec 2008 12:26:18 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> Message-ID: <20081220112618.GA23638@phare.normalesup.org> On Sat, Dec 20, 2008 at 08:15:43PM +0900, David Cournapeau wrote: > For audiolab, I put the generated doc, thinking if people want to mess > with the doc, they are knowledgeable enough to deal with svn - but I > did not think about the packagers :) I am not sure what's the best > solution: maybe put both in the (released) source tarball ? For Mayavi/ETS we put both. Docs are very important, and we feared people having difficulties building them, as the doc build tools and build chain isn't as mature as the rest of the build chain. Of course, for debian packaging the problem is different, but that's only a fraction of our users. Ga?l From david at ar.media.kyoto-u.ac.jp Sat Dec 20 07:01:25 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 20 Dec 2008 21:01:25 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <20081220112618.GA23638@phare.normalesup.org> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <20081220112618.GA23638@phare.normalesup.org> Message-ID: <494CDE95.40108@ar.media.kyoto-u.ac.jp> Gael Varoquaux wrote: > > For Mayavi/ETS we put both. Docs are very important, and we feared people > having difficulties building them, as the doc build tools and build chain > isn't as mature as the rest of the build chain. > Yes, I don't think anyone is arguing for users to build the doc. When distributing binaries, we should ideally put the built docs along with the installer itself (it is easy on mac os x with .dmg; we could for example simply put the installer itself together with the doc in a zip file for windows - anyone with XP and above can read zip out of the box). > Of course, for debian packaging the problem is different, but that's only > a fraction of our users. > I built the doc, and it looks like putting the sources of the doc + html + pdf will more or less multiply the size of the tarball by 4 (from ~ 1.5 M to 6 M). Maybe once we have a system like python.org such as for each release we have the doc at the corresponding version, we could just skip shipping the built doc in the source tarball, and only provide it with the binaries. I think it is important to have the built docs at least for the binaries installers - but it does not seem as important for the sources, assuming the user can find it quickly on the website of course. 
cheers, David From pav at iki.fi Sat Dec 20 09:02:21 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 20 Dec 2008 14:02:21 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> Message-ID: Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: > On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik wrote: >> Just to make it clear -- I think the docs should not be generated in >> the tarball -- only the sources should be there. > > I agree this makes more sense for you, as a packager, but I am not sure > it makes much sense to put the doc sources in the tarball for users > (Building numpy should only require python + a C compiler; building the > doc is more difficult -you need at least sphinx and all its > dependencies). > > For audiolab, I put the generated doc, thinking if people want to mess > with the doc, they are knowledgeable enough to deal with svn - but I did > not think about the packagers :) I am not sure what's the best solution: > maybe put both in the (released) source tarball ? I'd say that we put the source for the documentation to the documentation tarball, and distribute the built HTML+whatever documentation in a separate package. -- Pauli Virtanen From charlesr.harris at gmail.com Sat Dec 20 14:05:51 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 20 Dec 2008 12:05:51 -0700 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: On Fri, Dec 19, 2008 at 6:53 AM, John Hunter wrote: > On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross > wrote: > > This is a new project I just released. > > > > I know it is C#, but some of the design and idioms would be nice in > > numpy/scipy for working with discrete event simulators, time series, and > > event stream processing. > > > > http://code.google.com/p/incremental-statistics/ > > I think an incremental stats module would be a boon to numpy or scipy. > Eric Firing has a nice module wrtten in C with a pyrex wrapper > (ringbuf) that does trailing incremental mean, median, std, min, max, > and percentile. It maintains a sorted queue to do the last three > efficiently, and handles NaN inputs. I would like to see this > extended to include exponential or other weightings to do things like > incremental trailing exponential moving averages and variances. I > don't know what the licensing terms are of this module, but it might > be a good starting point for an incremental numpy stats module, at > least if you were thinking about supporting a finite lookback window. > We have a copy of this in the py4science examples dir if you want to > take a look: > > svn co > https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats > cd trailstats/ > make > python movavg_ringbuf.py > > Other things that would be very useful are incremental covariance and > regression. > Some sort of Kalman filter? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Joris.DeRidder at ster.kuleuven.be Sat Dec 20 14:48:49 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Sat, 20 Dec 2008 20:48:49 +0100 Subject: [Numpy-discussion] lfilter In-Reply-To: <494BC78B.5010405@molden.no> References: <494BC78B.5010405@molden.no> Message-ID: On 19 Dec 2008, at 17:10 , Sturla Molden wrote: > I am wondering if not scipy.signal.lfilter ought to be a part of the > core NumPy. Note that it is similar to the filter function found in > Matlab, and it makes a complement to numpy.convolve. > > May I suggest that it is renamed or aliased to numpy.filter? NumPy is primarily meant as an N-dimensional array manipulation library, so an IIR/FIR filter doesn't really fit into this. The developers are not aiming at mimicking Matlab, but are offering NumPy/ SciPy as a viable open-source alternative, where SciPy is the package to be used for mathematics, science, and engineering. Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From david at ar.media.kyoto-u.ac.jp Sun Dec 21 03:13:11 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 21 Dec 2008 17:13:11 +0900 Subject: [Numpy-discussion] numpy on windows x64 with mingw: it (almost) works Message-ID: <494DFA97.90508@ar.media.kyoto-u.ac.jp> Hi, Just a few words to mention that I've finally managed to build numpy with the mingw-w64 project (port of mingw to AMD 64 bits MS OS), and it almost run OK. By almost, I mean that numpy.test() finishes without crash, assuming a few unit tests are skipped (some long double problems). Not all unit tests pass, but almost all of them are easy to fix problems in numpy (except for the long double problem). The drawback is that you can't do that just by using the mingw-w64 binaries, you have to build your own toolchain because of some bugs/missing features in mingw-w64. I've put the gory details there: http://scipy.org/scipy/numpy/wiki/MicrosoftToolchainSupport Hopefully, this should make it easier to add fortran support with gfortran, opening the possibility to have both numpy and scipy buildable on windows x64 with free compilers, David From gael.varoquaux at normalesup.org Sun Dec 21 03:38:39 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 21 Dec 2008 09:38:39 +0100 Subject: [Numpy-discussion] numpy on windows x64 with mingw: it (almost) works In-Reply-To: <494DFA97.90508@ar.media.kyoto-u.ac.jp> References: <494DFA97.90508@ar.media.kyoto-u.ac.jp> Message-ID: <20081221083839.GA11578@phare.normalesup.org> On Sun, Dec 21, 2008 at 05:13:11PM +0900, David Cournapeau wrote: > Just a few words to mention that I've finally managed to build numpy > with the mingw-w64 project I know it was a tough task. Thanks a lot for doing this. Ga?l From millman at berkeley.edu Sun Dec 21 03:56:22 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 21 Dec 2008 00:56:22 -0800 Subject: [Numpy-discussion] numpy on windows x64 with mingw: it (almost) works In-Reply-To: <494DFA97.90508@ar.media.kyoto-u.ac.jp> References: <494DFA97.90508@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, Dec 21, 2008 at 12:13 AM, David Cournapeau wrote: > Just a few words to mention that I've finally managed to build numpy > with the mingw-w64 project (port of mingw to AMD 64 bits MS OS), and it > almost run OK. Thanks for working on this. 
Jarrod From ondrej at certik.cz Sun Dec 21 07:05:57 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Sun, 21 Dec 2008 13:05:57 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> Message-ID: <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: > Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik wrote: >>> Just to make it clear -- I think the docs should not be generated in >>> the tarball -- only the sources should be there. >> >> I agree this makes more sense for you, as a packager, but I am not sure >> it makes much sense to put the doc sources in the tarball for users >> (Building numpy should only require python + a C compiler; building the >> doc is more difficult -you need at least sphinx and all its >> dependencies). >> >> For audiolab, I put the generated doc, thinking if people want to mess >> with the doc, they are knowledgeable enough to deal with svn - but I did >> not think about the packagers :) I am not sure what's the best solution: >> maybe put both in the (released) source tarball ? > > I'd say that we put the source for the documentation to the documentation > tarball, and distribute the built HTML+whatever documentation in a > separate package. Why not to just include the *sources* together with numpy, and possibly include html+whatever in a separate documentation package? That way everybody is happy. Ondrej From david at ar.media.kyoto-u.ac.jp Sun Dec 21 07:14:50 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 21 Dec 2008 21:14:50 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: <494E333A.6090209@ar.media.kyoto-u.ac.jp> Ondrej Certik wrote: > > Why not to just include the *sources* together with numpy, and > possibly include html+whatever in a separate documentation package? > I don't think having separate built doc and built package is a good idea. It is confusing for the user, and I am afraid we won't alway keep everything in sync. We don't need to ship all the doc of course, but at least the pdf (or any other format, the point is to have at least one; it could be different on different platforms for the binaries for example). Specially for new comers, having everything in one place is better IMHO. I agree we should also put the doc sources together with the source tarball: that seems to be the common practice for almost every open source package out there. The only drawback is the tarball size, but since we are still talking about a couple of MB max, I don't think it is very relevant. IOW: - ship the doc sources + one built format with the released source distribution. 
It should not be built with sdist, but with another mean (so that one can easily generate a source distribution). - ship the built doc with every binary installer. David From pav at iki.fi Sun Dec 21 07:49:37 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 21 Dec 2008 12:49:37 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: Sun, 21 Dec 2008 13:05:57 +0100, Ondrej Certik wrote: > On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: >> Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >>> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik >>> wrote: >>>> Just to make it clear -- I think the docs should not be generated in >>>> the tarball -- only the sources should be there. >>> >>> I agree this makes more sense for you, as a packager, but I am not >>> sure it makes much sense to put the doc sources in the tarball for >>> users (Building numpy should only require python + a C compiler; >>> building the doc is more difficult -you need at least sphinx and all >>> its dependencies). >>> >>> For audiolab, I put the generated doc, thinking if people want to mess >>> with the doc, they are knowledgeable enough to deal with svn - but I >>> did not think about the packagers :) I am not sure what's the best >>> solution: maybe put both in the (released) source tarball ? >> >> I'd say that we put the source for the documentation to the >> documentation tarball, and distribute the built HTML+whatever >> documentation in a separate package. > > Why not to just include the *sources* together with numpy, and possibly > include html+whatever in a separate documentation package? That's what I tried to say, but mistyped "source" as "documentation". -- Pauli Virtanen From ondrej at certik.cz Sun Dec 21 08:50:32 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Sun, 21 Dec 2008 14:50:32 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: <85b5c3130812210550l6693d2f2m2bd682ae13cee693@mail.gmail.com> On Sun, Dec 21, 2008 at 1:49 PM, Pauli Virtanen wrote: > Sun, 21 Dec 2008 13:05:57 +0100, Ondrej Certik wrote: > >> On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: >>> Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >>>> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik >>>> wrote: >>>>> Just to make it clear -- I think the docs should not be generated in >>>>> the tarball -- only the sources should be there. >>>> >>>> I agree this makes more sense for you, as a packager, but I am not >>>> sure it makes much sense to put the doc sources in the tarball for >>>> users (Building numpy should only require python + a C compiler; >>>> building the doc is more difficult -you need at least sphinx and all >>>> its dependencies). 
>>>> >>>> For audiolab, I put the generated doc, thinking if people want to mess >>>> with the doc, they are knowledgeable enough to deal with svn - but I >>>> did not think about the packagers :) I am not sure what's the best >>>> solution: maybe put both in the (released) source tarball ? >>> >>> I'd say that we put the source for the documentation to the >>> documentation tarball, and distribute the built HTML+whatever >>> documentation in a separate package. >> >> Why not to just include the *sources* together with numpy, and possibly >> include html+whatever in a separate documentation package? > > That's what I tried to say, but mistyped "source" as "documentation". Ok, so we all seem to agree that having (at least) the source of docs together with the main numpy tarball is a good thing. I'll try to have a look at this. Ondrej From david at ar.media.kyoto-u.ac.jp Sun Dec 21 08:48:41 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 21 Dec 2008 22:48:41 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: <494E4939.8080901@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > Sun, 21 Dec 2008 13:05:57 +0100, Ondrej Certik wrote: > > >> On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: >> >>> Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >>> >>>> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik >>>> wrote: >>>> >>>>> Just to make it clear -- I think the docs should not be generated in >>>>> the tarball -- only the sources should be there. >>>>> >>>> I agree this makes more sense for you, as a packager, but I am not >>>> sure it makes much sense to put the doc sources in the tarball for >>>> users (Building numpy should only require python + a C compiler; >>>> building the doc is more difficult -you need at least sphinx and all >>>> its dependencies). >>>> >>>> For audiolab, I put the generated doc, thinking if people want to mess >>>> with the doc, they are knowledgeable enough to deal with svn - but I >>>> did not think about the packagers :) I am not sure what's the best >>>> solution: maybe put both in the (released) source tarball ? >>>> >>> I'd say that we put the source for the documentation to the >>> documentation tarball, and distribute the built HTML+whatever >>> documentation in a separate package. >>> >> Why not to just include the *sources* together with numpy, and possibly >> include html+whatever in a separate documentation package? >> > > That's what I tried to say, but mistyped "source" as "documentation". > Pauli, Is everything under trunk/doc necessary to build the doc ? Or only a subset of it ? 
David From pav at iki.fi Sun Dec 21 09:42:59 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 21 Dec 2008 14:42:59 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> Message-ID: Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: [clip] > Is everything under trunk/doc necessary to build the doc ? Or only a > subset of it ? Only a subset: Makefile, postprocess.py, release/*, source/*, and sphinxext/*. The rest is some older material (eg. numpybook/*), stuff targeted at developers (eg. ufuncs.txt), and miscellaneous stuff for users (eg. cython/, swig/) that should eventually be added as a part of the main documentation. -- Pauli Virtanen From cournape at gmail.com Sun Dec 21 10:30:55 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 22 Dec 2008 00:30:55 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812210730j5bfe2628r145f1398f6628cab@mail.gmail.com> On Sun, Dec 21, 2008 at 11:42 PM, Pauli Virtanen wrote: > Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: > [clip] >> Is everything under trunk/doc necessary to build the doc ? Or only a >> subset of it ? > > Only a subset: Makefile, postprocess.py, release/*, source/*, and > sphinxext/*. Ok, it should now be included in the sdist-generated tarball. I could not generate the doc from it, but I have the same problem when trying directly from the trunk, not sure what the problem is (make html hangs, no cpu consumption). Also, the increase in size is much smaller than I said previously: it looks like most of the stuff which took space in the tarball was the other directories: we go from 1.5 to 1.9 Mb, instead of 3 Mb if we added the whole doc directory. David > > The rest is some older material (eg. numpybook/*), stuff targeted at > developers (eg. ufuncs.txt), and miscellaneous stuff for users (eg. > cython/, swig/) that should eventually be added as a part of the main > documentation. 
> > -- > Pauli Virtanen > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From jdh2358 at gmail.com Sun Dec 21 10:53:29 2008 From: jdh2358 at gmail.com (John Hunter) Date: Sun, 21 Dec 2008 09:53:29 -0600 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> Message-ID: <88e473830812210753p15bb023fo19f5309365ac1498@mail.gmail.com> On Fri, Dec 19, 2008 at 4:30 PM, Ondrej Certik wrote: > while packaging the new version of numpy, I realized that it is > missing a documentation. I just checked with Stefan on Jabber and he > thinks > it should be rather a trivial fix. Do you Jarrod think you could > please release a new tarball with the doc directory? Since you are packaging numpy docs for debian, and I think numpy is using a variant of the mpl sphinxext plot directive to generate some plots, I will give you a head's up on a recent change Michael just made to the plot directive to fix a problem with the very large size the mpl debian doc build. The inclusion of the high res png and pdf in images generated by the plot directive makes the resultant build very large. Obviously we will be generating a lot more plots than numpy, but you may want to consider upgrading to the the plot directive from the 0.98.5.2 mpl release, which has a "smalldocs" options (build the docs with --small) and only the regular resolution PNG will be generated (no PDF, no hires). This allows us to build the full version for the mpl site and debian to build the small version if they want. JDH From pav at iki.fi Sun Dec 21 13:10:07 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 21 Dec 2008 18:10:07 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> <5b8d13220812210730j5bfe2628r145f1398f6628cab@mail.gmail.com> Message-ID: Mon, 22 Dec 2008 00:30:55 +0900, David Cournapeau wrote: > On Sun, Dec 21, 2008 at 11:42 PM, Pauli Virtanen wrote: >> Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: [clip] >>> Is everything under trunk/doc necessary to build the doc ? Or only >>> a >>> subset of it ? >> >> Only a subset: Makefile, postprocess.py, release/*, source/*, and >> sphinxext/*. > > Ok, it should now be included in the sdist-generated tarball. I could > not generate the doc from it, but I have the same problem when trying > directly from the trunk, not sure what the problem is (make html hangs, > no cpu consumption). What platform, what does it output? What does 'make -n' say? -- Pauli Virtanen From josef.pktd at gmail.com Sun Dec 21 16:25:55 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Dec 2008 16:25:55 -0500 Subject: [Numpy-discussion] is there a sortrows Message-ID: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> I was looking for a function that sorts a 2-dimensional array by rows. That's what I came up with, is there a more direct way? 
>>> a array([[1, 2], [0, 0], [1, 0], [0, 2], [2, 1], [1, 0], [1, 0], [0, 0], [1, 0], [2, 2]]) >>> a[np.lexsort(np.fliplr(a).T)] array([[0, 0], [0, 0], [0, 2], [1, 0], [1, 0], [1, 0], [1, 0], [1, 2], [2, 1], [2, 2]]) Note: I needed to flip and transpose, using axis didn't work >>> a.shape (10, 2) >>> np.lexsort(a,axis=1) Traceback (most recent call last): File "", line 1, in np.lexsort(a,axis=1) ValueError: axis(=1) out of bounds Specifying individual columns in argument also works, but it's a pain if I don't know how many columns there are: >>> a[np.lexsort((a[:,1],a[:,0]))] array([[0, 0], [0, 0], [0, 2], [1, 0], [1, 0], [1, 0], [1, 0], [1, 2], [2, 1], [2, 2]]) A helper function sortrows would be helpful, I don't know what would be the higher dimensional equivalent. Or did I miss a function that I didn't find in the help file? Thanks, Josef From cournape at gmail.com Sun Dec 21 20:36:15 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 22 Dec 2008 10:36:15 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> <5b8d13220812210730j5bfe2628r145f1398f6628cab@mail.gmail.com> Message-ID: <5b8d13220812211736p5b16b758r630484a647db811a@mail.gmail.com> On Mon, Dec 22, 2008 at 3:10 AM, Pauli Virtanen wrote: > Mon, 22 Dec 2008 00:30:55 +0900, David Cournapeau wrote: > >> On Sun, Dec 21, 2008 at 11:42 PM, Pauli Virtanen wrote: >>> Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: [clip] >>>> Is everything under trunk/doc necessary to build the doc ? Or only >>>> a >>>> subset of it ? >>> >>> Only a subset: Makefile, postprocess.py, release/*, source/*, and >>> sphinxext/*. >> >> Ok, it should now be included in the sdist-generated tarball. I could >> not generate the doc from it, but I have the same problem when trying >> directly from the trunk, not sure what the problem is (make html hangs, >> no cpu consumption). > > What platform, what does it output? What does 'make -n' say? I only made this comment to imply that I did not test whether I included everything needed. I don't think my problem is really relevant, it may just be a configuration problem. David From david at ar.media.kyoto-u.ac.jp Sun Dec 21 21:53:18 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 22 Dec 2008 11:53:18 +0900 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> Message-ID: <494F011E.6040105@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: > I was looking for a function that sorts a 2-dimensional array by rows. > That's what I came up with, is there a more direct way? 
> >>>> a > array([[1, 2], > [0, 0], > [1, 0], > [0, 2], > [2, 1], > [1, 0], > [1, 0], > [0, 0], > [1, 0], > [2, 2]]) >>>> a[np.lexsort(np.fliplr(a).T)] > array([[0, 0], > [0, 0], > [0, 2], > [1, 0], > [1, 0], > [1, 0], > [1, 0], > [1, 2], > [2, 1], > [2, 2]]) > > Note: I needed to flip and transpose, using axis didn't work > >>>> a.shape > (10, 2) >>>> np.lexsort(a,axis=1) > Traceback (most recent call last): > File "", line 1, in > np.lexsort(a,axis=1) > ValueError: axis(=1) out of bounds > > > Specifying individual columns in argument also works, but it's a pain > if I don't know how many columns there are: > >>>> a[np.lexsort((a[:,1],a[:,0]))] > array([[0, 0], > [0, 0], > [0, 2], > [1, 0], > [1, 0], > [1, 0], > [1, 0], > [1, 2], > [2, 1], > [2, 2]]) > > A helper function sortrows would be helpful, I don't know what would > be the higher dimensional equivalent. > Or did I miss a function that I didn't find in the help file? I may miss something obvious, but why are you using lexsort at all ? At leat, the first example is easily achieved with sort(x, axis=0) - but maybe you have more complicated examples in mind where you need actual lexical sort: David From robert.kern at gmail.com Sun Dec 21 22:18:48 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 21 Dec 2008 21:18:48 -0600 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <494F011E.6040105@ar.media.kyoto-u.ac.jp> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812211918t483745f1t6abaa3d7711ed5d2@mail.gmail.com> On Sun, Dec 21, 2008 at 20:53, David Cournapeau wrote: > josef.pktd at gmail.com wrote: >> I was looking for a function that sorts a 2-dimensional array by rows. >> That's what I came up with, is there a more direct way? >> >>>>> a >> array([[1, 2], >> [0, 0], >> [1, 0], >> [0, 2], >> [2, 1], >> [1, 0], >> [1, 0], >> [0, 0], >> [1, 0], >> [2, 2]]) >>>>> a[np.lexsort(np.fliplr(a).T)] >> array([[0, 0], >> [0, 0], >> [0, 2], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 2], >> [2, 1], >> [2, 2]]) >> >> Note: I needed to flip and transpose, using axis didn't work >> >>>>> a.shape >> (10, 2) >>>>> np.lexsort(a,axis=1) >> Traceback (most recent call last): >> File "", line 1, in >> np.lexsort(a,axis=1) >> ValueError: axis(=1) out of bounds >> >> >> Specifying individual columns in argument also works, but it's a pain >> if I don't know how many columns there are: >> >>>>> a[np.lexsort((a[:,1],a[:,0]))] >> array([[0, 0], >> [0, 0], >> [0, 2], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 2], >> [2, 1], >> [2, 2]]) >> >> A helper function sortrows would be helpful, I don't know what would >> be the higher dimensional equivalent. >> Or did I miss a function that I didn't find in the help file? > > I may miss something obvious, but why are you using lexsort at all ? At > leat, the first example is easily achieved with sort(x, axis=0) No, it isn't. In [4]: sort(a, axis=0) Out[4]: array([[0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [1, 1], [1, 2], [2, 2], [2, 2]]) Compare to his desired result: array([[0, 0], [0, 0], [0, 2], [1, 0], [1, 0], [1, 0], [1, 0], [1, 2], [2, 1], [2, 2]]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From josef.pktd at gmail.com Sun Dec 21 22:19:37 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Dec 2008 22:19:37 -0500 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <494F011E.6040105@ar.media.kyoto-u.ac.jp> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> Message-ID: <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> > > I may miss something obvious, but why are you using lexsort at all ? At > leat, the first example is easily achieved with sort(x, axis=0) - but > maybe you have more complicated examples in mind where you need actual > lexical sort: > > David >From the examples that I tried out np.sort, sorts each column separately (with axis = 0). If the elements of a row is supposed to stay together, then np.sort doesn't work >>> arr array([[ 1, 14], [ 4, 12], [ 3, 11], [ 2, 14]]) >>> np.sort(arr,axis=0) array([[ 1, 11], [ 2, 12], [ 3, 14], [ 4, 14]]) Josef From david at ar.media.kyoto-u.ac.jp Sun Dec 21 22:19:44 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 22 Dec 2008 12:19:44 +0900 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> Message-ID: <494F0750.3090107@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: >> I may miss something obvious, but why are you using lexsort at all ? At >> leat, the first example is easily achieved with sort(x, axis=0) - but >> maybe you have more complicated examples in mind where you need actual >> lexical sort: >> >> David >> > > >From the examples that I tried out np.sort, sorts each column > separately (with axis = 0). If the elements of a row is supposed to > stay together, then np.sort doesn't work. > You're right, as Robert just mentioned, I totally missed the point of your example... David From pgmdevlist at gmail.com Sun Dec 21 23:10:26 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 21 Dec 2008 23:10:26 -0500 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> Message-ID: <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> On Dec 21, 2008, at 10:19 PM, josef.pktd at gmail.com wrote: > >> From the examples that I tried out np.sort, sorts each column > separately (with axis = 0). If the elements of a row is supposed to > stay together, then np.sort doesn't work Well, if the elements are supposed to stay together, why wouldn't you tie them first, sort, and then untie them ? >>> np.sort(a.view([('',int),('',int)]),0).view(int) The first view transforms your 2D array into a 1D array of tuples, the second one retransforms the 1D array to 2D. Not sure it's better than your lexsort, haven't timed it. 
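Either recipe can be wrapped into the small helper the thread started from; a minimal sketch using the lexsort route (sortrows is not an existing numpy function, just the name proposed above, and a 2-D input is assumed):

import numpy as np

def sortrows(a):
    # sort whole rows lexicographically, first column as the primary key;
    # lexsort takes its *last* key as the primary one, hence the reversed columns
    return a[np.lexsort(a.T[::-1])]

a = np.array([[1, 2], [0, 0], [1, 0], [0, 2], [2, 1]])
sortrows(a)
# rows come back ordered as [[0, 0], [0, 2], [1, 0], [1, 2], [2, 1]]

The structured-array view does the same job by treating each row as a single record; which of the two is faster is the open timing question.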
From josef.pktd at gmail.com Sun Dec 21 23:37:20 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Dec 2008 23:37:20 -0500 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> Message-ID: <1cd32cbb0812212037t99fc765gff1e84864c018cdb@mail.gmail.com> On Sun, Dec 21, 2008 at 11:10 PM, Pierre GM wrote: > > On Dec 21, 2008, at 10:19 PM, josef.pktd at gmail.com wrote: >> >>> From the examples that I tried out np.sort, sorts each column >> separately (with axis = 0). If the elements of a row is supposed to >> stay together, then np.sort doesn't work > > Well, if the elements are supposed to stay together, why wouldn't you > tie them first, sort, and then untie them ? > > >>> np.sort(a.view([('',int),('',int)]),0).view(int) > > The first view transforms your 2D array into a 1D array of tuples, the > second one retransforms the 1D array to 2D. > > Not sure it's better than your lexsort, haven't timed it. That's very helpful, not so much about the sort but it's a good example to move back and forth between structured and regular arrays. My help search for this was not successful enough to figure this out by myself. Several functions require structured arrays but I didn't know how to get them without specifying everything by hand. And when I have a structured array, I didn't know how to call var or mean on them. Your suggestion also works with automatic adjustment for number of columns. >>> np.sort(a.view([('',' References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> <1cd32cbb0812212037t99fc765gff1e84864c018cdb@mail.gmail.com> Message-ID: <1cd32cbb0812212052y5f018e6fle1fe336d47d45c69@mail.gmail.com> On Sun, Dec 21, 2008 at 11:37 PM, wrote: > On Sun, Dec 21, 2008 at 11:10 PM, Pierre GM wrote: >> >> On Dec 21, 2008, at 10:19 PM, josef.pktd at gmail.com wrote: >>> >>>> From the examples that I tried out np.sort, sorts each column >>> separately (with axis = 0). If the elements of a row is supposed to >>> stay together, then np.sort doesn't work >> >> Well, if the elements are supposed to stay together, why wouldn't you >> tie them first, sort, and then untie them ? >> >> >>> np.sort(a.view([('',int),('',int)]),0).view(int) >> >> The first view transforms your 2D array into a 1D array of tuples, the >> second one retransforms the 1D array to 2D. >> >> Not sure it's better than your lexsort, haven't timed it. > > That's very helpful, not so much about the sort but it's a good > example to move back and forth between structured and regular arrays. > My help search for this was not successful enough to figure this out > by myself. Several functions require structured arrays but I didn't > know how to get them without specifying everything by hand. And when I > have a structured array, I didn't know how to call var or mean on > them. > > Your suggestion also works with automatic adjustment for number of columns. 
> >>>> np.sort(a.view([('',' > Thanks, > > Josef > Version with fully automatic conversion, I don't even have to know the dtype >>> np.sort(a.view([('',a.dtype)]*a.shape[1]),0).view(a.dtype) (this is for future Google searches) Josef From ccasey at enthought.com Mon Dec 22 14:11:48 2008 From: ccasey at enthought.com (Chris Casey) Date: Mon, 22 Dec 2008 13:11:48 -0600 Subject: [Numpy-discussion] EPD Py2.5 v4.1.30101 Released Message-ID: <1229973108.5867.0.camel@linux-8ej9.site> Greetings, Enthought, Inc. is very pleased to announce the newest release of the Enthought Python Distribution (EPD) Py2.5 v4.1.30101: http://www.enthought.com/epd The size of the installer has be reduced by about half. Also, this is the first release to include a 3.1.0 version of the Enthought Tool Suite (http://code.enthought.com/), featuring Mayavi 3.1.0. This is also the first release to use Enthought's enhanced version of setuptools, Enstaller (http://code.enthought.com/projects/enstaller/). Windows installation enhancements, matplotlib and wx issues, and menu consistency accross platforms are among notable fixes. The full release notes for this release can be found here: https://svn.enthought.com/epd/wiki/Py25/4.1.30101/RelNotes Many thanks to the EPD team for putting this release together, and to the community of folks who have provided all of the valuable tools bundled here. Best Regards, Chris --------- About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python? Programming Language, including over 80 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, database adapters, and a lot of other tools right out of the box. http://www.enthought.com/products/epd.php It is currently available as an easy, single-click installer for Windows XP (x86), Mac OS X (a universal binary for Intel 10.4 and above) and RedHat EL3 (x86 and amd64). EPD is free for 30-day trial use and for use in degree-granting academic institutions. An annual Subscription and installation support are available for commercial use (http://www.enthought.com/products/epddownload.php ) including an Enterprise Subscription with support for particular deployment environments (http://www.enthought.com/products/enterprise.php ). _______________________________________________ Enthought-dev mailing list Enthought-dev at mail.enthought.com https://mail.enthought.com/mailman/listinfo/enthought-dev From gael.varoquaux at normalesup.org Mon Dec 22 19:39:54 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Dec 2008 01:39:54 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <1229973108.5867.0.camel@linux-8ej9.site> References: <1229973108.5867.0.camel@linux-8ej9.site> Message-ID: <20081223003954.GF13171@phare.normalesup.org> Hi, This mailing list is full of people spending their time writing non-trivial numerical code. This is why I would like to share my interrogations on a code smell that I notice a lot in my numerical code that revolves around persisting to disk often, and the mess that results. It is a bit hard to describe and it has been on my mind for a couple of months. I have finally written a blog post in an attempt to share my thoughts: http://gael-varoquaux.info/blog/?p=83 Pointing to a blog post on a mailing list seems to me almost rude, and I hope you'll forgive, but I'd love any feedback. 
It seems to me I am missing a pattern, or simply some insight on a recurrent problem. Cheers, Ga?l From olivier.grisel at ensta.org Mon Dec 22 20:10:50 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 23 Dec 2008 02:10:50 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <20081223003954.GF13171@phare.normalesup.org> References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> Message-ID: Interesting topic indeed. I think I have been hit with similar problems on toy experimental scripts. So far the solution was always adhoc FS caches of numpy arrays with manual filename management. Maybe the first step for designing a generic solution would be to list some representative yet simple enough use cases with real sample python code so as to focus on concrete matters and avoid over engineering a general solution for philosophical problems. -- Olivier On Dec 23, 2008 1:40 AM, "Gael Varoquaux" wrote: Hi, This mailing list is full of people spending their time writing non-trivial numerical code. This is why I would like to share my interrogations on a code smell that I notice a lot in my numerical code that revolves around persisting to disk often, and the mess that results. It is a bit hard to describe and it has been on my mind for a couple of months. I have finally written a blog post in an attempt to share my thoughts: http://gael-varoquaux.info/blog/?p=83 Pointing to a blog post on a mailing list seems to me almost rude, and I hope you'll forgive, but I'd love any feedback. It seems to me I am missing a pattern, or simply some insight on a recurrent problem. Cheers, Ga?l _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon Dec 22 22:35:28 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 12:35:28 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch Message-ID: <49505C80.80709@ar.media.kyoto-u.ac.jp> Hi, I updated a small branch of mine which is meant to fix a problem on Mac OS X with python 2.6 (see http://projects.scipy.org/pipermail/numpy-discussion/2008-November/038816.html for the problem) and would like one core numpy developer to review it before I merge it. The problem can be seen with the following test code: #include int main() { #ifdef WORDS_BIGENDIAN printf("Big endian macro defined\n"); #else printf("No big endian macro defined\n"); #endif return 0; } If I build the above with python 2.5 on mac os X (intel), then I get the message no big endian. But with my version 2.6 (installed from official binary), I get Big endian, which is obviously wrong for my machine. This is a problem in python, but we can fix it in numpy (which depends on this macro). The fix is simple: set our own NPY_BIG_ENDIAN/NPY_LITTLE_ENDIAN instead of relying on the python header one. More precisely: - a header cpuarch.h has been added: it uses toolchain specific macro to set one of the NPY_TARGET_CPU_* macro. X86, AMD64, PPC, SPARC, S390, and PA_RISC are detected. (I obviously did not tested them all). - NPY_LITTLE_ENDIAN is set for little endian, NPY_BIG_ENDIAN is set for big endian, according to the detected CPU (Or directly using endian.h if available). 
- NPY_BYTE_ORDER is set to 4321 for big endian, 1234 for little endian (following glibc endian.h convention) - endianess is set in the numpy headers at the time they are read (whenever you include it) - remove any mention of WORDS_BIGENDIAN in the source code (only _signbit.c used it). I don't like so much depending on CPU detection, but OTOH, the only other solution I can see would be to have numpy headers which do not rely on endianness at all, which does not seem possible without breaking some API (the macro which test for endianness: PyArray_ISNBO and all the other ones which depend on it, including PyArray_ISNOTSWAPPED). cheers, David From charlesr.harris at gmail.com Tue Dec 23 00:15:18 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Dec 2008 22:15:18 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <49505C80.80709@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: On Mon, Dec 22, 2008 at 8:35 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Hi, > > I updated a small branch of mine which is meant to fix a problem on > Mac OS X with python 2.6 (see > > http://projects.scipy.org/pipermail/numpy-discussion/2008-November/038816.html > for the problem) and would like one core numpy developer to review it > before I merge it. > > The problem can be seen with the following test code: > > #include > > int main() > { > #ifdef WORDS_BIGENDIAN > printf("Big endian macro defined\n"); > #else > printf("No big endian macro defined\n"); > #endif > > return 0; > } > > If I build the above with python 2.5 on mac os X (intel), then I get the > message no big endian. But with my version 2.6 (installed from official > binary), I get Big endian, which is obviously wrong for my machine. This > is a problem in python, but we can fix it in numpy (which depends on > this macro). > > The fix is simple: set our own NPY_BIG_ENDIAN/NPY_LITTLE_ENDIAN instead > of relying on the python header one. More precisely: > - a header cpuarch.h has been added: it uses toolchain specific Is there a good reason to use a separate file? I assume this header will just end up being included in one of the others. Maybe you could put it in the same header that sets up all the differently sized types. > > macro to set one of the NPY_TARGET_CPU_* macro. X86, AMD64, PPC, SPARC, > S390, and PA_RISC are detected. (I obviously did not tested them all). > - NPY_LITTLE_ENDIAN is set for little endian, NPY_BIG_ENDIAN is > set for big endian, according to the detected CPU (Or directly using > endian.h if available). > - NPY_BYTE_ORDER is set to 4321 for big endian, 1234 for little > endian (following glibc endian.h convention) > - endianess is set in the numpy headers at the time they are > read (whenever you include it) > - remove any mention of WORDS_BIGENDIAN in the source code (only > _signbit.c used it). > Let's get rid of _signbit.c and move the signbit function into umath_funcs_c99. It can also be simplified using NPY_INT32 for the integer type. I'd go for a pointer cast and dereference myself but the current implementation is pretty common and I don't think it matters much. I think it is OK to set the order by the CPU type. The PPC might be a bit iffy, but I don't know of any products using its bigendian mode -- not that there aren't any. Is there any simple way that someone who needs a special case can override the automatic settings? 
> I don't like so much depending on CPU detection, but OTOH, the only > other solution I can see would be to have numpy headers which do not > rely on endianness at all, which does not seem possible without breaking > some API (the macro which test for endianness: PyArray_ISNBO and all the > other ones which depend on it, including PyArray_ISNOTSWAPPED). > Do what you gotta do. It sounds like the CPU can be determined from a macro set by the compiler, is that so? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 23 00:26:34 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Dec 2008 22:26:34 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: Hi David, On Mon, Dec 22, 2008 at 10:15 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: It's pretty easy to determine byte order at run time. Maybe another configuration test is in order... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Dec 23 00:14:00 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 14:14:00 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: <49507398.7050205@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Mon, Dec 22, 2008 at 8:35 PM, David Cournapeau > > > wrote: > > Hi, > > I updated a small branch of mine which is meant to fix a problem on > Mac OS X with python 2.6 (see > http://projects.scipy.org/pipermail/numpy-discussion/2008-November/038816.html > for the problem) and would like one core numpy developer to review it > before I merge it. > > The problem can be seen with the following test code: > > #include > > int main() > { > #ifdef WORDS_BIGENDIAN > printf("Big endian macro defined\n"); > #else > printf("No big endian macro defined\n"); > #endif > > return 0; > } > > If I build the above with python 2.5 on mac os X (intel), then I > get the > message no big endian. But with my version 2.6 (installed from > official > binary), I get Big endian, which is obviously wrong for my > machine. This > is a problem in python, but we can fix it in numpy (which depends on > this macro). > > The fix is simple: set our own NPY_BIG_ENDIAN/NPY_LITTLE_ENDIAN > instead > of relying on the python header one. More precisely: > - a header cpuarch.h has been added: it uses toolchain specific > > > Is there a good reason to use a separate file? I assume this header > will just end up being included in one of the others. Maybe you could > put it in the same header that sets up all the differently sized types. None other than I don't like big source files or big headers. > > Let's get rid of _signbit.c and move the signbit function into > umath_funcs_c99. It can also be simplified using NPY_INT32 for the > integer type. I'd go for a pointer cast and dereference myself but the > current implementation is pretty common and I don't think it matters much. If _signbit.c can be changed, so be it. But it does not impact the patch as it is. There are only two places which use WORD_BIGENDIAN directly: _signbit.c and mconf.h (in scipy). > > I think it is OK to set the order by the CPU type. The PPC might be a > bit iffy, but I don't know of any products using its bigendian mode I guess you meant little endian. 
Yes, setting endianness by processor is iffy, but there is no way around it: either we detect endianness at runtime, and deal with it correctly, or if we depend on it in the headers as we currently do, we need some way to detect it from some macros set by the system. I prefer the CPU solution to the platform specific one (most systems have some way to detect this kind of things I guess by including some headers), specially since I think detecting the CPU can be useful for other things later (SSE optimization for example). > Is there any simple way that someone who needs a special case can > override the automatic settings? No. We could add a way to override the CPU-based detection to force it. But since distutils is so limited in that regard, I think people will need to go into the sources anyway, so I am not sure it is really worthwhile. What bothers me more is that this kind of misconfiguration (little-endian ppc) goes undetected. We could add a runtime check, at least ? > > > I don't like so much depending on CPU detection, but OTOH, the only > other solution I can see would be to have numpy headers which do not > rely on endianness at all, which does not seem possible without > breaking > some API (the macro which test for endianness: PyArray_ISNBO and > all the > other ones which depend on it, including PyArray_ISNOTSWAPPED). > > > Do what you gotta do. It sounds like the CPU can be determined from a > macro set by the compiler, is that so? Yep. I gather the values from glibc and boost, which should cover quite a number of platforms I think. David From david at ar.media.kyoto-u.ac.jp Tue Dec 23 00:15:37 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 14:15:37 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: <495073F9.3070401@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > Hi David, > > On Mon, Dec 22, 2008 at 10:15 PM, Charles R Harris > > wrote: > > > It's pretty easy to determine byte order at run time. Maybe another > configuration test is in order... Yes, but that's not enough, since some macro defined in the header uses the endianness, and thus are hardcoded. If we could remove those macro, then it is definitely a better solution. cheers, David From robert.kern at gmail.com Tue Dec 23 00:35:58 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 22 Dec 2008 23:35:58 -0600 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <495073F9.3070401@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> On Mon, Dec 22, 2008 at 23:15, David Cournapeau wrote: > Charles R Harris wrote: >> Hi David, >> >> On Mon, Dec 22, 2008 at 10:15 PM, Charles R Harris >> > wrote: >> >> >> It's pretty easy to determine byte order at run time. Maybe another >> configuration test is in order... > > Yes, but that's not enough, since some macro defined in the header uses > the endianness, and thus are hardcoded. If we could remove those macro, > then it is definitely a better solution. I think he meant that it can be discovered at runtime in general, not at numpy-run-time, so we can write a small C program that can be run at numpy-build-time to add another entry to config.h. 
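Something along these lines, say; a sketch only, driven from Python for brevity, with the compiler name and file handling as placeholders (the probe itself is the usual look-at-the-low-byte trick):

import os
import subprocess
import tempfile

PROBE = r"""
int main(void) {
    unsigned int one = 1u;
    return *(unsigned char *)&one ? 0 : 1;  /* exit status 0 on little endian */
}
"""

def probe_byteorder(cc="cc"):
    """Compile and run a tiny C program, and report the byte order it saw."""
    d = tempfile.mkdtemp()
    src = os.path.join(d, "probe.c")
    exe = os.path.join(d, "probe")
    f = open(src, "w")
    f.write(PROBE)
    f.close()
    subprocess.check_call([cc, src, "-o", exe])
    return "little" if subprocess.call([exe]) == 0 else "big"

The build could then write the answer into the generated config header.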
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Tue Dec 23 00:40:49 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 23 Dec 2008 14:40:49 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> Message-ID: <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern wrote: > > I think he meant that it can be discovered at runtime in general, not > at numpy-run-time, so we can write a small C program that can be run > at numpy-build-time to add another entry to config.h. But then we only move the problem: people who want to build universal numpy extensions will have the wrong value, no ? The fundamental point of my patch is that the value is set whenever ndarrayobject.h is included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be defined when the header is included for a file build with gcc -arch i386. David From robert.kern at gmail.com Tue Dec 23 00:47:14 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 22 Dec 2008 23:47:14 -0600 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> Message-ID: <3d375d730812222147w3375bceeqdc385e61eda3b20c@mail.gmail.com> On Mon, Dec 22, 2008 at 23:40, David Cournapeau wrote: > On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern wrote: > >> >> I think he meant that it can be discovered at runtime in general, not >> at numpy-run-time, so we can write a small C program that can be run >> at numpy-build-time to add another entry to config.h. > > But then we only move the problem: people who want to build universal > numpy extensions will have the wrong value, no ? Fair point. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Tue Dec 23 00:55:51 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Dec 2008 22:55:51 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> Message-ID: On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau wrote: > On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > wrote: > > > > > I think he meant that it can be discovered at runtime in general, not > > at numpy-run-time, so we can write a small C program that can be run > > at numpy-build-time to add another entry to config.h. > > But then we only move the problem: people who want to build universal > numpy extensions will have the wrong value, no ? 
The fundamental point > of my patch is that the value is set whenever ndarrayobject.h is > included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be > defined when the header is included for a file build with gcc -arch > i386. > We can probably set things up so the determination is at run time -- but we need to be sure that the ABI isn't affected. I did that once for an old project that needed data portability. In any case, it sounds like a project for a later release. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Dec 23 01:20:16 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 15:20:16 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> Message-ID: <49508320.60804@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > wrote: > > On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > > wrote: > > > > > I think he meant that it can be discovered at runtime in > general, not > > at numpy-run-time, so we can write a small C program that can be run > > at numpy-build-time to add another entry to config.h. > > But then we only move the problem: people who want to build universal > numpy extensions will have the wrong value, no ? The fundamental point > of my patch is that the value is set whenever ndarrayobject.h is > included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be > defined when the header is included for a file build with gcc -arch > i386. > > > We can probably set things up so the determination is at run time -- > but we need to be sure that the ABI isn't affected. I did that once > for an old project that needed data portability. In any case, it > sounds like a project for a later release. It cannot work for numpy without breaking backward compatibility, because of the following lines: #define PyArray_ISNBO(arg) ((arg) != NPY_OPPBYTE) #define PyArray_IsNativeByteOrder PyArray_ISNBO #define PyArray_ISNOTSWAPPED(m) PyArray_ISNBO(PyArray_DESCR(m)->byteorder) #define PyArray_ISBYTESWAPPED(m) (!PyArray_ISNOTSWAPPED(m)) Since any code using the macro will expand it at build time, you can't make it such as it will be correct at runtime. We would have to replace those macro by "non inlinable" functions. I will add a function to detect endianness, though, just to check that the macro value corresponds to the runtime one (it will make problems on say little endian ppc much easier to detect). 
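At the Python level, the same sanity check can be sketched roughly as follows (illustration only; the real check would be C code run from import_array):

import struct
import numpy as np

def runtime_byteorder():
    # Pack the integer 1 in native order and reread it as little endian:
    # if it still reads back as 1, this machine is little endian.
    native = struct.pack('=I', 1)
    return '<' if struct.unpack('<I', native)[0] == 1 else '>'

configured = np.dtype(np.int32).str[0]   # '<' or '>', as numpy was configured
if configured != runtime_byteorder():
    raise RuntimeError("numpy's configured byte order does not match this CPU")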
David From david at ar.media.kyoto-u.ac.jp Tue Dec 23 01:23:37 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 15:23:37 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <49508320.60804@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> Message-ID: <495083E9.5020105@ar.media.kyoto-u.ac.jp> David Cournapeau wrote: > Charles R Harris wrote: > >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > > wrote: >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> > wrote: >> >> > >> > I think he meant that it can be discovered at runtime in >> general, not >> > at numpy-run-time, so we can write a small C program that can be run >> > at numpy-build-time to add another entry to config.h. >> >> But then we only move the problem: people who want to build universal >> numpy extensions will have the wrong value, no ? The fundamental point >> of my patch is that the value is set whenever ndarrayobject.h is >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be >> defined when the header is included for a file build with gcc -arch >> i386. >> >> >> We can probably set things up so the determination is at run time -- >> but we need to be sure that the ABI isn't affected. I did that once >> for an old project that needed data portability. In any case, it >> sounds like a project for a later release. >> > > It cannot work for numpy without breaking backward compatibility, > because of the following lines: > Actually, you could, by making the macro point to actual functions, but that would add function call cost. I don't know if the function call cost is significant or not in the cases where those macro are used, David From charlesr.harris at gmail.com Tue Dec 23 04:07:28 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Dec 2008 02:07:28 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <495083E9.5020105@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> Message-ID: On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > David Cournapeau wrote: > > Charles R Harris wrote: > > > >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> > wrote: > >> > >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > >> > wrote: > >> > >> > > >> > I think he meant that it can be discovered at runtime in > >> general, not > >> > at numpy-run-time, so we can write a small C program that can be > run > >> > at numpy-build-time to add another entry to config.h. > >> > >> But then we only move the problem: people who want to build > universal > >> numpy extensions will have the wrong value, no ? The fundamental > point > >> of my patch is that the value is set whenever ndarrayobject.h is > >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be > >> defined when the header is included for a file build with gcc -arch > >> i386. 
> >> > >> > >> We can probably set things up so the determination is at run time -- > >> but we need to be sure that the ABI isn't affected. I did that once > >> for an old project that needed data portability. In any case, it > >> sounds like a project for a later release. > >> > > > > It cannot work for numpy without breaking backward compatibility, > > because of the following lines: > > > > Actually, you could, by making the macro point to actual functions, but > that would add function call cost. I don't know if the function call > cost is significant or not in the cases where those macro are used, > Exactly. Function calls are pretty cheap on modern hardware with good compilers, nor would I expect the calls to be the bottleneck in most applications. The functions would need to be visible to third party applications, however... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From reakinator at gmail.com Tue Dec 23 11:52:05 2008 From: reakinator at gmail.com (Rich E) Date: Tue, 23 Dec 2008 17:52:05 +0100 Subject: [Numpy-discussion] help with typemapping a C function to use numpy arrays Message-ID: Hi list, My question has to do with the Numpy/SWIG typemapping system. I recently got the typemaps in numpy.i to work on most of my C functions that are wrapped using SWIG, if they have arguments of the form (int sizeArray, float *pArray). Now I am trying to figure out how to wrap function that aren't of the form, such as the following function: /*! \brief compute magnitude spectrum of a DFT * * \param sizeMag size of output Magnitude (half of input real FFT) * \param pFReal pointer to input FFT real array (real/imag floats) * \param pFMAg pointer to float array of magnitude spectrum */ void sms_spectrumMag( int sizeMag, float *pInRect, float *pOutMag) { int i, it2; float fReal, fImag; for (i=0; i Hello Numpy community, I want to know if?Numpy could deal with symbolic arrays and lists (by symbolic I mean without specifying the concrete contents of list or array) For example I want to solve a system of equations containing lists and arrays like this solve(x+Sum[A[k],k=i..N]==y+Sum[B[k],k=m..N], j-Length[C]==l-Length[D], ?z/(c?^ i)==t/(c?^ h), u+1==2*v-3w, v=f(f(w))) (here A and B are arrays; C?et D are lists; x,y,z,t,j,l,i,h,u,v,w are variables that could be of type integer or real, c is a constant and f is a function): ? Thank you very much. Yours faithfully, Olfa MRAIHI -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Vickroy at noaa.gov Wed Dec 24 07:19:33 2008 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Wed, 24 Dec 2008 05:19:33 -0700 Subject: [Numpy-discussion] Is it possible with numpy? In-Reply-To: <945910.69974.qm@web26108.mail.ukl.yahoo.com> References: <945910.69974.qm@web26108.mail.ukl.yahoo.com> Message-ID: <495228D5.7010605@noaa.gov> olfa mraihi wrote: > Hello Numpy community, > I want to know if Numpy could deal with symbolic arrays and lists (by > symbolic I mean without specifying the concrete contents of list or > array) > For example I want to solve a system of equations containing lists and > arrays like this > solve(x+Sum[A[k],k=i..N]==y+Sum[B[k],k=m..N], > j-Length[C]==l-Length[D], > z/(c ^ i)==t/(c ^ h), > u+1==2*v-3w, > v=f(f(w))) > (here A and B are arrays; C et D are lists; > x,y,z,t,j,l,i,h,u,v,w are variables that could be of type integer or > real, c is a constant and f is a function): > > Thank you very much. 
> Yours faithfully, > Olfa MRAIHI > > If I understand you correctly, I believe the answer is no. Have you considered PyDSTool and SymPy ? > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Dec 24 08:21:16 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 24 Dec 2008 14:21:16 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> Message-ID: <20081224132116.GB24856@phare.normalesup.org> On Tue, Dec 23, 2008 at 02:10:50AM +0100, Olivier Grisel wrote: > Interesting topic indeed. I think I have been hit with similar problems on > toy experimental scripts. So far the solution was always adhoc FS caches > of numpy arrays with manual filename management. Maybe the first step for > designing a generic solution would be to list some representative yet > simple enough use cases with real sample python code so as to focus on > concrete matters and avoid over engineering a general solution for > philosophical problems. Yes, that's clearly a first ste: list the usecases, and the way we would like it solved: think about the API. My internet connection is quite random currently, and I'll probably loose it for a week any time soon. Do you want to start such a page on the wiki. Mark it as a sratch page, and we'll delete it later. I should point out that joblib (on PyPI and launchpad) was a first attempt to solve this problem, so you could have a look at it. I have already identified things that are wrong with joblib (more on the API side than actual bugs), so I know it is not a final solution. Figuring out what was wrong only came from using it heavily in my work. I thing the only way forward it to start something, use it, figure out what's wrong, and start again... Looking forward to your input, Ga?l From nadavh at visionsense.com Wed Dec 24 12:06:56 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 24 Dec 2008 19:06:56 +0200 Subject: [Numpy-discussion] Is it possible with numpy? References: <945910.69974.qm@web26108.mail.ukl.yahoo.com> Message-ID: <710F2847B0018641891D9A216027636029C38D@ex3.envision.co.il> There is a (small) chance that sympy can help. Never the less you can use scipy.optimize to obtain a numerical solution, once you specify the right merit function. Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? olfa mraihi ????: ? 24-?????-08 12:55 ??: numpy-discussion at scipy.org ????: [Numpy-discussion] Is it possible with numpy? Hello Numpy community, I want to know if?Numpy could deal with symbolic arrays and lists (by symbolic I mean without specifying the concrete contents of list or array) For example I want to solve a system of equations containing lists and arrays like this solve(x+Sum[A[k],k=i..N]==y+Sum[B[k],k=m..N], j-Length[C]==l-Length[D], ?z/(c?^ i)==t/(c?^ h), u+1==2*v-3w, v=f(f(w))) (here A and B are arrays; C?et D are lists; x,y,z,t,j,l,i,h,u,v,w are variables that could be of type integer or real, c is a constant and f is a function): ? Thank you very much. 
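(To make the sympy suggestion above concrete: a toy version of the purely scalar part of such a system, with made-up symbol names, can be written with current SymPy as below; symbolic sums over arrays of unknown length are a separate question.)

import sympy as sp

x, y, c, u, v, w = sp.symbols('x y c u v w')
eqs = [sp.Eq(u + 1, 2*v - 3*w),        # u + 1 == 2*v - 3*w
       sp.Eq(x / c**2, y / c**3)]      # stand-in for the z/(c**i) == t/(c**h) relation
print(sp.solve(eqs, [u, x]))           # e.g. {u: 2*v - 3*w - 1, x: y/c}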
Yours faithfully, Olfa MRAIHI From alexwphoto at gmail.com Wed Dec 24 17:11:13 2008 From: alexwphoto at gmail.com (alexwphoto at gmail.com) Date: Wed, 24 Dec 2008 16:11:13 -0600 Subject: [Numpy-discussion] Specifying a dtype with RandomState? Message-ID: <1f7366a0812241411hba808c6l7bb634fbc1d47783@mail.gmail.com> I'm generating rather large matrices with a fixed random seed using rs = N.random.RandomState(123456789) U = rs.uniform(low=-0.1 high=self.0.1 size=(480189, 1000)).astype('float32') ... Several other arrays are instantiated as well. Because they are so large, I do all calculations on single-precision arrays. Coercing the output of rs.uniform() into a float32 requires an enormous copy operation (if I understand right). Since I am already hitting the upper limit of the memory space I have, it would be convenient if I could avoid the astype('float32') operation. Is there a way to have a RandomState object output single-precision floats? Thanks, Alex W -------------- next part -------------- An HTML attachment was scrubbed... URL: From bradford.n.cross at gmail.com Thu Dec 25 06:51:57 2008 From: bradford.n.cross at gmail.com (Bradford Cross) Date: Thu, 25 Dec 2008 12:51:57 +0100 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: I did not know about this - very cool! I think I was asking around the numpy/scipy lists a while back but nobody mentioned this; is it new? A couple of questions inline below. On Fri, Dec 19, 2008 at 2:53 PM, John Hunter wrote: > On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross > wrote: > > This is a new project I just released. > > > > I know it is C#, but some of the design and idioms would be nice in > > numpy/scipy for working with discrete event simulators, time series, and > > event stream processing. > > > > http://code.google.com/p/incremental-statistics/ > > I think an incremental stats module would be a boon to numpy or scipy. > Eric Firing has a nice module wrtten in C with a pyrex wrapper > (ringbuf) Please excuse my ignorance - what is the performance overhead of calling C via the pyrex wrapper? A lot of use cases for incremental statistics are discrete event systems where the calculations will be updated millions or billions of times; this was a concern I had about doing the project in C and calling across a wrapper. Maybe it was one of those entirely speculative and unfounded concerns. :-) > that does trailing incremental mean, median, std, min, max, > and percentile. It maintains a sorted queue to do the last three > efficiently, and handles NaN inputs. Not sure if our results hold universally or even asymptoticly, but we found that our implimention of order/rank statistics was faster when we backed it with partition selection algorithms operating on an array-based queue as opposed to our implimentaion of a sorted dequeue backed by a circular buffer. How does it handle NaN inputs exactly - does it just guard against them? That is the approach we took as well. We have a calculation guard that filters for both NaN and infinite values. > I would like to see this > extended to include exponential or other weightings to do things like > incremental trailing exponential moving averages and variances. This is a cool idea that I hadn't thought of. We do have exponentially weighted mean, but ideally one could supply a weighting function to any statistic. 
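The per-observation update is cheap in any case; a minimal Python sketch of an exponentially weighted mean/variance accumulator (illustrative only, not our library's API) looks like:

class EWStats(object):
    """Exponentially weighted mean and variance, updated one observation at a time."""
    def __init__(self, alpha):
        self.alpha = alpha      # weight of the newest observation, 0 < alpha <= 1
        self.mean = None
        self.var = 0.0

    def update(self, x):
        if self.mean is None:   # the first observation seeds the mean
            self.mean = float(x)
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        # standard one-pass recurrence for the exponentially weighted variance
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

s = EWStats(alpha=0.05)
for obs in (1.0, 2.0, 0.5, 1.5):
    s.update(obs)

Swapping in a different weighting scheme only changes how delta feeds back into the state, which is exactly where a pluggable weighting function would go.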
We've been moving toward a more functional combinator style library design lately and this is anothr step in that direction. > I > don't know what the licensing terms are of this module, but it might > be a good starting point for an incremental numpy stats module, at > least if you were thinking about supporting a finite lookback window. Yes, it sound great! If you read the docs here: http://code.google.com/p/incremental-statistics/ you can see that are have taken care to build the library from the beginning for static, accumulating, and rolling cases. The rolling case is what you are refering to as a finite lookback window, whereas accumualting as an accumulating lookback window, and the static case is the typical "compute hte mean of the entire sieries of observations at once" case. IMO, it turns out really nice when you think this way from the begnning becasue you get a lot of code reuse and nice oppertunities for composition. > > We have a copy of this in the py4science examples dir if you want to > take a look: > > svn co > https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats > cd trailstats/ > make > python movavg_ringbuf.py > > Other things that would be very useful are incremental covariance and > regression. Indeed. We have a bit on the dependence statistics side, but not much. Incremental dependence and regression are the two hot items on the backlog. :-) > > > JDH > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Dec 26 00:47:08 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 26 Dec 2008 14:47:08 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris wrote: > > > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau > wrote: >> >> David Cournapeau wrote: >> > Charles R Harris wrote: >> > >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > >> > wrote: >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> > wrote: >> >> >> >> > >> >> > I think he meant that it can be discovered at runtime in >> >> general, not >> >> > at numpy-run-time, so we can write a small C program that can be >> >> run >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> But then we only move the problem: people who want to build >> >> universal >> >> numpy extensions will have the wrong value, no ? The fundamental >> >> point >> >> of my patch is that the value is set whenever ndarrayobject.h is >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not >> >> be >> >> defined when the header is included for a file build with gcc -arch >> >> i386. >> >> >> >> >> >> We can probably set things up so the determination is at run time -- >> >> but we need to be sure that the ABI isn't affected. I did that once >> >> for an old project that needed data portability. 
In any case, it >> >> sounds like a project for a later release. >> >> >> > >> > It cannot work for numpy without breaking backward compatibility, >> > because of the following lines: >> > >> >> Actually, you could, by making the macro point to actual functions, but >> that would add function call cost. I don't know if the function call >> cost is significant or not in the cases where those macro are used, > > Exactly. Function calls are pretty cheap on modern hardware with good > compilers, nor would I expect the calls to be the bottleneck in most > applications. The functions would need to be visible to third party > applications, however... Would it be a problem ? Adding "true" functions to the array api, while keeping the macro for backward compatibility should be ok, no ? I also updated my patch, with another function PyArray_GetEndianness which detects the runtime endianness (using an union int/char[4]). The point is to detect any mismatch between the configuration endianness and the "true" one, and I put the detection in import_array. The function is in the numpy array API, but it does not really need to be either . cheers, David From charlesr.harris at gmail.com Fri Dec 26 02:05:11 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 00:05:11 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> Message-ID: On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau wrote: > On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris > wrote: > > > > > > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau > > wrote: > >> > >> David Cournapeau wrote: > >> > Charles R Harris wrote: > >> > > >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau < > cournape at gmail.com > >> >> > wrote: > >> >> > >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > >> >> > wrote: > >> >> > >> >> > > >> >> > I think he meant that it can be discovered at runtime in > >> >> general, not > >> >> > at numpy-run-time, so we can write a small C program that can > be > >> >> run > >> >> > at numpy-build-time to add another entry to config.h. > >> >> > >> >> But then we only move the problem: people who want to build > >> >> universal > >> >> numpy extensions will have the wrong value, no ? The fundamental > >> >> point > >> >> of my patch is that the value is set whenever ndarrayobject.h is > >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not > >> >> be > >> >> defined when the header is included for a file build with gcc > -arch > >> >> i386. > >> >> > >> >> > >> >> We can probably set things up so the determination is at run time -- > >> >> but we need to be sure that the ABI isn't affected. I did that once > >> >> for an old project that needed data portability. In any case, it > >> >> sounds like a project for a later release. > >> >> > >> > > >> > It cannot work for numpy without breaking backward compatibility, > >> > because of the following lines: > >> > > >> > >> Actually, you could, by making the macro point to actual functions, but > >> that would add function call cost. 
I don't know if the function call > >> cost is significant or not in the cases where those macro are used, > > > > Exactly. Function calls are pretty cheap on modern hardware with good > > compilers, nor would I expect the calls to be the bottleneck in most > > applications. The functions would need to be visible to third party > > applications, however... > > Would it be a problem ? Adding "true" functions to the array api, > while keeping the macro for backward compatibility should be ok, no ? > I don't think it's a problem, just that the macros generate code that is compiled, so they need to call an api function. A decent compiler will probably load the function pointer somewhere fast if it is called in a loop, a const keyword somewhere will help with that. We might want something more convenient for our own code. > > I also updated my patch, with another function PyArray_GetEndianness > which detects the runtime endianness (using an union int/char[4]). The > point is to detect any mismatch between the configuration endianness > and the "true" one, and I put the detection in import_array. The > function is in the numpy array API, but it does not really need to be > either . > That sounds like a good start. It might be a good idea to use something like npy_int32 instead of a plain old integer. Likewise, it would probably be good to define the union as an anonymous constant. Hmm... something like: #include const union { int i; char c[4]; } order = {1}; const i = 1; int main(int argc, char **argv) { if (order.c[0]) printf("little endian\n"); else printf("big endian\n"); if (*(char*)&i) printf("little endian\n"); else printf("big endian\n"); return 0; } I've done it two ways here. They both require the -fno-strict-aliasing flag in gcc, but numpy is compiled with that flag. Both methods generate the same assembly with -O2 on my intel core2. ... cmpb $0, order ... cmpb $0, i Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Dec 26 02:17:16 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 00:17:16 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 12:05 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau wrote: > >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> > wrote: >> >> >> >> David Cournapeau wrote: >> >> > Charles R Harris wrote: >> >> > >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau < >> cournape at gmail.com >> >> >> > wrote: >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> > wrote: >> >> >> >> >> >> > >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> general, not >> >> >> > at numpy-run-time, so we can write a small C program that can >> be >> >> >> run >> >> >> > at numpy-build-time to add another entry to config.h. 
>> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> universal >> >> >> numpy extensions will have the wrong value, no ? The fundamental >> >> >> point >> >> >> of my patch is that the value is set whenever ndarrayobject.h is >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> not >> >> >> be >> >> >> defined when the header is included for a file build with gcc >> -arch >> >> >> i386. >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time -- >> >> >> but we need to be sure that the ABI isn't affected. I did that once >> >> >> for an old project that needed data portability. In any case, it >> >> >> sounds like a project for a later release. >> >> >> >> >> > >> >> > It cannot work for numpy without breaking backward compatibility, >> >> > because of the following lines: >> >> > >> >> >> >> Actually, you could, by making the macro point to actual functions, but >> >> that would add function call cost. I don't know if the function call >> >> cost is significant or not in the cases where those macro are used, >> > >> > Exactly. Function calls are pretty cheap on modern hardware with good >> > compilers, nor would I expect the calls to be the bottleneck in most >> > applications. The functions would need to be visible to third party >> > applications, however... >> >> Would it be a problem ? Adding "true" functions to the array api, >> while keeping the macro for backward compatibility should be ok, no ? >> > > I don't think it's a problem, just that the macros generate code that is > compiled, so they need to call an api function. A decent compiler will > probably load the function pointer somewhere fast if it is called in a loop, > a const keyword somewhere will help with that. We might want something more > convenient for our own code. > > >> >> I also updated my patch, with another function PyArray_GetEndianness >> which detects the runtime endianness (using an union int/char[4]). The >> point is to detect any mismatch between the configuration endianness >> and the "true" one, and I put the detection in import_array. The >> function is in the numpy array API, but it does not really need to be >> either . >> > > That sounds like a good start. It might be a good idea to use something > like npy_int32 instead of a plain old integer. Likewise, it would probably > be good to define the union as an anonymous constant. Hmm... > something like: > > #include > > const union { > int i; > char c[4]; > } order = {1}; > > const i = 1; > > int main(int argc, char **argv) > { > if (order.c[0]) > printf("little endian\n"); > else > printf("big endian\n"); > > if (*(char*)&i) > printf("little endian\n"); > else > printf("big endian\n"); > > return 0; > } > > I've done it two ways here. They both require the -fno-strict-aliasing flag > in gcc, but numpy is compiled with that flag. Both methods generate the same > assembly with -O2 on my intel core2. > > ... > cmpb $0, order > ... > cmpb $0, i > I suppose we could also mark one of those variables as static, make the name more unique, and stick it in the include file, thus avoiding the need to add anything to the api. Not the cleanest solution, but maybe better... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Fri Dec 26 04:20:26 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 26 Dec 2008 18:20:26 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> Message-ID: <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris wrote: > > > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau > wrote: >> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> > wrote: >> >> >> >> David Cournapeau wrote: >> >> > Charles R Harris wrote: >> >> > >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> >> >> > >> >> > wrote: >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> > wrote: >> >> >> >> >> >> > >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> general, not >> >> >> > at numpy-run-time, so we can write a small C program that can >> >> >> be >> >> >> run >> >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> universal >> >> >> numpy extensions will have the wrong value, no ? The fundamental >> >> >> point >> >> >> of my patch is that the value is set whenever ndarrayobject.h is >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> >> >> not >> >> >> be >> >> >> defined when the header is included for a file build with gcc >> >> >> -arch >> >> >> i386. >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time -- >> >> >> but we need to be sure that the ABI isn't affected. I did that once >> >> >> for an old project that needed data portability. In any case, it >> >> >> sounds like a project for a later release. >> >> >> >> >> > >> >> > It cannot work for numpy without breaking backward compatibility, >> >> > because of the following lines: >> >> > >> >> >> >> Actually, you could, by making the macro point to actual functions, but >> >> that would add function call cost. I don't know if the function call >> >> cost is significant or not in the cases where those macro are used, >> > >> > Exactly. Function calls are pretty cheap on modern hardware with good >> > compilers, nor would I expect the calls to be the bottleneck in most >> > applications. The functions would need to be visible to third party >> > applications, however... >> >> Would it be a problem ? Adding "true" functions to the array api, >> while keeping the macro for backward compatibility should be ok, no ? > > I don't think it's a problem, just that the macros generate code that is > compiled, so they need to call an api function. A decent compiler will > probably load the function pointer somewhere fast if it is called in a loop, > a const keyword somewhere will help with that. We might want something more > convenient for our own code. > >> >> I also updated my patch, with another function PyArray_GetEndianness >> which detects the runtime endianness (using an union int/char[4]). 
The >> point is to detect any mismatch between the configuration endianness >> and the "true" one, and I put the detection in import_array. The >> function is in the numpy array API, but it does not really need to be >> either . > > That sounds like a good start. It might be a good idea to use something like > npy_int32 instead of a plain old integer. Likewise, it would probably be > good to define the union as an anonymous constant. Hmm... > something like: What I did was a bit more heavyweight, because I added it as function, but the idea is similar: static int compute_endianness() { union { char c[4]; npy_uint32 i; } bint; int st; bint.i = 'ABCD'; st = bintstrcmp(bint.c, "ABCD"); if (st == 0) { return NPY_CPU_BIG; } st = bintstrcmp(bint.c, "DCBA"); if (st == 0) { return NPY_CPU_LITTLE; } return NPY_CPU_UNKNOWN_ENDIAN; } Now that I think about it, I don't know if setting an integer to 'ABCD' is legal C. I think it is, but I don't claim to be any kind of C expert. > > > I've done it two ways here. They both require the -fno-strict-aliasing flag > in gcc, but numpy is compiled with that flag. Both methods generate the same > assembly with -O2 on my intel core2. I don't need this flag in my case, and I don't think we should require it if we can avoid it. I also use strings comparison, not just the first character, because in theory, there can be middle endian, but I doubt this has any real use. In that case, the function bintstrcmp can be dropped. David From charlesr.harris at gmail.com Fri Dec 26 12:38:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 10:38:37 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau wrote: > On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris > wrote: > > > > > > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau > > wrote: > >> > >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau > >> > wrote: > >> >> > >> >> David Cournapeau wrote: > >> >> > Charles R Harris wrote: > >> >> > > >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > >> >> >> >> >> >> > wrote: > >> >> >> > >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > >> >> >> > wrote: > >> >> >> > >> >> >> > > >> >> >> > I think he meant that it can be discovered at runtime in > >> >> >> general, not > >> >> >> > at numpy-run-time, so we can write a small C program that > can > >> >> >> be > >> >> >> run > >> >> >> > at numpy-build-time to add another entry to config.h. > >> >> >> > >> >> >> But then we only move the problem: people who want to build > >> >> >> universal > >> >> >> numpy extensions will have the wrong value, no ? The > fundamental > >> >> >> point > >> >> >> of my patch is that the value is set whenever ndarrayobject.h > is > >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will > >> >> >> not > >> >> >> be > >> >> >> defined when the header is included for a file build with gcc > >> >> >> -arch > >> >> >> i386. 
> >> >> >> > >> >> >> > >> >> >> We can probably set things up so the determination is at run time > -- > >> >> >> but we need to be sure that the ABI isn't affected. I did that > once > >> >> >> for an old project that needed data portability. In any case, it > >> >> >> sounds like a project for a later release. > >> >> >> > >> >> > > >> >> > It cannot work for numpy without breaking backward compatibility, > >> >> > because of the following lines: > >> >> > > >> >> > >> >> Actually, you could, by making the macro point to actual functions, > but > >> >> that would add function call cost. I don't know if the function call > >> >> cost is significant or not in the cases where those macro are used, > >> > > >> > Exactly. Function calls are pretty cheap on modern hardware with good > >> > compilers, nor would I expect the calls to be the bottleneck in most > >> > applications. The functions would need to be visible to third party > >> > applications, however... > >> > >> Would it be a problem ? Adding "true" functions to the array api, > >> while keeping the macro for backward compatibility should be ok, no ? > > > > I don't think it's a problem, just that the macros generate code that is > > compiled, so they need to call an api function. A decent compiler will > > probably load the function pointer somewhere fast if it is called in a > loop, > > a const keyword somewhere will help with that. We might want something > more > > convenient for our own code. > > > >> > >> I also updated my patch, with another function PyArray_GetEndianness > >> which detects the runtime endianness (using an union int/char[4]). The > >> point is to detect any mismatch between the configuration endianness > >> and the "true" one, and I put the detection in import_array. The > >> function is in the numpy array API, but it does not really need to be > >> either . > > > > That sounds like a good start. It might be a good idea to use something > like > > npy_int32 instead of a plain old integer. Likewise, it would probably be > > good to define the union as an anonymous constant. Hmm... > > something like: > > What I did was a bit more heavyweight, because I added it as function, > but the idea is similar: > > static int compute_endianness() > { > union { > char c[4]; > npy_uint32 i; > } bint; > int st; > > bint.i = 'ABCD'; > > st = bintstrcmp(bint.c, "ABCD"); > if (st == 0) { > return NPY_CPU_BIG; > } > st = bintstrcmp(bint.c, "DCBA"); > if (st == 0) { > return NPY_CPU_LITTLE; > } > return NPY_CPU_UNKNOWN_ENDIAN; > } > > Now that I think about it, I don't know if setting an integer to > 'ABCD' is legal C. I think it is, but I don't claim to be any kind of > C expert. Try const union { char c[4]; npy_uint32 i; } bint = {1,2,3,4}; And just compare the resulting value of bint.i to predefined values, i.e., 67305985 for big endian. The initializer is needed for the const union and initializes the first variable. The result will be more efficient as the union is initialized at compile time. > > > > > > > I've done it two ways here. They both require the -fno-strict-aliasing > flag > > in gcc, but numpy is compiled with that flag. Both methods generate the > same > > assembly with -O2 on my intel core2. > > I don't need this flag in my case, and I don't think we should require > it if we can avoid it. > You can't avoid it, there will be a compile time error if you are lucky or buggy code if you aren't. 
> I also use strings comparison, not just the first character, because > in theory, there can be middle endian, but I doubt this has any real > use. In that case, the function bintstrcmp can be dropped. > Yeah, VAX used to be middle endian for floats but I think you are safe these days. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Dec 26 13:00:47 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 11:00:47 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 10:38 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau wrote: > >> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris >> wrote: >> > >> > >> > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau >> > wrote: >> >> >> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> >> > wrote: >> >> >> >> >> >> David Cournapeau wrote: >> >> >> > Charles R Harris wrote: >> >> >> > >> >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> >> > >> wrote: >> >> >> >> >> >> >> >> > >> >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> >> general, not >> >> >> >> > at numpy-run-time, so we can write a small C program that >> can >> >> >> >> be >> >> >> >> run >> >> >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> >> universal >> >> >> >> numpy extensions will have the wrong value, no ? The >> fundamental >> >> >> >> point >> >> >> >> of my patch is that the value is set whenever ndarrayobject.h >> is >> >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> >> >> >> not >> >> >> >> be >> >> >> >> defined when the header is included for a file build with gcc >> >> >> >> -arch >> >> >> >> i386. >> >> >> >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time >> -- >> >> >> >> but we need to be sure that the ABI isn't affected. I did that >> once >> >> >> >> for an old project that needed data portability. In any case, it >> >> >> >> sounds like a project for a later release. >> >> >> >> >> >> >> > >> >> >> > It cannot work for numpy without breaking backward compatibility, >> >> >> > because of the following lines: >> >> >> > >> >> >> >> >> >> Actually, you could, by making the macro point to actual functions, >> but >> >> >> that would add function call cost. I don't know if the function call >> >> >> cost is significant or not in the cases where those macro are used, >> >> > >> >> > Exactly. Function calls are pretty cheap on modern hardware with good >> >> > compilers, nor would I expect the calls to be the bottleneck in most >> >> > applications. The functions would need to be visible to third party >> >> > applications, however... >> >> >> >> Would it be a problem ? 
Adding "true" functions to the array api, >> >> while keeping the macro for backward compatibility should be ok, no ? >> > >> > I don't think it's a problem, just that the macros generate code that is >> > compiled, so they need to call an api function. A decent compiler will >> > probably load the function pointer somewhere fast if it is called in a >> loop, >> > a const keyword somewhere will help with that. We might want something >> more >> > convenient for our own code. >> > >> >> >> >> I also updated my patch, with another function PyArray_GetEndianness >> >> which detects the runtime endianness (using an union int/char[4]). The >> >> point is to detect any mismatch between the configuration endianness >> >> and the "true" one, and I put the detection in import_array. The >> >> function is in the numpy array API, but it does not really need to be >> >> either . >> > >> > That sounds like a good start. It might be a good idea to use something >> like >> > npy_int32 instead of a plain old integer. Likewise, it would probably be >> > good to define the union as an anonymous constant. Hmm... >> > something like: >> >> What I did was a bit more heavyweight, because I added it as function, >> but the idea is similar: >> >> static int compute_endianness() >> { >> union { >> char c[4]; >> npy_uint32 i; >> } bint; >> int st; >> >> bint.i = 'ABCD'; >> >> st = bintstrcmp(bint.c, "ABCD"); >> if (st == 0) { >> return NPY_CPU_BIG; >> } >> st = bintstrcmp(bint.c, "DCBA"); >> if (st == 0) { >> return NPY_CPU_LITTLE; >> } >> return NPY_CPU_UNKNOWN_ENDIAN; >> } >> >> Now that I think about it, I don't know if setting an integer to >> 'ABCD' is legal C. I think it is, but I don't claim to be any kind of >> C expert. > > > Try > > const union { > char c[4]; > npy_uint32 i; > } bint = {1,2,3,4}; > > And just compare the resulting value of bint.i to predefined values, i.e., > 67305985 for big endian. The > Make that little endian. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Fri Dec 26 20:47:24 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 27 Dec 2008 10:47:24 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> Message-ID: <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> On Sat, Dec 27, 2008 at 2:38 AM, Charles R Harris wrote: > > > On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau > wrote: >> >> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris >> wrote: >> > >> > >> > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau >> > wrote: >> >> >> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> >> > wrote: >> >> >> >> >> >> David Cournapeau wrote: >> >> >> > Charles R Harris wrote: >> >> >> > >> >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> >> > wrote: >> >> >> >> >> >> >> >> > >> >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> >> general, not >> >> >> >> > at numpy-run-time, so we can write a small C program that >> >> >> >> can >> >> >> >> be >> >> >> >> run >> >> >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> >> universal >> >> >> >> numpy extensions will have the wrong value, no ? The >> >> >> >> fundamental >> >> >> >> point >> >> >> >> of my patch is that the value is set whenever ndarrayobject.h >> >> >> >> is >> >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> >> >> >> not >> >> >> >> be >> >> >> >> defined when the header is included for a file build with gcc >> >> >> >> -arch >> >> >> >> i386. >> >> >> >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time >> >> >> >> -- >> >> >> >> but we need to be sure that the ABI isn't affected. I did that >> >> >> >> once >> >> >> >> for an old project that needed data portability. In any case, it >> >> >> >> sounds like a project for a later release. >> >> >> >> >> >> >> > >> >> >> > It cannot work for numpy without breaking backward compatibility, >> >> >> > because of the following lines: >> >> >> > >> >> >> >> >> >> Actually, you could, by making the macro point to actual functions, >> >> >> but >> >> >> that would add function call cost. I don't know if the function call >> >> >> cost is significant or not in the cases where those macro are used, >> >> > >> >> > Exactly. Function calls are pretty cheap on modern hardware with good >> >> > compilers, nor would I expect the calls to be the bottleneck in most >> >> > applications. The functions would need to be visible to third party >> >> > applications, however... >> >> >> >> Would it be a problem ? Adding "true" functions to the array api, >> >> while keeping the macro for backward compatibility should be ok, no ? >> > >> > I don't think it's a problem, just that the macros generate code that is >> > compiled, so they need to call an api function. 
A decent compiler will >> > probably load the function pointer somewhere fast if it is called in a >> > loop, >> > a const keyword somewhere will help with that. We might want something >> > more >> > convenient for our own code. >> > >> >> >> >> I also updated my patch, with another function PyArray_GetEndianness >> >> which detects the runtime endianness (using an union int/char[4]). The >> >> point is to detect any mismatch between the configuration endianness >> >> and the "true" one, and I put the detection in import_array. The >> >> function is in the numpy array API, but it does not really need to be >> >> either . >> > >> > That sounds like a good start. It might be a good idea to use something >> > like >> > npy_int32 instead of a plain old integer. Likewise, it would probably be >> > good to define the union as an anonymous constant. Hmm... >> > something like: >> >> What I did was a bit more heavyweight, because I added it as function, >> but the idea is similar: >> >> static int compute_endianness() >> { >> union { >> char c[4]; >> npy_uint32 i; >> } bint; >> int st; >> >> bint.i = 'ABCD'; >> >> st = bintstrcmp(bint.c, "ABCD"); >> if (st == 0) { >> return NPY_CPU_BIG; >> } >> st = bintstrcmp(bint.c, "DCBA"); >> if (st == 0) { >> return NPY_CPU_LITTLE; >> } >> return NPY_CPU_UNKNOWN_ENDIAN; >> } >> >> Now that I think about it, I don't know if setting an integer to >> 'ABCD' is legal C. I think it is, but I don't claim to be any kind of >> C expert. > > Try > > const union { > char c[4]; > npy_uint32 i; > } bint = {1,2,3,4}; > > And just compare the resulting value of bint.i to predefined values, i.e., > 67305985 for big endian. The initializer is needed for the const union and > initializes the first variable. The result will be more efficient as the > union is initialized at compile time. I can do that, indeed. > >> >> >> > >> > >> > I've done it two ways here. They both require the -fno-strict-aliasing >> > flag >> > in gcc, but numpy is compiled with that flag. Both methods generate the >> > same >> > assembly with -O2 on my intel core2. >> >> I don't need this flag in my case, and I don't think we should require >> it if we can avoid it. > > You can't avoid it, there will be a compile time error if you are lucky or > buggy code if you aren't. I don't understand why. The whole point of union in that case is to make it clear that type punning is ok, since by definition the two values start at the same address. If this broke aliasing, any union would. Here is what man gcc tells me under the -fstrict-aliasing: """ Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an "unsigned int" can alias an "int", but not a "void*" or a "double". A character type may alias any other type. Pay special attention to code like this: union a_union { int i; double d; }; int f() { a_union t; t.d = 3.0; return t.i; } The practice of reading from a different union member than the one most recently written to (called ``type-punning'') is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. 
However, this code might not: int f() { a_union t; int* ip; t.d = 3.0; ip = &t.i; return *ip; } """ David From cournape at gmail.com Fri Dec 26 20:57:13 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 27 Dec 2008 10:57:13 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> Message-ID: <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau wrote: > > I don't understand why. The whole point of union in that case is to > make it clear that type punning is ok, since by definition the two > values start at the same address. If this broke aliasing, any union > would. Here is what man gcc tells me under the -fstrict-aliasing: > hm, seems to be a gcc extension, actually, so you're right. We have to find another way, then. David From charlesr.harris at gmail.com Fri Dec 26 22:02:44 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 20:02:44 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 6:57 PM, David Cournapeau wrote: > On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau > wrote: > > > > > I don't understand why. The whole point of union in that case is to > > make it clear that type punning is ok, since by definition the two > > values start at the same address. If this broke aliasing, any union > > would. Here is what man gcc tells me under the -fstrict-aliasing: > > > > hm, seems to be a gcc extension, actually, so you're right. We have to > find another way, then. > Just use -fno-strict-aliasing, the linux kernel does, numpy does, it's just one of those things where the gnu language lawyers found a loop hole in the specification and made strict aliasing the default because it yields some optimization sugar. Google for torvalds and -fno-strict-aliasing and you might find an old rant on the subject. It can also sneak up on you because it only kicks in if you compile with optimization, say -O2, and the union won't help because the compiler is onto your tricks ;) I went through the whole mess myself trying to set random ieee floats directly from random ints. In that case the compiler version didn't issue a warning and I got some fantastic benchmarks because the code was completely optimized away. Unfortunately, the values were wrong. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Fri Dec 26 22:37:24 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 27 Dec 2008 12:37:24 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> Message-ID: <5b8d13220812261937m10d2e432t2488747b9a1e83d0@mail.gmail.com> On Sat, Dec 27, 2008 at 12:02 PM, Charles R Harris wrote: > > > On Fri, Dec 26, 2008 at 6:57 PM, David Cournapeau > wrote: >> >> On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau >> wrote: >> >> > >> > I don't understand why. The whole point of union in that case is to >> > make it clear that type punning is ok, since by definition the two >> > values start at the same address. If this broke aliasing, any union >> > would. Here is what man gcc tells me under the -fstrict-aliasing: >> > >> >> hm, seems to be a gcc extension, actually, so you're right. We have to >> find another way, then. > > Just use -fno-strict-aliasing, the linux kernel does, numpy does, it's just > one of those things where the gnu language lawyers found a loop hole in the > specification and made strict aliasing the default because it yields some > optimization sugar. But we don't only use gcc, which is why I don't like relying on this -fno-strict-aliasing for the code to be correct: the code may well break on another compiler. Linux is different: for all purpose, it is only compilable by gcc (and used as a benchmark for being conformant to gcc, like intel compiler does). > Google for torvalds and -fno-strict-aliasing and you > might find an old rant on the subject. His rant is different, I think; he acknowledges union as usable for type-punning, but claims it is not useful for many cases. Since we only care about a very simple case here, that should do it. Again, according to gcc own man page, using union for type punning is valid as far as aliasing rules are concerned - at least for gcc. Although using an union for type-punning is undefined as far as the C99 standard goes, it looks like any compiler does implement the expected behavior: """ Strictly speaking, reading a member of a union different from the one written to is undefined in ANSI/ISO C99 except in the special case of type-punning to a char*, similar to the example below: Casting to char*. However, it is an extremely common idiom and is well-supported by all major compilers. As a practical matter, reading and writing to any member of a union, in any order, is acceptable practice. """ in http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html#union_1 I am wondering about the usefulness of union if accessing a different member than the one initialized is undefined (maybe to force alignment ?). > It can also sneak up on you because > it only kicks in if you compile with optimization, say -O2, and the union > won't help because the compiler is onto your tricks ;) Yes, I understand that if you broke aliasing rules, you can have undefined behavior. But it seems that using union does not break those: it is documented as such for gcc, and would be the case elsewhere; I checked the autoconf macro to test endianness: it uses union as well. 
David From charlesr.harris at gmail.com Fri Dec 26 23:37:16 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 21:37:16 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812261937m10d2e432t2488747b9a1e83d0@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> <5b8d13220812261937m10d2e432t2488747b9a1e83d0@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 8:37 PM, David Cournapeau wrote: > On Sat, Dec 27, 2008 at 12:02 PM, Charles R Harris > wrote: > > > > > > On Fri, Dec 26, 2008 at 6:57 PM, David Cournapeau > > wrote: > >> > >> On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau > >> wrote: > >> > >> > > >> > I don't understand why. The whole point of union in that case is to > >> > make it clear that type punning is ok, since by definition the two > >> > values start at the same address. If this broke aliasing, any union > >> > would. Here is what man gcc tells me under the -fstrict-aliasing: > >> > > >> > >> hm, seems to be a gcc extension, actually, so you're right. We have to > >> find another way, then. > > > > Just use -fno-strict-aliasing, the linux kernel does, numpy does, it's > just > > one of those things where the gnu language lawyers found a loop hole in > the > > specification and made strict aliasing the default because it yields some > > optimization sugar. > > But we don't only use gcc, which is why I don't like relying on this > -fno-strict-aliasing for the code to be correct: the code may well > break on another compiler. Linux is different: for all purpose, it is > only compilable by gcc (and used as a benchmark for being conformant > to gcc, like intel compiler does). > Most compilers do the right thing, otherwise numpy wouldn't work. Gcc is special. > > > Google for torvalds and -fno-strict-aliasing and you > > might find an old rant on the subject. > > His rant is different, I think; he acknowledges union as usable for > type-punning, but claims it is not useful for many cases. Since we > only care about a very simple case here, that should do it. Again, > according to gcc own man page, using union for type punning is valid > as far as aliasing rules are concerned - at least for gcc. > > Although using an union for type-punning is undefined as far as the > C99 standard goes, it looks like any compiler does implement the > expected behavior: > > """ > Strictly speaking, reading a member of a union different from the one > written to is undefined in ANSI/ISO C99 except in the special case of > type-punning to a char*, similar to the example below: Casting to > char*. However, it is an extremely common idiom and is well-supported > by all major compilers. As a practical matter, reading and writing to > any member of a union, in any order, is acceptable practice. > """ > > in > http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html#union_1 > > I am wondering about the usefulness of union if accessing a different > member than the one initialized is undefined (maybe to force alignment > ?). 
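To make the aliasing question concrete, here is a small comparison of the punning styles under discussion: the pointer cast is the pattern that strict aliasing breaks, the union read is the idiom the quoted gcc documentation supports, and memcpy, which is not mentioned in the thread, is the other commonly used route that is well defined in plain C99. It assumes float and unsigned int are both 32 bits:

#include <stdio.h>
#include <string.h>

/* undefined under strict aliasing: an unsigned int * may not alias a float,
 * so an optimizing build without -fno-strict-aliasing can miscompile this */
static unsigned int pun_cast(float x)
{
    return *(unsigned int *)&x;
}

/* the union idiom: write one member, read the other through the union type */
static unsigned int pun_union(float x)
{
    union { float f; unsigned int i; } u;
    u.f = x;
    return u.i;
}

/* byte copy through memcpy: well defined regardless of aliasing settings */
static unsigned int pun_memcpy(float x)
{
    unsigned int i;
    memcpy(&i, &x, sizeof i);
    return i;
}

int main(void)
{
    printf("%08x %08x %08x\n", pun_cast(1.0f), pun_union(1.0f), pun_memcpy(1.0f));
    return 0;
}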
> > > > It can also sneak up on you because > > it only kicks in if you compile with optimization, say -O2, and the union > > won't help because the compiler is onto your tricks ;) > > Yes, I understand that if you broke aliasing rules, you can have > undefined behavior. But it seems that using union does not break > those: it is documented as such for gcc, and would be the case > elsewhere; I checked the autoconf macro to test endianness: it uses > union as well. > I've gotten warnings and bad code using unions. That may have been a compiler bug or maybe it's been changed in recent versions of gcc, but I don't think you can count on it. Looks like gcc also likes another set of braces in the initializer. const union { char c[4]; int i; } order = {{1,2,3,4}}; Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bradford.n.cross at gmail.com Sat Dec 27 10:59:25 2008 From: bradford.n.cross at gmail.com (Bradford Cross) Date: Sat, 27 Dec 2008 16:59:25 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <20081224132116.GB24856@phare.normalesup.org> References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> <20081224132116.GB24856@phare.normalesup.org> Message-ID: I prototyped an approach last year that worked out well. I don't really know what to call it - maybe something like "property based persistence." It is kind of strange and I am not sure how broadly applicable it is - I have only used it for financial time series data. I'll try to explain how the idea works. I start with a python object that has a number of properties and an associated large data set (in my case, financial instruments and their associated time series in the form of numpy arrays.) I then created infrastructure that allowed me to define a simple "mapper" function that used a subset of the object's properties to define a "path" (expressible in the same form either as a file system path or as a path in HDF to a table.) Then I persisted the bulky data set (again, time series in my case) at that location. This little piece of infrastructure is very lightweight and cuts the client side persistence code down to only the small "mapper" functions. The mapper functions don't actually build up paths - they just specify the properties and ordering that you want to use to build up the paths. It also makes querying very simple and fast because you don't really query at all - instead the properties associated with the query directly express the path at which the data is located. The drawback of this simplistic approach is that you need to add a second level of path addressing if you deal with datasets so large that you can not really persist them under a single path. If you have single multi GB or TB arrays you probably want to chunk things up a bit more in the style of GFS and its open source counterparts. I still have the python code for this properties based time series database. It is a very small and simple peice of code, but I am happy to give it a quick polish and open source it if anyone is interested in taking a look. I am also about to try this model using F# and db4o for a .Net project. On Wed, Dec 24, 2008 at 2:21 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Tue, Dec 23, 2008 at 02:10:50AM +0100, Olivier Grisel wrote: > > Interesting topic indeed. I think I have been hit with similar > problems on > > toy experimental scripts. 
So far the solution was always adhoc FS > caches > > of numpy arrays with manual filename management. Maybe the first step > for > > designing a generic solution would be to list some representative yet > > simple enough use cases with real sample python code so as to focus on > > concrete matters and avoid over engineering a general solution for > > philosophical problems. > > Yes, that's clearly a first ste: list the usecases, and the way we would > like it solved: think about the API. > > My internet connection is quite random currently, and I'll probably loose > it for a week any time soon. Do you want to start such a page on the > wiki. Mark it as a sratch page, and we'll delete it later. > > I should point out that joblib (on PyPI and launchpad) was a first > attempt to solve this problem, so you could have a look at it. I have > already identified things that are wrong with joblib (more on the API > side than actual bugs), so I know it is not a final solution. Figuring > out what was wrong only came from using it heavily in my work. I thing > the only way forward it to start something, use it, figure out what's > wrong, and start again... > > Looking forward to your input, > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sat Dec 27 12:33:55 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 27 Dec 2008 18:33:55 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> <20081224132116.GB24856@phare.normalesup.org> Message-ID: <20081227173355.GA15538@phare.normalesup.org> On Sat, Dec 27, 2008 at 04:59:25PM +0100, Bradford Cross wrote: > I prototyped an approach last year that worked out well. I don't really > know what to call it - maybe something like "property based persistence." > It is kind of strange and I am not sure how broadly applicable it is - I > have only used it for financial time series data. Yeay, that's exactly what I had in mind for my second try. I though I would call this special object some kind of execution context. > I still have the python code for this properties based time series > database. It is a very small and simple peice of code, but I am happy to > give it a quick polish and open source it if anyone is interested in > taking a look. I am very interested in both your code, and anything you can to tell us about what worked well, and what you would do different. > I am also about to try this model using F# and db4o for a .Net project. Functionally language are clearly a very interesting alley to go down for these problems. I am right now in Python, and staying there for a while, but I believe I can learn a lot from functionnal languages. Thanks for your feedback, Ga?l From len-l at telus.net Sat Dec 27 15:05:52 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 12:05:52 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows Message-ID: <49568AA0.8050108@telus.net> Hi everyone, I build the Pygame dependencies for Windows. With the next Pygame release, 1.9.0, we would like to include Python 2.6 support. As you already know, Pygame has NumPy bindings. Though NumPy is not required, it is a useful addition. 
I understand NumPy is built with MinGW on Windows, which I use to with Pygame and its dependencies. I know the problems linking against msvcr90.dll. I am willing to offer what advice I can to get NumPy up and running for Python 2.6. You are welcome to use the Pygame build tools if they will help. I also have a Python 2.6 build of NumPy 1.2.1 for Pygame testing. http://www3.telus.net/len_l/pygame/numpy-1.2.1.win32-py2.6.msi md5sum: b791f5c4b620da21f779b53252b5932e *numpy-1.2.1.win32-py2.6.msi Lenard -- Lenard Lindstrom From robert.kern at gmail.com Sat Dec 27 16:47:29 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 27 Dec 2008 16:47:29 -0500 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <49568AA0.8050108@telus.net> References: <49568AA0.8050108@telus.net> Message-ID: <3d375d730812271347i1670b201nf536d16da277636f@mail.gmail.com> On Sat, Dec 27, 2008 at 15:05, Lenard Lindstrom wrote: > Hi everyone, > > I build the Pygame dependencies for Windows. With the next Pygame > release, 1.9.0, we would like to include Python 2.6 support. As you > already know, Pygame has NumPy bindings. Though NumPy is not required, > it is a useful addition. I understand NumPy is built with MinGW on > Windows, which I use to with Pygame and its dependencies. I know the > problems linking against msvcr90.dll. I am willing to offer what advice > I can to get NumPy up and running for Python 2.6. Yes, please. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Sat Dec 27 21:10:52 2008 From: cournape at gmail.com (David Cournapeau) Date: Sun, 28 Dec 2008 11:10:52 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <49568AA0.8050108@telus.net> References: <49568AA0.8050108@telus.net> Message-ID: <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> Hi Lenard, On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: > Hi everyone, > > I build the Pygame dependencies for Windows. With the next Pygame > release, 1.9.0, we would like to include Python 2.6 support. As you > already know, Pygame has NumPy bindings. Though NumPy is not required, > it is a useful addition. I understand NumPy is built with MinGW on > Windows, which I use to with Pygame and its dependencies. I know the > problems linking against msvcr90.dll. I am willing to offer what advice > I can to get NumPy up and running for Python 2.6. Thanks. I think I have covered most problems concerning python 2.6 and windows in the trunk (upcoming 1.3): - linking against msvcr90.dll - generating manifest for running code snippets (with mingw) - fix some bugs with python 2.6 msvc support (in particular http://bugs.python.org/issue4702) You are welcome to test the trunk to see if that fixes everything. I don't think everything can be fixed for 1.2.2, because the changes are not all trivial (much revamp C99 math support, in particular). Unfortunately, I have been working on some formatting issues which were more difficult than previously thought, and it was time to go to sleep before I actually fixed the problem, so the trunk may be broken ATM. 
I will fix this now, cheers, David From len-l at telus.net Sat Dec 27 21:55:15 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 18:55:15 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> Message-ID: <4956EA93.2000804@telus.net> David Cournapeau wrote: > Hi Lenard, > > > On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: > >> Hi everyone, >> >> I build the Pygame dependencies for Windows. With the next Pygame >> release, 1.9.0, we would like to include Python 2.6 support. As you >> already know, Pygame has NumPy bindings. Though NumPy is not required, >> it is a useful addition. I understand NumPy is built with MinGW on >> Windows, which I use to with Pygame and its dependencies. I know the >> problems linking against msvcr90.dll. I am willing to offer what advice >> I can to get NumPy up and running for Python 2.6. >> > > Thanks. I think I have covered most problems concerning python 2.6 and > windows in the trunk (upcoming 1.3): > > - linking against msvcr90.dll > - generating manifest for running code snippets (with mingw) > - fix some bugs with python 2.6 msvc support (in particular > http://bugs.python.org/issue4702) > > You are welcome to test the trunk to see if that fixes everything. I > don't think everything can be fixed for 1.2.2, because the changes are > not all trivial (much revamp C99 math support, in particular). > > Unfortunately, I have been working on some formatting issues which > were more difficult than previously thought, and it was time to go to > sleep before I actually fixed the problem, so the trunk may be broken > ATM. I will fix this now, > > It looks like you have a handle on the problem. How did you get around the problems with the incomplete libmsvcr90.a import library? I have custom import libraries which you can use if needed. Lenard -- Lenard Lindstrom From david at ar.media.kyoto-u.ac.jp Sat Dec 27 21:58:55 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 11:58:55 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956EA93.2000804@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> Message-ID: <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> Lenard Lindstrom wrote: > David Cournapeau wrote: >> Hi Lenard, >> >> >> On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: >> >>> Hi everyone, >>> >>> I build the Pygame dependencies for Windows. With the next Pygame >>> release, 1.9.0, we would like to include Python 2.6 support. As you >>> already know, Pygame has NumPy bindings. Though NumPy is not required, >>> it is a useful addition. I understand NumPy is built with MinGW on >>> Windows, which I use to with Pygame and its dependencies. I know the >>> problems linking against msvcr90.dll. I am willing to offer what advice >>> I can to get NumPy up and running for Python 2.6. >>> >> Thanks. I think I have covered most problems concerning python 2.6 and >> windows in the trunk (upcoming 1.3): >> >> - linking against msvcr90.dll >> - generating manifest for running code snippets (with mingw) >> - fix some bugs with python 2.6 msvc support (in particular >> http://bugs.python.org/issue4702) >> >> You are welcome to test the trunk to see if that fixes everything. 
I >> don't think everything can be fixed for 1.2.2, because the changes are >> not all trivial (much revamp C99 math support, in particular). >> >> Unfortunately, I have been working on some formatting issues which >> were more difficult than previously thought, and it was time to go to >> sleep before I actually fixed the problem, so the trunk may be broken >> ATM. I will fix this now, >> >> > It looks like you have a handle on the problem. How did you get around > the problems with the incomplete libmsvcr90.a import library? I have > custom import libraries which you can use if needed. Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to submit patchs to the mingw-w64 project - the whole libmsvcr90.a is missing, actually. For 32 bits, I simply got around it by changing the missing functions in numpy itself - if we are talking about the same thing, that is missing time functions for random. You can look at revisions r6080, r6076, r6074, r6073, r6072,r6070,r6069,r6029,r6028, the final patch is as follow: diff --git a/numpy/random/mtrand/randomkit.c b/numpy/random/mtrand/randomkit.c index 56f52c0..0fbc40d 100644 --- a/numpy/random/mtrand/randomkit.c +++ b/numpy/random/mtrand/randomkit.c @@ -64,18 +64,33 @@ /* static char const rcsid[] = "@(#) $Jeannot: randomkit.c,v 1.28 2005/07/21 22:14:09 js Exp $"; */ - #include #include #include #include -#include #include #include #ifdef _WIN32 /* Windows */ +/* XXX: we have to use this ugly defined(__GNUC__) because it is not easy to + * detect the compiler used in distutils itself */ +#if (defined(__GNUC__) && defined(NPY_NEEDS_MINGW_TIME_WORKAROUND)) +/* FIXME: ideally, we should set this to the real version of MSVCRT. We need + * something higher than 0x601 to enable _ftime64 and co */ +#define __MSVCRT_VERSION__ 0x0700 +#include #include +/* mingw msvcr lib import wrongly export _ftime, which does not exist in the + * actual msvc runtime for version >= 8; we make it an alias to _ftime64, which + * is available in those versions of the runtime + */ +#define _FTIME(x) _ftime64((x)) +#else +#include +#include +#define _FTIME(x) _ftime((x)) +#endif #ifndef RK_NO_WINCRYPT /* Windows crypto */ #ifndef _WIN32_WINNT @@ -86,6 +101,7 @@ #endif #else /* Unix */ +#include #include #include #endif @@ -167,7 +183,7 @@ rk_error rk_randomseed(rk_state *state) rk_seed(rk_hash(getpid()) ^ rk_hash(tv.tv_sec) ^ rk_hash(tv.tv_usec) ^ rk_hash(clock()), state); #else - _ftime(&tv); + _FTIME(&tv); rk_seed(rk_hash(tv.time) ^ rk_hash(tv.millitm) ^ rk_hash(clock()), state); #endif diff --git a/numpy/random/setup.py b/numpy/random/setup.py index e7955db..dde3119 100644 --- a/numpy/random/setup.py +++ b/numpy/random/setup.py @@ -1,13 +1,19 @@ -from os.path import join, split +from os.path import join, split, dirname +import os import sys +from distutils.dep_util import newer +from distutils.msvccompiler import get_build_version as get_msvc_build_version -def msvc_version(): - """Return the msvc version used to build the running python, None if not - built with MSVC.""" - msc_pos = sys.version.find('MSC v.') - if msc_pos != -1: - return sys.version[msc_pos+6:msc_pos+10] - return None +def needs_mingw_ftime_workaround(): + # We need the mingw workaround for _ftime if the msvc runtime version is + # 7.1 or above and we build with mingw ... + # ... 
but we can't easily detect compiler version outside distutils command + # context, so we will need to detect in randomkit whether we build with gcc + msver = get_msvc_build_version() + if msver and msver >= 8: + return True + + return False def configuration(parent_package='',top_path=None): from numpy.distutils.misc_util import Configuration, get_mathlibs @@ -22,6 +28,10 @@ def configuration(parent_package='',top_path=None): ext.libraries.extend(libs) return None + defs = [] + if needs_mingw_ftime_workaround(): + defs.append(("NPY_NEEDS_MINGW_TIME_WORKAROUND", None)) + libs = [] # Configure mtrand config.add_extension('mtrand', @@ -32,7 +42,8 @@ def configuration(parent_package='',top_path=None): depends = [join('mtrand','*.h'), join('mtrand','*.pyx'), join('mtrand','*.pxi'), - ] + ], + define_macros = defs, ) config.add_data_files(('.', join('mtrand', 'randomkit.h'))) David From david at ar.media.kyoto-u.ac.jp Sat Dec 27 22:04:23 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 12:04:23 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> Message-ID: <4956ECB7.5050300@ar.media.kyoto-u.ac.jp> David Cournapeau wrote: > Hi Lenard, > > > On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: > >> Hi everyone, >> >> I build the Pygame dependencies for Windows. With the next Pygame >> release, 1.9.0, we would like to include Python 2.6 support. As you >> already know, Pygame has NumPy bindings. Though NumPy is not required, >> it is a useful addition. I understand NumPy is built with MinGW on >> Windows, which I use to with Pygame and its dependencies. I know the >> problems linking against msvcr90.dll. I am willing to offer what advice >> I can to get NumPy up and running for Python 2.6. >> > > Thanks. I think I have covered most problems concerning python 2.6 and > windows in the trunk (upcoming 1.3): > > - linking against msvcr90.dll > - generating manifest for running code snippets (with mingw) > - fix some bugs with python 2.6 msvc support (in particular > http://bugs.python.org/issue4702) > > You are welcome to test the trunk to see if that fixes everything. I > don't think everything can be fixed for 1.2.2, because the changes are > not all trivial (much revamp C99 math support, in particular). > > Unfortunately, I have been working on some formatting issues which > were more difficult than previously thought, and it was time to go to > sleep before I actually fixed the problem, so the trunk may be broken > ATM. I will fix this now, I have reverted the buggy changes, so the trunk should be usable again, and contain all the fixes so far for mingw + python 2.6 support (including mingw-w64). 
cheers, David From len-l at telus.net Sat Dec 27 22:50:45 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 19:50:45 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> Message-ID: <4956F795.30309@telus.net> David Cournapeau wrote: > Lenard Lindstrom wrote: > >> David Cournapeau wrote: >> >>> Hi Lenard, >>> >>> >>> On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: >>> >>> >>>> Hi everyone, >>>> >>>> [snip] >>>> I am willing to offer what advice >>>> I can to get NumPy up and running for Python 2.6. >>>> >>>> >>> Thanks. I think I have covered most problems concerning python 2.6 and >>> windows in the trunk (upcoming 1.3)[.] >>> [snip] >>> >>> >>> >>> >> It looks like you have a handle on the problem. How did you get around >> the problems with the incomplete libmsvcr90.a import library? I have >> custom import libraries which you can use if needed. >> > > Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to > submit patchs to the mingw-w64 project - the whole libmsvcr90.a is > missing, actually. For 32 bits, I simply got around it by changing the > missing functions in numpy itself - if we are talking about the same > thing, that is missing time functions for random. Yes, the _ftime function, which is an inlined function in VC 2008 that calls _ftime64. I have to build a lot of dependencies for Pygame so I want to avoid patching code when possible. Instead I have a custom libmsvcr90.a that has stub functions for the various time functions. It lets me create static libraries that link to both msvcr71.dll and msvcr90.dll. No manifest files required. And no patches to MinGW. -- Lenard Lindstrom From david at ar.media.kyoto-u.ac.jp Sat Dec 27 22:48:07 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 12:48:07 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956F795.30309@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> Message-ID: <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> Lenard Lindstrom wrote: > David Cournapeau wrote: > >> Lenard Lindstrom wrote: >> >> >>> David Cournapeau wrote: >>> >>> >>>> Hi Lenard, >>>> >>>> >>>> On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: >>>> >>>> >>>> >>>>> Hi everyone, >>>>> >>>>> >>>>> > [snip] > >>>>> I am willing to offer what advice >>>>> I can to get NumPy up and running for Python 2.6. >>>>> >>>>> >>>>> >>>> Thanks. I think I have covered most problems concerning python 2.6 and >>>> windows in the trunk (upcoming 1.3)[.] >>>> >>>> > [snip] > >>>> >>>> >>>> >>>> >>>> >>> It looks like you have a handle on the problem. How did you get around >>> the problems with the incomplete libmsvcr90.a import library? I have >>> custom import libraries which you can use if needed. >>> >>> >> Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to >> submit patchs to the mingw-w64 project - the whole libmsvcr90.a is >> missing, actually. For 32 bits, I simply got around it by changing the >> missing functions in numpy itself - if we are talking about the same >> thing, that is missing time functions for random. 
>> > Yes, the _ftime function, which is an inlined function in VC 2008 that > calls _ftime64. I have to build a lot of dependencies for Pygame so I > want to avoid patching code when possible. I understand you don't want to patch the sources. The above fix is in the trunk, though - and I don't feel like backporting those fixes in the 1.2.x branch, because it would be a lot of work. > Instead I have a custom > libmsvcr90.a that has stub functions for the various time functions. It > lets me create static libraries that link to both msvcr71.dll and > msvcr90.dll. No manifest files required. And no patches to MinGW. > Manifests are needed for any executable linking against msvcr90.dll, whether you build with mingw or VS: this is required by windows itself to be able to load msvcr90.dll at all (the dreadful Side by Side assembly stuff). This is a totally independent issue of the _ftime thing, and AFAIK, there is no way around it - except installing msvcrt90.dll in system32 yourself, which is obviously a very bad idea. Patching mingw is necessary for 64 bits support, since their headers are missing some math functions - no patch is needed for 32 bits. cheers, David From len-l at telus.net Sat Dec 27 23:06:57 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 20:06:57 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956F795.30309@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> Message-ID: <4956FB61.7010707@telus.net> Lenard Lindstrom wrote: > David Cournapeau wrote: > >> Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to >> submit patchs to the mingw-w64 project - the whole libmsvcr90.a is >> missing, actually. For 32 bits, I simply got around it by changing the >> missing functions in numpy itself - if we are talking about the same >> thing, that is missing time functions for random. >> > Yes, the _ftime function, which is an inlined function in VC 2008 that > calls _ftime64. I have to build a lot of dependencies for Pygame so I > want to avoid patching code when possible. Instead I have a custom > libmsvcr90.a that has stub functions for the various time functions. It > lets me create static libraries that link to both msvcr71.dll and > msvcr90.dll. No manifest files required. And no patches to MinGW. > > > It just occurred to me: -D_ftime=_ftime64. I will have to see if this works with gmtime in the png library. Thanks for the advice. Lenard -- Lenard Lindstrom From len-l at telus.net Sat Dec 27 23:34:11 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 20:34:11 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> Message-ID: <495701C3.8080804@telus.net> David Cournapeau wrote: > Lenard Lindstrom wrote: > >> David Cournapeau wrote: >> >> >>> >>> Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to >>> submit patchs to the mingw-w64 project - the whole libmsvcr90.a is >>> missing, actually. 
For 32 bits, I simply got around it by changing the >>> missing functions in numpy itself - if we are talking about the same >>> thing, that is missing time functions for random. >>> >>> >> Yes, the _ftime function, which is an inlined function in VC 2008 that >> calls _ftime64. I have to build a lot of dependencies for Pygame so I >> want to avoid patching code when possible. >> > > I understand you don't want to patch the sources. The above fix is in > the trunk, though - and I don't feel like backporting those fixes in the > 1.2.x branch, because it would be a lot of work. > > Sorry for the confusion. I meant that I don't like patching SDL and such. I build these in msys using the "configure; make install" incantation, so can't easily use magic like manifest files. Instead I link the DLLs against msvcr71.dll, making sure there are also static libraries, then create the msvcr90.dll linked DLL's from the static libraries. This trick can also be used with the NumPy dependencies Blas and fftw. Actually it is easier, since they are statically linked into NumPy. >> Instead I have a custom >> libmsvcr90.a that has stub functions for the various time functions. It >> lets me create static libraries that link to both msvcr71.dll and >> msvcr90.dll. No manifest files required. And no patches to MinGW. >> >> > > Manifests are needed for any executable linking against msvcr90.dll, > whether you build with mingw or VS: this is required by windows itself > to be able to load msvcr90.dll at all (the dreadful Side by Side > assembly stuff). This is a totally independent issue of the _ftime > thing, and AFAIK, there is no way around it - except installing > msvcrt90.dll in system32 yourself, which is obviously a very bad idea. > > Patching mingw is necessary for 64 bits support, since their headers are > missing some math functions - no patch is needed for 32 bits. > > Yes, I've had my run in with manifest files. I avoid them by linking test programs against msvcrt or msvcr71 instead. A manifest is not needed for a DLL, luckily, as it uses the DLL libraries loaded by its host program. And I've had no luck using msvcr90.dll outside an SxS assembly. It needs a manifest file wherever it is, and I have had yet to writing a working manifest for a private assembly. So the Python developers' solution of copying msvcr90.dll into the Python directory is of no help. Lenard -- Lenard Lindstrom From david at ar.media.kyoto-u.ac.jp Sat Dec 27 23:45:34 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 13:45:34 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <495701C3.8080804@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> <495701C3.8080804@telus.net> Message-ID: <4957046E.7000109@ar.media.kyoto-u.ac.jp> Lenard Lindstrom wrote: > Sorry for the confusion. I meant that I don't like patching SDL and > such. I build these in msys using the "configure; make install" > incantation, so can't easily use magic like manifest files. I don't know about SDL :) numpy needs manifests at the configuration stage, because it build code snippet which it needs to run, and this requires manifest AFAIK, if you link against msvcr90.dll. I did not want to link with another dll at the configuration stage, because this could lead to subtle issues. 
> Instead I > link the DLLs against msvcr71.dll, making sure there are also static > libraries, then create the msvcr90.dll linked DLL's from the static > libraries. This trick can also be used with the NumPy dependencies Blas > and fftw. Actually it is easier, since they are statically linked into > NumPy. Note that numpy does not depend on fftw. My solution to this was to generate the manifest file, which was relatively easy to do since we control our build process in python (look at numpy/distutils/mingw32compiler.py). To run a program 'locally' (one which is not installed), having the manifest in the same directory as the .exe is enough, so we only need to generate it. The main difficulty is to make sure you are using the same version of the dll as python: this feature has been integrated in python 2.6.1. For 2.6.0, I just assume it is the same as the official python binary. > Yes, I've had my run in with manifest files. I avoid them by linking > test programs against msvcrt or msvcr71 instead. A manifest is not > needed for a DLL, luckily, as it uses the DLL libraries loaded by its > host program. Yes, that's my understanding too. Since this is of course undocumented, we can only guess. > And I've had no luck using msvcr90.dll outside an SxS > assembly. It needs a manifest file wherever it is, and I have had yet to > writing a working manifest for a private assembly. So the Python > developers' solution of copying msvcr90.dll into the Python directory is > of no help. This may be of some interest for you: http://cournape.wordpress.com/2008/09/02/how-to-embed-a-manifest-into-a-dll-with-mingw-tools-only/ that's a summary of my own findings about manifest using only open source tools. All the necessary code, including the manifest template is in the mingw32compiler.py file: http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/distutils/mingw32ccompiler.py cheers, David From david at ar.media.kyoto-u.ac.jp Sun Dec 28 00:27:07 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 14:27:07 +0900 Subject: [Numpy-discussion] formatting issues, locale and co Message-ID: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> Hi, While looking at the last failures of numpy trunk on windows for python 2.5 and 2.6, I got into floating point number formatting issues; I got deeper and deeper, and now I am lost. We have several problems: - we are not consistent between platforms, nor are we consistent with python - str(np.float32(a)) is locale dependent, but python str method is not (locale.str is) - formatting of long double does not work on windows because of the broken long double support in mingw. 1 consistency problem: ---------------------- python -c "a = 1e20; print a" -> 1e+020 python26 -c "a = 1e20; print a" -> 1e+20 In numpy, we use PyOS_snprintf for formatting, but python itself uses PyOS_ascii_formatd - which has different behavior on different versions of python. 
The above behavior can be simply reproduced in C: #include int main() { double x = 1e20; char c[200]; PyOS_ascii_format(c, sizeof(c), "%.12g", x); printf("%s\n", c); printf("%g\n", x); return 0; } On 2.5, this will print: 1e+020 1e+020 But on 2.6, this will print: 1e+20 1e+020 2 locale dependency: -------------------- Another issue is that our own formatting is local dependent, whereas python isn't: import numpy as np import locale locale.setlocale(locale.LC_NUMERIC, 'fr_FR') a = 1.2 print "str(a)", str(a) print "locale.str(a)", locale.str(a) print "str(np.float32(a))", str(np.float32(a)) print "locale.str(np.float32(a))", locale.str(np.float32(a)) Returns: str(a) 1.2 locale.str(a) 1,2 str(np.float32(a)) 1,2 locale.str(np.float32(a)) 1,20000004768 I thought about copying the way python does the formatting in the trunk (where discrepancies between platforms have been fixed), but this is not so easy, because it uses a lot of code from different places - and the code needs to be adapted to float and long double. The other solution would be to do our own formatting, but this does not sound easy: formatting in C is hard. I am not sure about what we should do, if anyone else has any idea ? cheers, David From len-l at telus.net Sun Dec 28 01:33:02 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 22:33:02 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4957046E.7000109@ar.media.kyoto-u.ac.jp> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> <495701C3.8080804@telus.net> <4957046E.7000109@ar.media.kyoto-u.ac.jp> Message-ID: <49571D9E.3000807@telus.net> David Cournapeau wrote: > Lenard Lindstrom wrote: > >> Sorry for the confusion. I meant that I don't like patching SDL and >> such. I build these in msys using the "configure; make install" >> incantation, so can't easily use magic like manifest files. >> > > I don't know about SDL :) numpy needs manifests at the configuration > stage, because it build code snippet which it needs to run, and this > requires manifest AFAIK, if you link against msvcr90.dll. I did not want > to link with another dll at the configuration stage, because this could > lead to subtle issues. > > Yes, it is best to have configuration programs link to msvcr90.dll when possible. Pygame configuration is relatively straight forward. No test programs are used. But all the dependencies, such as SDL, are built with Msys, and the Unix configuration shell scripts are used when possible. These scripts do create small test programs. It would be possible provide manifest files in this case, but only after determining the names and locations of all the test programs generated. For now it is simpler to just use a less fussy C runtime during configuration, then link in msvcr90 later. >> Instead I >> link the DLLs against msvcr71.dll, making sure there are also static >> libraries, then create the msvcr90.dll linked DLL's from the static >> libraries. This trick can also be used with the NumPy dependencies Blas >> and fftw. Actually it is easier, since they are statically linked into >> NumPy. >> > > Note that numpy does not depend on fftw. > > My solution to this was to generate the manifest file, which was > relatively easy to do since we control our build process in python (look > at numpy/distutils/mingw32compiler.py). 
To run a program 'locally' (one > which is not installed), having the manifest in the same directory as > the .exe is enough, so we only need to generate it. The main difficulty > is to make sure you are using the same version of the dll as python: > this feature has been integrated in python 2.6.1. For 2.6.0, I just > assume it is the same as the official python binary. > > A manifest file will work as long as the public key doesn't change. Or is the key provided by the developer rather than Microsoft's build tools? >> Yes, I've had my run in with manifest files. I avoid them by linking >> test programs against msvcrt or msvcr71 instead. A manifest is not >> needed for a DLL, luckily, as it uses the DLL libraries loaded by its >> host program. >> > > Yes, that's my understanding too. Since this is of course undocumented, > we can only guess. > > Well, this is what I've found for Python and Pygame extension modules anyway. And it makes a certain amount of sense if manifests exist to prevent library conflicts. A dynamic library and main program using different C runtime instances would be a problem. >> And I've had no luck using msvcr90.dll outside an SxS >> assembly. It needs a manifest file wherever it is, and I have had yet to >> writing a working manifest for a private assembly. So the Python >> developers' solution of copying msvcr90.dll into the Python directory is >> of no help. >> > > This may be of some interest for you: > > http://cournape.wordpress.com/2008/09/02/how-to-embed-a-manifest-into-a-dll-with-mingw-tools-only/ > > that's a summary of my own findings about manifest using only open > source tools. All the necessary code, including the manifest template is > in the mingw32compiler.py file: > > http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/distutils/mingw32ccompiler.py > > Ok, you have given me some more things to consider. Thanks for the discussion. I will keep you suggestions in mind. Just to clear up any confusion pygame.org is not intending to build and distribute NumPy with Pygame. NumPy was required to run the full Pygame test suite under Python 2.6. My custom NumPy build will disappear once scipy.org releases its Python 2.6 build. Lenard -- Lenard Lindstrom From charlesr.harris at gmail.com Sun Dec 28 01:38:29 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 27 Dec 2008 23:38:29 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Hi, > > While looking at the last failures of numpy trunk on windows for > python 2.5 and 2.6, I got into floating point number formatting issues; > I got deeper and deeper, and now I am lost. We have several problems: > - we are not consistent between platforms, nor are we consistent > with python > - str(np.float32(a)) is locale dependent, but python str method is > not (locale.str is) > - formatting of long double does not work on windows because of the > broken long double support in mingw. > > 1 consistency problem: > ---------------------- > > python -c "a = 1e20; print a" -> 1e+020 > python26 -c "a = 1e20; print a" -> 1e+20 > > In numpy, we use PyOS_snprintf for formatting, but python itself uses > PyOS_ascii_formatd - which has different behavior on different versions > of python. 
The above behavior can be simply reproduced in C: > > #include > > int main() > { > double x = 1e20; > char c[200]; > > PyOS_ascii_format(c, sizeof(c), "%.12g", x); > printf("%s\n", c); > printf("%g\n", x); > > return 0; > } > > On 2.5, this will print: > > 1e+020 > 1e+020 > > But on 2.6, this will print: > > 1e+20 > 1e+020 > > 2 locale dependency: > -------------------- > > Another issue is that our own formatting is local dependent, whereas > python isn't: > > import numpy as np > import locale > locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > a = 1.2 > > print "str(a)", str(a) > print "locale.str(a)", locale.str(a) > print "str(np.float32(a))", str(np.float32(a)) > print "locale.str(np.float32(a))", locale.str(np.float32(a)) > > Returns: > > str(a) 1.2 > locale.str(a) 1,2 > str(np.float32(a)) 1,2 > locale.str(np.float32(a)) 1,20000004768 > > I thought about copying the way python does the formatting in the trunk > (where discrepancies between platforms have been fixed), but this is not > so easy, because it uses a lot of code from different places - and the > code needs to be adapted to float and long double. The other solution > would be to do our own formatting, but this does not sound easy: > formatting in C is hard. I am not sure about what we should do, if > anyone else has any idea ? > I think the first thing to do is make a decision on locale. If we chose to support locales I don't see much choice but to depend Python because it's too much work otherwise, and work not directly related to Numpy at that. If we decide not to support locales then we can do our own formatting if we need to using a fixed choice of locale. There is a list of snprintf implementations here . Triolooks like a mature project and has an MIT license, which I think is a license compatible with Numpy. I'm inclined to just fix the locale and ignore the rest until Python gets things sorted out. But I'm lazy... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Dec 28 01:46:06 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 01:46:06 -0500 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> On Sun, Dec 28, 2008 at 01:38, Charles R Harris wrote: > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > wrote: >> >> Hi, >> >> While looking at the last failures of numpy trunk on windows for >> python 2.5 and 2.6, I got into floating point number formatting issues; >> I got deeper and deeper, and now I am lost. We have several problems: >> - we are not consistent between platforms, nor are we consistent >> with python >> - str(np.float32(a)) is locale dependent, but python str method is >> not (locale.str is) >> - formatting of long double does not work on windows because of the >> broken long double support in mingw. >> >> 1 consistency problem: >> ---------------------- >> >> python -c "a = 1e20; print a" -> 1e+020 >> python26 -c "a = 1e20; print a" -> 1e+20 >> >> In numpy, we use PyOS_snprintf for formatting, but python itself uses >> PyOS_ascii_formatd - which has different behavior on different versions >> of python. 
The above behavior can be simply reproduced in C: >> >> #include >> >> int main() >> { >> double x = 1e20; >> char c[200]; >> >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); >> printf("%s\n", c); >> printf("%g\n", x); >> >> return 0; >> } >> >> On 2.5, this will print: >> >> 1e+020 >> 1e+020 >> >> But on 2.6, this will print: >> >> 1e+20 >> 1e+020 >> >> 2 locale dependency: >> -------------------- >> >> Another issue is that our own formatting is local dependent, whereas >> python isn't: >> >> import numpy as np >> import locale >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') >> a = 1.2 >> >> print "str(a)", str(a) >> print "locale.str(a)", locale.str(a) >> print "str(np.float32(a))", str(np.float32(a)) >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) >> >> Returns: >> >> str(a) 1.2 >> locale.str(a) 1,2 >> str(np.float32(a)) 1,2 >> locale.str(np.float32(a)) 1,20000004768 >> >> I thought about copying the way python does the formatting in the trunk >> (where discrepancies between platforms have been fixed), but this is not >> so easy, because it uses a lot of code from different places - and the >> code needs to be adapted to float and long double. The other solution >> would be to do our own formatting, but this does not sound easy: >> formatting in C is hard. I am not sure about what we should do, if >> anyone else has any idea ? > > I think the first thing to do is make a decision on locale. If we chose to > support locales I don't see much choice but to depend Python because it's > too much work otherwise, and work not directly related to Numpy at that. If > we decide not to support locales then we can do our own formatting if we > need to using a fixed choice of locale. There is a list of snprintf > implementations here. Trio looks like a mature project and has an MIT > license, which I think is a license compatible with Numpy. We should not support locales. The string representations of these elements should be Python-parseable. > I'm inclined to just fix the locale and ignore the rest until Python gets > things sorted out. But I'm lazy... What do you think Python doesn't have sorted out? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Sun Dec 28 01:40:47 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 15:40:47 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> Message-ID: <49571F6F.4070102@ar.media.kyoto-u.ac.jp> Robert Kern wrote: > > We should not support locales. The string representations of these > elements should be Python-parseable. > It looks like I was wrong in my analysis of the problem: I thought I was using the most recent implementation of PyOS_* functions in my test codes, but the ones in 2.6 are not the same as the ones in the current trunk. So the problem may be easier to fix that what I first thought: simply providing our own PyOS_ascii_formatd (and similar for float and long double) may be enough, and since we don't care about locale (%Z and %n), the function is simple (and can be pulled out from python sources). 
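As a rough illustration in Python (not the actual C fix, and ascii_format below is just an illustrative name), the work such a helper has to do is: format with the usual %g/%f codes, undo any locale-specific decimal separator, and normalize the exponent width so that the '1e+020' vs '1e+20' discrepancy goes away. In pure Python the %-formatting is already locale-independent, so the decimal-point step mostly matters for the C-level implementation:

import locale
import re

def ascii_format(x, fmt='%.12g', min_exp_digits=2):
    # Illustrative sketch only, not the proposed numpy/C code.
    s = fmt % x
    # Undo a locale-specific decimal separator (e.g. ',' under fr_FR).
    dec = locale.localeconv()['decimal_point']
    if dec != '.':
        s = s.replace(dec, '.')
    # Normalize the exponent field: strip leading zeros, pad to a minimum width.
    def fix_exp(m):
        digits = m.group(2).lstrip('0') or '0'
        return 'e' + m.group(1) + digits.rjust(min_exp_digits, '0')
    return re.sub(r'e([+-])(\d+)', fix_exp, s)

print(ascii_format(1e20))   # '1e+20' regardless of what the platform prints natively
print(ascii_format(1.2))    # '1.2'
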
We would then use PyOS_ascii_format* (locale independant) instead of PyOS_snprintf (locale dependant) in str/repr implementation of scalar arrays. Does that sound acceptable to you ? cheers, David From charlesr.harris at gmail.com Sun Dec 28 01:58:58 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 27 Dec 2008 23:58:58 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> Message-ID: On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern wrote: > On Sun, Dec 28, 2008 at 01:38, Charles R Harris > wrote: > > > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > > wrote: > >> > >> Hi, > >> > >> While looking at the last failures of numpy trunk on windows for > >> python 2.5 and 2.6, I got into floating point number formatting issues; > >> I got deeper and deeper, and now I am lost. We have several problems: > >> - we are not consistent between platforms, nor are we consistent > >> with python > >> - str(np.float32(a)) is locale dependent, but python str method is > >> not (locale.str is) > >> - formatting of long double does not work on windows because of the > >> broken long double support in mingw. > >> > >> 1 consistency problem: > >> ---------------------- > >> > >> python -c "a = 1e20; print a" -> 1e+020 > >> python26 -c "a = 1e20; print a" -> 1e+20 > >> > >> In numpy, we use PyOS_snprintf for formatting, but python itself uses > >> PyOS_ascii_formatd - which has different behavior on different versions > >> of python. The above behavior can be simply reproduced in C: > >> > >> #include > >> > >> int main() > >> { > >> double x = 1e20; > >> char c[200]; > >> > >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); > >> printf("%s\n", c); > >> printf("%g\n", x); > >> > >> return 0; > >> } > >> > >> On 2.5, this will print: > >> > >> 1e+020 > >> 1e+020 > >> > >> But on 2.6, this will print: > >> > >> 1e+20 > >> 1e+020 > >> > >> 2 locale dependency: > >> -------------------- > >> > >> Another issue is that our own formatting is local dependent, whereas > >> python isn't: > >> > >> import numpy as np > >> import locale > >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > >> a = 1.2 > >> > >> print "str(a)", str(a) > >> print "locale.str(a)", locale.str(a) > >> print "str(np.float32(a))", str(np.float32(a)) > >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) > >> > >> Returns: > >> > >> str(a) 1.2 > >> locale.str(a) 1,2 > >> str(np.float32(a)) 1,2 > >> locale.str(np.float32(a)) 1,20000004768 > >> > >> I thought about copying the way python does the formatting in the trunk > >> (where discrepancies between platforms have been fixed), but this is not > >> so easy, because it uses a lot of code from different places - and the > >> code needs to be adapted to float and long double. The other solution > >> would be to do our own formatting, but this does not sound easy: > >> formatting in C is hard. I am not sure about what we should do, if > >> anyone else has any idea ? > > > > I think the first thing to do is make a decision on locale. If we chose > to > > support locales I don't see much choice but to depend Python because it's > > too much work otherwise, and work not directly related to Numpy at that. > If > > we decide not to support locales then we can do our own formatting if we > > need to using a fixed choice of locale. 
There is a list of snprintf > > implementations here. Trio looks like a mature project and has an MIT > > license, which I think is a license compatible with Numpy. > > We should not support locales. The string representations of these > elements should be Python-parseable. > > > I'm inclined to just fix the locale and ignore the rest until Python gets > > things sorted out. But I'm lazy... > > What do you think Python doesn't have sorted out? > Consistency between versions and platforms. David's note with the ticket points to a Python 3.0 bug on this reported about, oh, two years ago. If we wait long enough this problem will eventually get fixed as old python versions disappear and some sort decision is made for the 3.x series. Or we could do our own and be consistent with ourselves. There is also the problem of long doubles on the windows platform, which isn't Python specific since Python doesn't use long doubles. As I understand long doubles on windows, mingw32 supports them, VS doesn't, so there is a compiler inconsistency to deal with also. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Sun Dec 28 01:55:56 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 15:55:56 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> Message-ID: <495722FC.2050109@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern > wrote: > > On Sun, Dec 28, 2008 at 01:38, Charles R Harris > > wrote: > > > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > > > wrote: > >> > >> Hi, > >> > >> While looking at the last failures of numpy trunk on windows for > >> python 2.5 and 2.6, I got into floating point number formatting > issues; > >> I got deeper and deeper, and now I am lost. We have several > problems: > >> - we are not consistent between platforms, nor are we consistent > >> with python > >> - str(np.float32(a)) is locale dependent, but python str > method is > >> not (locale.str is) > >> - formatting of long double does not work on windows because > of the > >> broken long double support in mingw. > >> > >> 1 consistency problem: > >> ---------------------- > >> > >> python -c "a = 1e20; print a" -> 1e+020 > >> python26 -c "a = 1e20; print a" -> 1e+20 > >> > >> In numpy, we use PyOS_snprintf for formatting, but python > itself uses > >> PyOS_ascii_formatd - which has different behavior on different > versions > >> of python. 
The above behavior can be simply reproduced in C: > >> > >> #include > >> > >> int main() > >> { > >> double x = 1e20; > >> char c[200]; > >> > >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); > >> printf("%s\n", c); > >> printf("%g\n", x); > >> > >> return 0; > >> } > >> > >> On 2.5, this will print: > >> > >> 1e+020 > >> 1e+020 > >> > >> But on 2.6, this will print: > >> > >> 1e+20 > >> 1e+020 > >> > >> 2 locale dependency: > >> -------------------- > >> > >> Another issue is that our own formatting is local dependent, > whereas > >> python isn't: > >> > >> import numpy as np > >> import locale > >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > >> a = 1.2 > >> > >> print "str(a)", str(a) > >> print "locale.str(a)", locale.str(a) > >> print "str(np.float32(a))", str(np.float32(a)) > >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) > >> > >> Returns: > >> > >> str(a) 1.2 > >> locale.str(a) 1,2 > >> str(np.float32(a)) 1,2 > >> locale.str(np.float32(a)) 1,20000004768 > >> > >> I thought about copying the way python does the formatting in > the trunk > >> (where discrepancies between platforms have been fixed), but > this is not > >> so easy, because it uses a lot of code from different places - > and the > >> code needs to be adapted to float and long double. The other > solution > >> would be to do our own formatting, but this does not sound easy: > >> formatting in C is hard. I am not sure about what we should do, if > >> anyone else has any idea ? > > > > I think the first thing to do is make a decision on locale. If > we chose to > > support locales I don't see much choice but to depend Python > because it's > > too much work otherwise, and work not directly related to Numpy > at that. If > > we decide not to support locales then we can do our own > formatting if we > > need to using a fixed choice of locale. There is a list of snprintf > > implementations here. Trio looks like a mature project and has > an MIT > > license, which I think is a license compatible with Numpy. > > We should not support locales. The string representations of these > elements should be Python-parseable. > > > I'm inclined to just fix the locale and ignore the rest until > Python gets > > things sorted out. But I'm lazy... > > What do you think Python doesn't have sorted out? > > > Consistency between versions and platforms. David's note with the > ticket points to a Python 3.0 bug on this reported about, oh, two > years ago. As an example: in python 2.6, they solved some issues like inf/nan by interpreting the strings in python before outputting them, but we do not use their fix. So we have: python -c "import numpy as np; print np.log(0)" -> -inf (python 2.6) / -1.#INF (2.5, which is the format from the MS runtime). But: python -c "import numpy as np; print np.log(0).astype(np.float32)" -> -1.#INF (both 2.6 and 2.5) Etc... We can't be consistent with ourselves and with python at the same time, I think. I don't know which one is best: numpy being consistent through platforms and python versions, or being consistent with python. > There is also the problem of long doubles on the windows platform, > which isn't Python specific since Python doesn't use long doubles. As > I understand long doubles on windows, mingw32 supports them, VS > doesn't, so there is a compiler inconsistency to deal with also. To be exact, both mingw and VS support long double sensu stricto: the long double type is available. But sizeof(long double) == sizeof(double) with VS toolchain, and sizeof(long double) is 12 with mingw. 
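For reference, which long double flavour a given numpy build ended up with is easy to check from Python; the sizes in the comments are typical values for these toolchains, not guarantees:

import numpy as np

print(np.dtype(np.longdouble).itemsize)   # e.g. 8 on MSVC builds, 12 on 32-bit mingw/gcc, 16 on many 64-bit ones
print(np.finfo(np.longdouble).nmant)      # mantissa bits actually usable, independent of padded storage
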
The later is a pain, because mingw use both MS runtime (printf) and its own function (some math funcs), so we can't easily be consistent (either 8 or 12 bytes long double) with mingw. One solution would be to use the mingwex printf (a printf reimplementation available on recent mingwrt) instead of MSVC runtime - I would hope that this one is fixed wrt long double. This problem is even worse on 64 bits (long double are 16 bytes by default there with mingw). cheers, David From charlesr.harris at gmail.com Sun Dec 28 02:12:19 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 28 Dec 2008 00:12:19 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <49571F6F.4070102@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Dec 27, 2008 at 11:40 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Robert Kern wrote: > > > > We should not support locales. The string representations of these > > elements should be Python-parseable. > > > > It looks like I was wrong in my analysis of the problem: I thought I was > using the most recent implementation of PyOS_* functions in my test > codes, but the ones in 2.6 are not the same as the ones in the current > trunk. So the problem may be easier to fix that what I first thought: > simply providing our own PyOS_ascii_formatd (and similar for float and > long double) may be enough, and since we don't care about locale (%Z and > %n), the function is simple (and can be pulled out from python sources). > > We would then use PyOS_ascii_format* (locale independant) instead of > PyOS_snprintf (locale dependant) in str/repr implementation of scalar > arrays. Does that sound acceptable to you ? > As long as we rename it ;) Trio might be worth a look anyway as it has some extensions that might be useful, binary formats, for instance. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Dec 28 02:31:22 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 28 Dec 2008 00:31:22 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <495722FC.2050109@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <495722FC.2050109@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Dec 27, 2008 at 11:55 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > > > On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern > > wrote: > > > > On Sun, Dec 28, 2008 at 01:38, Charles R Harris > > > > wrote: > > > > > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > > > > > wrote: > > >> > > >> Hi, > > >> > > >> While looking at the last failures of numpy trunk on windows > for > > >> python 2.5 and 2.6, I got into floating point number formatting > > issues; > > >> I got deeper and deeper, and now I am lost. We have several > > problems: > > >> - we are not consistent between platforms, nor are we > consistent > > >> with python > > >> - str(np.float32(a)) is locale dependent, but python str > > method is > > >> not (locale.str is) > > >> - formatting of long double does not work on windows because > > of the > > >> broken long double support in mingw. 
> > >> > > >> 1 consistency problem: > > >> ---------------------- > > >> > > >> python -c "a = 1e20; print a" -> 1e+020 > > >> python26 -c "a = 1e20; print a" -> 1e+20 > > >> > > >> In numpy, we use PyOS_snprintf for formatting, but python > > itself uses > > >> PyOS_ascii_formatd - which has different behavior on different > > versions > > >> of python. The above behavior can be simply reproduced in C: > > >> > > >> #include > > >> > > >> int main() > > >> { > > >> double x = 1e20; > > >> char c[200]; > > >> > > >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); > > >> printf("%s\n", c); > > >> printf("%g\n", x); > > >> > > >> return 0; > > >> } > > >> > > >> On 2.5, this will print: > > >> > > >> 1e+020 > > >> 1e+020 > > >> > > >> But on 2.6, this will print: > > >> > > >> 1e+20 > > >> 1e+020 > > >> > > >> 2 locale dependency: > > >> -------------------- > > >> > > >> Another issue is that our own formatting is local dependent, > > whereas > > >> python isn't: > > >> > > >> import numpy as np > > >> import locale > > >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > > >> a = 1.2 > > >> > > >> print "str(a)", str(a) > > >> print "locale.str(a)", locale.str(a) > > >> print "str(np.float32(a))", str(np.float32(a)) > > >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) > > >> > > >> Returns: > > >> > > >> str(a) 1.2 > > >> locale.str(a) 1,2 > > >> str(np.float32(a)) 1,2 > > >> locale.str(np.float32(a)) 1,20000004768 > > >> > > >> I thought about copying the way python does the formatting in > > the trunk > > >> (where discrepancies between platforms have been fixed), but > > this is not > > >> so easy, because it uses a lot of code from different places - > > and the > > >> code needs to be adapted to float and long double. The other > > solution > > >> would be to do our own formatting, but this does not sound easy: > > >> formatting in C is hard. I am not sure about what we should do, if > > >> anyone else has any idea ? > > > > > > I think the first thing to do is make a decision on locale. If > > we chose to > > > support locales I don't see much choice but to depend Python > > because it's > > > too much work otherwise, and work not directly related to Numpy > > at that. If > > > we decide not to support locales then we can do our own > > formatting if we > > > need to using a fixed choice of locale. There is a list of snprintf > > > implementations here. Trio looks like a mature project and has > > an MIT > > > license, which I think is a license compatible with Numpy. > > > > We should not support locales. The string representations of these > > elements should be Python-parseable. > > > > > I'm inclined to just fix the locale and ignore the rest until > > Python gets > > > things sorted out. But I'm lazy... > > > > What do you think Python doesn't have sorted out? > > > > > > Consistency between versions and platforms. David's note with the > > ticket points to a Python 3.0 bug on this reported about, oh, two > > years ago. > > As an example: in python 2.6, they solved some issues like inf/nan by > interpreting the strings in python before outputting them, but we do not > use their fix. So we have: > > python -c "import numpy as np; print np.log(0)" -> -inf (python 2.6) / > -1.#INF (2.5, which is the format from the MS runtime). > > But: > > python -c "import numpy as np; print np.log(0).astype(np.float32)" -> > -1.#INF (both 2.6 and 2.5) > > Etc... We can't be consistent with ourselves and with python at the same > time, I think. 
I don't know which one is best: numpy being consistent > through platforms and python versions, or being consistent with python. > > > There is also the problem of long doubles on the windows platform, > > which isn't Python specific since Python doesn't use long doubles. As > > I understand long doubles on windows, mingw32 supports them, VS > > doesn't, so there is a compiler inconsistency to deal with also. > > To be exact, both mingw and VS support long double sensu stricto: the > long double type is available. But sizeof(long double) == sizeof(double) > with VS toolchain, and sizeof(long double) is 12 with mingw. The later > is a pain, because mingw use both MS runtime (printf) and its own > function (some math funcs), so we can't easily be consistent (either 8 > or 12 bytes long double) with mingw. One solution would be to use the > mingwex printf (a printf reimplementation available on recent mingwrt) > instead of MSVC runtime - I would hope that this one is fixed wrt long > double. This problem is even worse on 64 bits (long double are 16 bytes > by default there with mingw). > I think there are also less visible problems with string to number conversions, so that might be a reason to consider third party software. Python doesn't directly support conversion of complex numbers presented as strings, for instance, although that may have been fixed in 3.0. So extending some third party sscanf might be useful. The question comes of how much time you want to spend on this. I know working on a dissertation is a great excuse to do something else; I spent some weeks writing my own latex dissertation class, for instance. But I don't know if that is recommended practice. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at informa.tiker.net Sun Dec 28 19:23:28 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Sun, 28 Dec 2008 19:23:28 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? Message-ID: <200812290123.32287.lists@informa.tiker.net> Hi all, I don't think PyObject pointers should be accessible via the buffer interface. I'd throw an error, but maybe a (silenceable) warning would do. Would have saved me some bug-hunting. >>> import numpy >>> numpy.array([55, (33,)], dtype=object) >>> x = numpy.array([55, (33,)], dtype=object) >>> x array([55, (33,)], dtype=object) >>> buffer(x) >>> str(buffer(x)) '\xb0\x1c\x17\x08l\x89\xd7\xb7' >>> numpy.__version__ '1.1.0' Opinions? Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From robert.kern at gmail.com Sun Dec 28 20:01:38 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 20:01:38 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <200812290123.32287.lists@informa.tiker.net> References: <200812290123.32287.lists@informa.tiker.net> Message-ID: <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner wrote: > Hi all, > > I don't think PyObject pointers should be accessible via the buffer interface. > I'd throw an error, but maybe a (silenceable) warning would do. Would have > saved me some bug-hunting. Can you describe in more detail what problem it caused? 
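For anyone wrapping buffers this way, one cheap guard is to refuse object arrays (and unexpected dtypes generally) before exposing raw memory. A minimal sketch - checked_send and send_raw are made-up names, and send_raw only stands in for whatever actually ships the bytes (e.g. a Boost.MPI isend binding):

import numpy as np

def checked_send(arr, send_raw, expected_dtype=np.float64):
    arr = np.ascontiguousarray(arr)           # one contiguous block (may copy)
    if arr.dtype == np.object_:
        raise TypeError("refusing to send an object array: its buffer holds "
                        "PyObject pointers, not data")
    if arr.dtype != np.dtype(expected_dtype):
        raise TypeError("expected dtype %s, got %s" % (expected_dtype, arr.dtype))
    send_raw(buffer(arr))                     # only now hand out the raw bytes
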
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lists at informa.tiker.net Sun Dec 28 20:38:38 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Sun, 28 Dec 2008 20:38:38 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> References: <200812290123.32287.lists@informa.tiker.net> <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> Message-ID: <200812290238.39702.lists@informa.tiker.net> On Montag 29 Dezember 2008, Robert Kern wrote: > On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner wrote: > > Hi all, > > > > I don't think PyObject pointers should be accessible via the buffer > > interface. I'd throw an error, but maybe a (silenceable) warning would > > do. Would have saved me some bug-hunting. > > Can you describe in more detail what problem it caused? Well, I'm a little bit embarrassed. :) But here goes. I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. They take a numpy array, obtain its buffer, and shove that into Boost.MPI's isend(). My code does some sort of term evaluation, and instead of shoving the evaluated floating point vector into MPI, it instead used the (un-evaluated) symbolic vector, which is represented as an object array. My MPI wrapper happily handed that object array's buffer to MPI. Oddly, instead of the deserved segfault, I just got garbage data on the other end. (Well, some other machine's PyObject pointers, really.) I guess I'm wishing I would've been prevented from falling into that trap, and I ended up wondering if there actually is a legitimate use of the buffer interface for object arrays. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From robert.kern at gmail.com Sun Dec 28 21:15:08 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 21:15:08 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <200812290238.39702.lists@informa.tiker.net> References: <200812290123.32287.lists@informa.tiker.net> <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> <200812290238.39702.lists@informa.tiker.net> Message-ID: <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> On Sun, Dec 28, 2008 at 20:38, Andreas Kl?ckner wrote: > On Montag 29 Dezember 2008, Robert Kern wrote: >> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner > wrote: >> > Hi all, >> > >> > I don't think PyObject pointers should be accessible via the buffer >> > interface. I'd throw an error, but maybe a (silenceable) warning would >> > do. Would have saved me some bug-hunting. >> >> Can you describe in more detail what problem it caused? > > Well, I'm a little bit embarrassed. :) But here goes. > > I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. They > take a numpy array, obtain its buffer, and shove that into Boost.MPI's > isend(). How do you communicate the dtype? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From lists at informa.tiker.net Sun Dec 28 21:52:00 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Sun, 28 Dec 2008 21:52:00 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> References: <200812290123.32287.lists@informa.tiker.net> <200812290238.39702.lists@informa.tiker.net> <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> Message-ID: <200812290352.02721.lists@informa.tiker.net> On Montag 29 Dezember 2008, Robert Kern wrote: > On Sun, Dec 28, 2008 at 20:38, Andreas Kl?ckner wrote: > > On Montag 29 Dezember 2008, Robert Kern wrote: > >> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner > >> > > > > wrote: > >> > Hi all, > >> > > >> > I don't think PyObject pointers should be accessible via the buffer > >> > interface. I'd throw an error, but maybe a (silenceable) warning would > >> > do. Would have saved me some bug-hunting. > >> > >> Can you describe in more detail what problem it caused? > > > > Well, I'm a little bit embarrassed. :) But here goes. > > > > I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. > > They take a numpy array, obtain its buffer, and shove that into > > Boost.MPI's isend(). > > How do you communicate the dtype? I don't. The app is a PDE solver, both ends are working at the same (known) precision. Passing an object array was completely wrong, but since my wrapper functions only deal with the buffer API, they couldn't really do the checking. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From robert.kern at gmail.com Sun Dec 28 22:28:53 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 22:28:53 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <200812290352.02721.lists@informa.tiker.net> References: <200812290123.32287.lists@informa.tiker.net> <200812290238.39702.lists@informa.tiker.net> <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> <200812290352.02721.lists@informa.tiker.net> Message-ID: <3d375d730812281928y614170ccm9c72ee646235ee7c@mail.gmail.com> On Sun, Dec 28, 2008 at 21:52, Andreas Kl?ckner wrote: > On Montag 29 Dezember 2008, Robert Kern wrote: >> On Sun, Dec 28, 2008 at 20:38, Andreas Kl?ckner > wrote: >> > On Montag 29 Dezember 2008, Robert Kern wrote: >> >> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner >> >> >> > >> > wrote: >> >> > Hi all, >> >> > >> >> > I don't think PyObject pointers should be accessible via the buffer >> >> > interface. I'd throw an error, but maybe a (silenceable) warning would >> >> > do. Would have saved me some bug-hunting. >> >> >> >> Can you describe in more detail what problem it caused? >> > >> > Well, I'm a little bit embarrassed. :) But here goes. >> > >> > I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. >> > They take a numpy array, obtain its buffer, and shove that into >> > Boost.MPI's isend(). >> >> How do you communicate the dtype? > > I don't. The app is a PDE solver, both ends are working at the same (known) > precision. Passing an object array was completely wrong, but since my wrapper > functions only deal with the buffer API, they couldn't really do the checking. You could wrap the wrappers in Python and check the dtype. 
You'd have a similar bug if you passed a wrong non-object dtype, too. Checking/communicating the dtype is something you always have to do when using the 2.x buffer protocol. I'm inclined not to make object a special case. When you ask for the raw bytes, you should get the raw bytes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Sun Dec 28 23:38:12 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 29 Dec 2008 13:38:12 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> On Sun, Dec 28, 2008 at 4:12 PM, Charles R Harris wrote: > > > On Sat, Dec 27, 2008 at 11:40 PM, David Cournapeau > wrote: >> >> Robert Kern wrote: >> > >> > We should not support locales. The string representations of these >> > elements should be Python-parseable. >> > >> >> It looks like I was wrong in my analysis of the problem: I thought I was >> using the most recent implementation of PyOS_* functions in my test >> codes, but the ones in 2.6 are not the same as the ones in the current >> trunk. So the problem may be easier to fix that what I first thought: >> simply providing our own PyOS_ascii_formatd (and similar for float and >> long double) may be enough, and since we don't care about locale (%Z and >> %n), the function is simple (and can be pulled out from python sources). >> >> We would then use PyOS_ascii_format* (locale independant) instead of >> PyOS_snprintf (locale dependant) in str/repr implementation of scalar >> arrays. Does that sound acceptable to you ? > I put my yesterday work in the fix_float_format branch: - it fixes the locale issue - it fixes the long double issue on windows. - it also fixes some tests (we were not testing single precision formatting but twice double precision instead - the single precision test fails on the trunk BTW). - it handles inf and nan more consistently across platforms (e.g. str(np.log(0)) will be '-inf' on all platforms; on windows, it used to be '-1.#INF' - I was afraid it would broke converting back the string to float, but it is broken anyway before my change, e.g. float('-1.#INF') does not work on windows). - for now, it breaks in windows python 2.5, because float(1e10) used to be 1e+010 on python 2.5 and is 1e+10 on python 2.6 (to be more consistent with C99). But I could simply forces a backward compatibility with python 2.5/2.4, since I can control the number of digits in the exponent in the formatting code. There are still some problems related for double which I am not sure how to solve: import numpy as np a = 1e10 print np.float32(a) # -> call format_float print np.float64(a) # -> do not call format_double print np.float96(a) # -> call format_longdouble I guess the different with float64 comes from its multi-inheritence (that is, it derives from the builtin float, and the rules for print are different that for the other). Is this behavior the expected one ? 
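For reference, the multiple inheritance is easy to check from Python - float64 is the only one of the three scalar types that derives from the builtin float, which is consistent with the guess above:

import numpy as np

print(isinstance(np.float64(1e10), float))   # True: the builtin float machinery applies
print(isinstance(np.float32(1e10), float))   # False: numpy's own formatting is used
print(np.float64.__mro__)                    # the builtin float shows up among the bases
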
cheers, David From charlesr.harris at gmail.com Mon Dec 29 00:36:40 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 28 Dec 2008 22:36:40 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: On Sun, Dec 28, 2008 at 9:38 PM, David Cournapeau wrote: > On Sun, Dec 28, 2008 at 4:12 PM, Charles R Harris > wrote: > > > > > > On Sat, Dec 27, 2008 at 11:40 PM, David Cournapeau > > wrote: > >> > >> Robert Kern wrote: > >> > > >> > We should not support locales. The string representations of these > >> > elements should be Python-parseable. > >> > > >> > >> It looks like I was wrong in my analysis of the problem: I thought I was > >> using the most recent implementation of PyOS_* functions in my test > >> codes, but the ones in 2.6 are not the same as the ones in the current > >> trunk. So the problem may be easier to fix that what I first thought: > >> simply providing our own PyOS_ascii_formatd (and similar for float and > >> long double) may be enough, and since we don't care about locale (%Z and > >> %n), the function is simple (and can be pulled out from python sources). > >> > >> We would then use PyOS_ascii_format* (locale independant) instead of > >> PyOS_snprintf (locale dependant) in str/repr implementation of scalar > >> arrays. Does that sound acceptable to you ? > > > > I put my yesterday work in the fix_float_format branch: > - it fixes the locale issue > - it fixes the long double issue on windows. > - it also fixes some tests (we were not testing single precision > formatting but twice double precision instead - the single precision > test fails on the trunk BTW). Curious, I don't see any test failures here. Were the tests actually being run or is something else different in your test setup? Or do you mean the fixed up test fails. > > - it handles inf and nan more consistently across platforms (e.g. > str(np.log(0)) will be '-inf' on all platforms; on windows, it used to > be '-1.#INF' - I was afraid it would broke converting back the string > to float, but it is broken anyway before my change, e.g. > float('-1.#INF') does not work on windows). > - for now, it breaks in windows python 2.5, because float(1e10) used > to be 1e+010 on python 2.5 and is 1e+10 on python 2.6 (to be more > consistent with C99). But I could simply forces a backward > compatibility with python 2.5/2.4, since I can control the number of > digits in the exponent in the formatting code. > > There are still some problems related for double which I am not sure > how to solve: > > import numpy as np > a = 1e10 > print np.float32(a) # -> call format_float > print np.float64(a) # -> do not call format_double > print np.float96(a) # -> call format_longdouble > > I guess the different with float64 comes from its multi-inheritence > (that is, it derives from the builtin float, and the rules for print > are different that for the other). Is this behavior the expected one ? > Expected, but I would like to see it change because it is kind of frustrating. Fixing it probably involves setting a function pointer in the type definition but I am not sure about that. We might also want to do something about integers, as in Python 3.0 they will all be Python long integers. 
I don't know if that actually breaks anything in numpy, or how Python 3.0 implements integers, but it might be a good idea not to derive from Python integers. How that will affect indexing speed I don't know. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon Dec 29 00:35:50 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 29 Dec 2008 14:35:50 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: <495861B6.8050707@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > > I put my yesterday work in the fix_float_format branch: > - it fixes the locale issue > - it fixes the long double issue on windows. > - it also fixes some tests (we were not testing single precision > formatting but twice double precision instead - the single precision > test fails on the trunk BTW). > > > Curious, I don't see any test failures here. Were the tests actually > being run or is something else different in your test setup? Or do you > mean the fixed up test fails. The later: if you look at numpy/core/tests/test_print, you will see that the types tested are np.float, np.double and np.longdouble, but at least on linux, np.float == np.double, and np.float32 is what we want to test I suppose here instead. > > Expected, but I would like to see it change because it is kind of > frustrating. Fixing it probably involves setting a function pointer in > the type definition but I am not sure about that. Hm, it took me a while to get this, but print np.float32(value) can be controlled through tp_print. Still, it does not work in all cases: print np.float32(a) -> call the tp_print print '%f' % np.float32(a) -> does not call the tp_print (nor tp_str/tp_repr). I have no idea what going on there. > We might also want to do something about integers, as in Python 3.0 > they will all be Python long integers. I will only care about floating point numbers for now, since they have problem today in numpy, with currently used python interpreters :) David From charlesr.harris at gmail.com Mon Dec 29 02:36:29 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 29 Dec 2008 00:36:29 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <495861B6.8050707@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, Dec 28, 2008 at 10:35 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > > > > > I put my yesterday work in the fix_float_format branch: > > - it fixes the locale issue > > - it fixes the long double issue on windows. > > - it also fixes some tests (we were not testing single precision > > formatting but twice double precision instead - the single precision > > test fails on the trunk BTW). > > > > > > Curious, I don't see any test failures here. Were the tests actually > > being run or is something else different in your test setup? Or do you > > mean the fixed up test fails. 
> > The later: if you look at numpy/core/tests/test_print, you will see that > the types tested are np.float, np.double and np.longdouble, but at least > on linux, np.float == np.double, and np.float32 is what we want to test > I suppose here instead. > > > > > Expected, but I would like to see it change because it is kind of > > frustrating. Fixing it probably involves setting a function pointer in > > the type definition but I am not sure about that. > > Hm, it took me a while to get this, but print np.float32(value) can be > controlled through tp_print. Still, it does not work in all cases: > > print np.float32(a) -> call the tp_print > print '%f' % np.float32(a) -> does not call the tp_print (nor > tp_str/tp_repr). I have no idea what going on there. > I'll bet it's calling a conversion to python float, i.e., double, because of the %f. In [1]: '%s' % np.float32(1) Out[1]: '1.0' In [2]: '%f' % np.float32(1) Out[2]: '1.000000' I don't see any way to work around that without changing the way the python formatting works. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at informa.tiker.net Mon Dec 29 04:35:01 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Mon, 29 Dec 2008 04:35:01 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <3d375d730812281928y614170ccm9c72ee646235ee7c@mail.gmail.com> References: <200812290123.32287.lists@informa.tiker.net> <200812290352.02721.lists@informa.tiker.net> <3d375d730812281928y614170ccm9c72ee646235ee7c@mail.gmail.com> Message-ID: <200812291035.02910.lists@informa.tiker.net> On Montag 29 Dezember 2008, Robert Kern wrote: > You could wrap the wrappers in Python and check the dtype. You'd have > a similar bug if you passed a wrong non-object dtype, too. > Checking/communicating the dtype is something you always have to do > when using the 2.x buffer protocol. I'm inclined not to make object a > special case. When you ask for the raw bytes, you should get the raw > bytes. Ok, fair enough. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From boogaloojb at yahoo.fr Mon Dec 29 10:58:09 2008 From: boogaloojb at yahoo.fr (Jean-Baptiste Rudant) Date: Mon, 29 Dec 2008 15:58:09 +0000 (GMT) Subject: [Numpy-discussion] Alternative to record array Message-ID: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Hello, I like to use record arrays to access fields by their name, and because they are esay to use with pytables. But I think it's not very effiicient for what I have to do. Maybe I'm misunderstanding something. Example : import numpy as np age = np.random.randint(0, 99, 10e6) weight = np.random.randint(0, 200, 10e6) data = np.rec.fromarrays((age, weight), names='age, weight') # the kind of operations I do is : data.age += data.age + 1 # but it's far less efficient than doing : age += 1 # because I think the record array stores [(age_0, weight_0) ...(age_n, weight_n)] # and not [age0 ... age_n] then [weight_0 ... weight_n]. So I think I don't use record arrays for the right purpose. I only need something which would make me esasy to manipulate data by accessing fields by their name. Am I wrong ? Is their something in numpy for my purpose ? 
Do I have to implement my own class, with something like : class FieldArray: def __init__(self, array_dict): self.array_list = array_dict def __getitem__(self, field): return self.array_list[field] def __setitem__(self, field, value): self.array_list[field] = value my_arrays = {'age': age, 'weight' : weight} data = FieldArray(my_arrays) data['age'] += 1 Thank you for the help, Jean-Baptiste Rudant -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Vickroy at noaa.gov Mon Dec 29 12:37:21 2008 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Mon, 29 Dec 2008 10:37:21 -0700 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: <49590AD1.2050701@noaa.gov> Jean-Baptiste Rudant wrote: > Hello, > > I like to use record arrays to access fields by their name, and > because they are esay to use with pytables. But I think it's not very > effiicient for what I have to do. Maybe I'm misunderstanding something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) > ...(age_n, weight_n)] > # and not [age0 ... age_n] then [weight_0 ... weight_n]. Sorry I am not able to answer your question; I am really a new user of numpy also. It does seem the addition operation is more than 4 times slower, when using record arrays, based on the following: >>> import numpy, sys, timeit >>> sys.version '2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]' >>> numpy.__version__ '1.2.1' >>> count = 10e6 >>> ages = numpy.random.randint(0,100,count) >>> weights = numpy.random.randint(1,200,count) >>> data = numpy.rec.fromarrays((ages,weights),names='ages,weights') >>> >>> timer = timeit.Timer('data.ages += 1','from __main__ import data') >>> timer.timeit(number=100) 30.110649537860262 >>> >>> timer = timeit.Timer('ages += 1','from __main__ import ages') >>> timer.timeit(number=100) 6.9850710076280507 >>> > > So I think I don't use record arrays for the right purpose. I only > need something which would make me esasy to manipulate data by > accessing fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have to > implement my own class, with something like : > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 > > Thank you for the help, > > Jean-Baptiste Rudant > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rmay31 at gmail.com Mon Dec 29 12:38:56 2008 From: rmay31 at gmail.com (Ryan May) Date: Mon, 29 Dec 2008 11:38:56 -0600 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: <49590B30.7030906@gmail.com> Jean-Baptiste Rudant wrote: > Hello, > > I like to use record arrays to access fields by their name, and because > they are esay to use with pytables. But I think it's not very effiicient > for what I have to do. Maybe I'm misunderstanding something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) ...(age_n, > weight_n)] > # and not [age0 ... age_n] then [weight_0 ... weight_n]. > > So I think I don't use record arrays for the right purpose. I only need > something which would make me esasy to manipulate data by accessing > fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have to > implement my own class, with something like : > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 You can accomplish what your FieldArray class does using numpy dtypes: import numpy as np dt = np.dtype([('age', np.int32), ('weight', np.int32)]) N = int(10e6) data = np.empty(N, dtype=dt) data['age'] = np.random.randint(0, 99, 10e6) data['weight'] = np.random.randint(0, 200, 10e6) data['age'] += 1 Timing for recarrays (your code): In [10]: timeit data.age += 1 10 loops, best of 3: 221 ms per loop Timing for my example: In [2]: timeit data['age']+=1 10 loops, best of 3: 150 ms per loop Hope this helps. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From pgmdevlist at gmail.com Mon Dec 29 12:41:47 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 29 Dec 2008 12:41:47 -0500 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: Jean-Baptiste, As you stated, everything depends on what you want to do. If you need to keep the correspondence age<>weight for each entry, then yes, record arrays, or at least flexible-type arrays, are the best. (The difference between a recarray and a flexible-type array is that fields can be accessed by attributes (data.age) or items (data['age']) with recarrays, but only with items with felxible-type arrays). Using your example, you could very well do: data['age'] += 1 and still keep the correspondence age<>weight. Your FieldArray class returns an object that is not a ndarray, which may have some undesired side-effects. As Ryan noted, flexible-type arrays are usually faster, because they lack the overhead brought by the possibiity of accessing data by attributes. So, if you don't mind using the 'access-by-fields' syntax, you're good to go. 
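To make the two access styles concrete, both can be used on the same memory; a short side-by-side using only a structured dtype:

import numpy as np

dt = np.dtype([('age', np.int32), ('weight', np.int32)])
data = np.zeros(10, dtype=dt)     # flexible-type (structured) array
data['age'] += 1                  # item access: always available

rec = data.view(np.recarray)      # same buffer, recarray interface on top
rec.age += 1                      # attribute access, with some lookup overhead
print(data['age'][0])             # 2 -- both views share the same data
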
On Dec 29, 2008, at 10:58 AM, Jean-Baptiste Rudant wrote: > Hello, > > I like to use record arrays to access fields by their name, and > because they are esay to use with pytables. But I think it's not > very effiicient for what I have to do. Maybe I'm misunderstanding > something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) ... > (age_n, weight_n)] > # and not [age0 ... age_n] then [weight_0 ... weight_n]. > > So I think I don't use record arrays for the right purpose. I only > need something which would make me esasy to manipulate data by > accessing fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have > to implement my own class, with something like : > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 > > Thank you for the help, > > Jean-Baptiste Rudant > > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From lpc at cmu.edu Mon Dec 29 14:51:48 2008 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 29 Dec 2008 14:51:48 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code Message-ID: <200812291451.49050.lpc@cmu.edu> Hello, I coincidently started my own implementation of a system to manage intermediate results last week, which I called jug. I wasn't planning to make such an alpha version public just now, but it seems to be on topic. The main idea is to use hashes to map function arguments to paths on the filesystem, which store the result (nothing extraordinary here). I also added the capability of having tasks (the basic unit) take the results of other tasks and defining an implicit dependency DAG. A simple locking mechanism enables light-weight task-level parellization (this was the second of my goals: help me make my stuff parallel). A trick that helps is that I don't really use the argument values to hash (which would be unwieldy for big arrays). I use the computation path (e.g., this is the value obtained from f(g('something'),2)). Since, at least in my problems, things tend to always map back into simple file-system paths, the hash computation doesn't even need to load the intermediate results. I will make the git repository publicly available once I figure out how to do that. I append the tutorial I wrote, which explains the system. HTH, Lu?s Pedro Coelho PhD Student in Computational Biology Carnegie Mellon University ============ Jug Tutorial ============ What is jug? ------------ Jug is a simple way to write easily parallelisable programs in Python. It also handles intermediate results for you. Example ------- This is a simple worked-through example which illustrates what jug does. Problem ~~~~~~~ Assume that I want to do the following to a collection of images: (1) for each image, compute some features (2) cluster these features using k-means. 
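Returning to the timings earlier in the thread: part of the slowdown relative to incrementing a plain per-field array comes from memory layout. A structured array stores its records interleaved, so a field such as data['age'] is a strided, non-contiguous view rather than one solid block. A quick check:

import numpy as np

dt = np.dtype([('age', np.int32), ('weight', np.int32)])
data = np.zeros(5, dtype=dt)

print(data.dtype.itemsize)                 # 8: each record holds one age and one weight
print(data['age'].strides)                 # (8,): each step skips over the weight field
print(data['age'].flags['C_CONTIGUOUS'])   # False: a view into interleaved storage
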
In order to find out the number of clusters, I try several values and pick the best result. For each value of k, because of the random initialisation, I run the clustering 10 times. I could write the following simple code: :: imgs = glob('*.png') features = [computefeatures(img,parameter=2) for img in imgs] clusters = [] bics = [] for k in xrange(2,200): for repeat in xrange(10): clusters.append(kmeans(features,k=k,random_seed=repeat)) bics.append(compute_bic(clusters[-1])) Nr_clusters = argmin(bics) // 10 Very simple and solves the problem. However, if I want to take advantage of the obvious parallelisation of the problem, then I need to write much more complicated code. My traditional approach is to break this down into smaller scripts. I'd have one to compute features for some images, I'd have another to merge all the results together and do some of the clustering, and, finally, one to merge all the results of the different clusterings. These would need to be called with different parameters to explore different areas of the parameter space, so I'd have a couple of scripts just for calling the main computation scripts. Intermediate results would be saved and loaded by the different processes. This has several problems. The biggest are (1) The need to manage intermediate files. These are normally files with long names like *features_for_img_0_with_parameter_P.pp*. (2) The code gets much more complex. There are minor issues with having to issue several jobs (and having the cluster be idle in the meanwhile), or deciding on how to partition the jobs so that they take roughly the same amount of time, but the two above are the main ones. Jug solves all these problems! Tasks ~~~~~ The main unit of jug is a Task. Any function can be used to generate a Task. A Task can depend on the results of other Tasks. The original idea for jug was a Makefile-like environment for declaring Tasks. I have moved beyond that, but it might help you think about what Tasks are. You create a Task by giving it a function which performs the work and its arguments. The arguments can be either literal values or other tasks (in which case, the function will be called with the *result* of those tasks!). Jug also understands lists of tasks (all standard Python containers will be supported in a later version). For example, the following code declares the necessary tasks for our problem: :: imgs = glob('*.png') feature_tasks = [Task(computefeatures,img,parameter=2) for img in imgs] cluster_tasks = [] bic_tasks = [] for k in xrange(2,200): for repeat in xrange(10): cluster_tasks.append(Task(kmeans,feature_tasks,k=k,random_seed=repeat)) bic_tasks.append(Task(compute_bic,cluster_tasks[-1])) Nr_clusters = Task(argmin,bic_tasks) Task Generators ~~~~~~~~~~~~~~~ In the code above, there is a lot of code of the form *Task(function,args)*, so maybe it should read *function(args)*. 
A simple helper function aids this process: :: from jug.task import Task def TaskGenerator(function): def gen(*args,**kwargs): return Task(function,*args,**kwargs) return gen computefeatures = TaskGenerator(computefeatures) kmeans = TaskGenerator(kmeans) compute_bic = TaskGenerator(compute_bic) @TaskGenerator def Nr_Clusters(bics): return argmin(bics) // 10 imgs = glob('*.png') features = [computefeatures(img,parameter=2) for img in imgs] clusters = [] bics = [] for k in xrange(2,200): for repeat in xrange(10): clusters.append(kmeans(features,k=k,random_seed=repeat)) bics.append(compute_bic(clusters[-1])) Nr_clusters(bics) You can see that this code is almost identical to our original sequential code, except for the declarations at the top and the fact that *Nr_clusters* is now a function (actually a TaskGenerator, look at the use of a declarator). This file is called the jugfile (you should name it *jugfile.py* on the filesystem) and specifies your problem. Of course, *TaskManager* is already a part of jug and those first few lines could have read :: from jug.task import TaskGenerator Jug ~~~ So far, we have achieved seemingly little. We have turned a simple piece of sequential code into something that generates Task objects, but does not actually perform any work. The final piece is jug. Jug takes these Task objects and runs them. It's main loop is basically :: while len(tasks) > 0: for t in tasks: if can_run(t): # ensures that all dependencies have been run if need_to_run(t) and not is_running(t): t.run() tasks.remove(t) If you run jug on the script above, you will simply have reproduced the original code with the added benefit of having all the intermediate results saved. The interesting is what happens when you run several instances of jug at the same time. They will start running Tasks, but each instance will run its own tasks. This allows you to take advantage of multiple processors in a way that keeps the processors all occupied as long as there is work to be done, handles the implicit dependencies, and passes functions the right values. Note also that, unlike more traditional parallel processing frameworks (like MPI), jug has no problems with the number of participating processors varying throughout the job. Behind the scenes, jug is using the filesystem to both save intermediate results (which get passed around) and to lock running tasks so that each task is only run once (the actual main loop is thus a bit more complex than shown above). Intermediate and Final Results ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can obtain the final results of your computation by setting up a task that saves them to disk and loading them from there. If the results of your computation are simple enough, this might be the simplest way. Another way, which is also the way to access the intermediate results if you want them, is to run the jug script and then call the *load()* method on Tasks. For example, :: img = glob('*.png') features = [computefeatures(img,parameter=2) for img in imgs] ... feature_values = [feat.load() for feat in features] If the values are not accessible, this raises an exception. Advantages ---------- jug is an attempt to get something that works in the setting that I have found myself in: code that is *embarissingly parallel* with a couple of points where all the results of previous processing are merged, often in a simple way. 
It is also a way for me to manage either the explosion of temporary files that plagued my code and the brittleness of making sure that all results from separate processors are merged correctly in my *ad hoc* scripts. Limitations ----------- This is not an attempt to replace MPI in any way. For code that has more merge points, this won't do. It also won't do if the individual tasks are so small that the over-head of managing them swamps out the performance gains of parallelisation. In my code, most of the times, each task takes 20 seconds to a few minutes. Just enough to make the managing time irrelevant, but fast enough that the main job can be broken into thousands of tiny pieces. The system makes it too easy to save all intermediate results and run out of disk space. This is still Python, not a true parallel programming language. The abstraction will sometimes leak through, for example, if you try to pass a Task to a function which expects a real value. Recall how we had to re-write the line *Nr_clusters = argmin(bics) // 10* above. Planned Capabilities -------------------- Here are a couple of simple improvements I plan to make at some point: * jug.py cleanup: removes left-over locks, temporary files, and unsused results. * Stop & re-start. Currently, jug processes will exit if they can't make any progress for a while. In the future, I'd like them to be unblockable by other jug processes. * No result tasks. Task-like objects that don't save intermediate results. * Have tasks be passed inside *sets* and *dictionaries*. Maybe even *numpy* arrays! This will make jug even more like a real parallel programming language. * If the original arguments are files on disk, then jug should check their modification date and invalidate subsequent results. From lpc at cmu.edu Mon Dec 29 16:41:35 2008 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 29 Dec 2008 16:41:35 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <200812291451.49050.lpc@cmu.edu> References: <200812291451.49050.lpc@cmu.edu> Message-ID: <200812291641.35576.lpc@cmu.edu> On Monday 29 December 2008 14:51:48 Luis Pedro Coelho wrote: > I will make the git repository publicly available once I figure out how to > do that. You can get my code with: git clone http://coupland.cbi.cmu.edu/jug As I said, I consider this alpha code and am only making it publicly available at this stage because it came up. The license is LGPL. bye, Luis From zachary.pincus at yale.edu Mon Dec 29 16:49:32 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 29 Dec 2008 16:49:32 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <200812291641.35576.lpc@cmu.edu> References: <200812291451.49050.lpc@cmu.edu> <200812291641.35576.lpc@cmu.edu> Message-ID: <42595048-C877-46C4-AA07-B2CB18B81B24@yale.edu> This looks really cool -- thanks Luis. Definitely keep us posted as this progresses, too. Zach On Dec 29, 2008, at 4:41 PM, Luis Pedro Coelho wrote: > On Monday 29 December 2008 14:51:48 Luis Pedro Coelho wrote: >> I will make the git repository publicly available once I figure out >> how to >> do that. > > You can get my code with: > > git clone http://coupland.cbi.cmu.edu/jug > > As I said, I consider this alpha code and am only making it publicly > available > at this stage because it came up. The license is LGPL. 
> > bye, > Luis > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From gael.varoquaux at normalesup.org Mon Dec 29 17:40:07 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 29 Dec 2008 23:40:07 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <200812291451.49050.lpc@cmu.edu> References: <200812291451.49050.lpc@cmu.edu> Message-ID: <20081229224007.GA12811@phare.normalesup.org> Hi Luis, On Mon, Dec 29, 2008 at 02:51:48PM -0500, Luis Pedro Coelho wrote: > I coincidently started my own implementation of a system to manage > intermediate results last week, which I called jug. I wasn't planning > to make such an alpha version public just now, but it seems to be on > topic. Thanks for your input. This comforts me in my hunch that these problems where universal. It is interesting to see that you take a slightly different approach than the others already discussed. This probably stems from the fact that you are mostly interested by parallelism, whereas there are other adjacent problems that can be solved by similar abstractions. In particular, I have the impression that you do not deal with what I call "lazy-revaluation". In other words, I am not sure if you track results enough to know whether a intermediate result should be re-run, or if you run a 'clean' between each run to avoid this problem. I must admit I went away from using hash to store objects to the disk because I am very much interested in traceability, and I wanted my objects to have meaningful names, and to be stored in convenient formats (pickle, numpy .npy, hdf5, or domain-specific). I have now realized that explicit naming is convenient, but it should be optional. Your task-based approach, and the API you have built around it, reminds my a bit of twisted deferred. Have you studied this API? > A trick that helps is that I don't really use the argument values to hash > (which would be unwieldy for big arrays). I use the computation path (e.g., > this is the value obtained from f(g('something'),2)). Since, at least in my > problems, things tend to always map back into simple file-system paths, the > hash computation doesn't even need to load the intermediate results. I did notice too that using the argument value to hash was bound to failure in all but the simplest case. This is the immediate limitation to the famous memoize pattern when applied to scientific code. If I understand well, what you do is that you track the 'history' of the object and use it as a hash to the object, right? I had come to the conclusion that the history of objects should be tracked, but I hadn't realized that using it as a hash was also a good way to solve the scoping problem. Thanks for the trick. Would you consider making the code BSD? Because I want to be able to reuse my code in non open-source project, and because I do not want to lock out contributors, or to ask for copyright assignment, I like to keep all my code BSD, as all the mainstream scientific Python projects. I'll start writing up a wiki page with the all the different learning and usecases that come from all this interesting feedback. 
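As a rough illustration of that trick (a minimal sketch only, not jug's actual code; the helper name and hashing details are invented here), a task's hash can be built from the function name plus the hashes or literals of its inputs, so the values of large intermediate results never have to be loaded:

import hashlib

def path_hash(func_name, args):
    # args are literals or the hashes of upstream tasks,
    # never the (possibly huge) computed values themselves
    h = hashlib.sha1(func_name.encode())
    for a in args:
        h.update(repr(a).encode())
    return h.hexdigest()

# the hash of f(g('something'), 2) depends only on names and literals
g_hash = path_hash('g', ['something'])
f_hash = path_hash('f', [g_hash, 2])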
Cheers, Ga?l From faltet at pytables.org Mon Dec 29 18:00:09 2008 From: faltet at pytables.org (Francesc Alted) Date: Tue, 30 Dec 2008 00:00:09 +0100 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: <200812300000.10365.faltet@pytables.org> A Monday 29 December 2008, Jean-Baptiste Rudant escrigu?: > Hello, > > I like to use record arrays to access fields by their name, and > because they are esay to use with pytables. But I think it's not very > effiicient for what I have to do. Maybe I'm misunderstanding > something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) > ...(age_n, weight_n)] # and not [age0 ... age_n] then [weight_0 ... > weight_n]. > > So I think I don't use record arrays for the right purpose. I only > need something which would make me esasy to manipulate data by > accessing fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have > to implement my own class, with something like : > > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 That's a very good question. What you are observing are the effects of arranging a dataset by fields (row-wise) or by columns (column-wise). A record array in numpy arranges data by field, so that in your 'data' array the data is placed in memory as follows: data['age'][0] --> data['weight'][0] --> data['age'][1] --> data['weight'][1] --> ... while in your 'FieldArray' class, data is arranged by column and is placed in memory as: data['age'][0] --> data['age'][1] --> ... --> data['weight'][0] --> data['weight'][1] --> ... The difference for both approaches is that the row-wise arrangement is more efficient when data is iterated by field, while the column-wise one is more efficient when data is iterated by column. This is why you are seeing the increase of 4x in performance --incidentally, by looking at both data arrangements, I'd expect an increase of just 2x (the stride count is 2 in this case), but I suspect that there are hidden copies during the increment operation for the record array case. So you are perfectly right. In some situations you may want to use a row-wise arrangement (record array) and in other situations a column-wise one. So, it would be handy to have some code to convert back and forth between both data arrangements. Here it goes a couple of classes for doing this (they are a quick-and-dirty generalization of your code): class ColArray: def __init__(self, recarray): dictarray = {} if isinstance(recarray, np.ndarray): fields = recarray.dtype.fields elif isinstance(recarray, RecArray): fields = recarray.fields else: raise TypeError, "Unrecognized input type!" for colname in fields: # For optimum performance you should 'copy' the column! 
dictarray[colname] = recarray[colname].copy() self.dictarray = dictarray def __getitem__(self, field): return self.dictarray[field] def __setitem__(self, field, value): self.dictarray[field] = value def iteritems(self): return self.dictarray.iteritems() class RecArray: def __init__(self, dictarray): ldtype = [] fields = [] for colname, column in dictarray.iteritems(): ldtype.append((colname, column.dtype)) fields.append(colname) collen = len(column) dt = np.dtype(ldtype) recarray = np.empty(collen, dtype=dt) for colname, column in dictarray.iteritems(): recarray[colname] = column self.recarray = recarray self.fields = fields def __getitem__(self, field): return self.recarray[field] def __setitem__(self, field, value): self.recarray[field] = value So, ColArray takes as parameter a record array or RecArray class that have a row-wise arrangement and returns an object that is column-wise. RecArray does the inverse trip on the ColArray that takes as parameter. A small example of use: N = 10e6 age = np.random.randint(0, 99, N) weight = np.random.randint(0, 200, N) # Get an initial record array dt = np.dtype([('age', np.int_), ('weight', np.int_)]) data = np.empty(N, dtype=dt) data['age'] = age data['weight'] = weight t1 = time() data['age'] += 1 print "time for initial recarray:", round(time()-t1, 3) data = ColArray(data) t1 = time() data['age'] += 1 print "time for ColArray:", round(time()-t1, 3) data = RecArray(data) t1 = time() data['age'] += 1 print "time for reconstructed RecArray:", round(time()-t1, 3) data = ColArray(data) t1 = time() data['age'] += 1 print "time for reconstructed ColArray:", round(time()-t1, 3) and the output is: time for initial recarray: 0.298 time for ColArray: 0.076 time for reconstructed RecArray: 0.3 time for reconstructed ColArray: 0.076 So, these classes offers a quick way to go back and forth between both data arrangements, and can be used whenever a representation is found to be more useful. Indeed, you must be aware that the conversion takes time, and that it is generally a bad idea to do it just to do an operation. But when you must to operate a lot, a conversion makes a lot of sense. In fact, my hunch is that the column-wise arrangement is far more useful in general for accelerating operations in heterogeneous arrays, because what people normally do is operating column-wise and not row-wise. If this is actually the case, it would be a good idea to introduce a first-class type in numpy implementing a column-wise heterogeneous array. If this is found to be too cumbersome, perhaps integrating some utilities to do the conversion (similar in spirit to the classes above), would fit the bill. Cheers, -- Francesc Alted From lpc at cmu.edu Mon Dec 29 18:25:05 2008 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 29 Dec 2008 18:25:05 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <20081229224007.GA12811@phare.normalesup.org> References: <200812291451.49050.lpc@cmu.edu> <20081229224007.GA12811@phare.normalesup.org> Message-ID: <200812291825.07230.lpc@cmu.edu> Hello all, On Monday 29 December 2008 17:40:07 Gael Varoquaux wrote: > It is interesting to see that you take a slightly different approach than > the others already discussed. This probably stems from the fact that you > are mostly interested by parallelism, whereas there are other adjacent > problems that can be solved by similar abstractions. In particular, I > have the impression that you do not deal with what I call > "lazy-revaluation". 
In other words, I am not sure if you track results > enough to know whether a intermediate result should be re-run, or if you > run a 'clean' between each run to avoid this problem. I do. As long as the hash (the arguments to the function) is the same, the code loads objects from disk instead of computing results. I don't track the actual source code, though, only whether parameters have changed (but this could be a later addition). > I must admit I went away from using hash to store objects to the disk > because I am very much interested in traceability, and I wanted my > objects to have meaningful names, and to be stored in convenient formats > (pickle, numpy .npy, hdf5, or domain-specific). I have now realized that > explicit naming is convenient, but it should be optional. But using a hash is not so impenetrable as long as you can easily get to the files you want. If I want to load the results of a partial computation, all I have to do is to generate the same Task objects as the initial computation and load those: I can run the jugfile.py inside ipython and call the appropriate load() methods. ipython jugfile.py : interesting = [t for t in tasks if t.name == 'something.other'] : intermediate = interesting[0].load() > I did notice too that using the argument value to hash was bound to > failure in all but the simplest case. This is the immediate limitation to > the famous memoize pattern when applied to scientific code. If I > understand well, what you do is that you track the 'history' of the > object and use it as a hash to the object, right? I had come to the > conclusion that the history of objects should be tracked, but I hadn't > realized that using it as a hash was also a good way to solve the scoping > problem. Thanks for the trick. Yes, let's say I have the following: feats = [Task(features,img) for img in glob('*.png')] cluster = Task(kmeans,feats,k=10) then the hash for cluster is computed from its arguments: * kmeans : the function name * feats: this is a list of tasks, therefore I use its hash, which is defined by its argument, which is a simple string. * k=10: this is a literal. I don't need to use the value computed by feats to compute the hash for cluster. > Your task-based approach, and the API you have built around it, reminds > my a bit of twisted deferred. Have you studied this API? No. I will look into it. Thanks. bye, Luis From cournape at gmail.com Mon Dec 29 22:12:32 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 30 Dec 2008 12:12:32 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> On Mon, Dec 29, 2008 at 4:36 PM, Charles R Harris wrote: > > > On Sun, Dec 28, 2008 at 10:35 PM, David Cournapeau > wrote: >> >> Charles R Harris wrote: >> > >> > >> > >> > I put my yesterday work in the fix_float_format branch: >> > - it fixes the locale issue >> > - it fixes the long double issue on windows. >> > - it also fixes some tests (we were not testing single precision >> > formatting but twice double precision instead - the single precision >> > test fails on the trunk BTW). >> > >> > >> > Curious, I don't see any test failures here. 
Were the tests actually >> > being run or is something else different in your test setup? Or do you >> > mean the fixed up test fails. >> >> The later: if you look at numpy/core/tests/test_print, you will see that >> the types tested are np.float, np.double and np.longdouble, but at least >> on linux, np.float == np.double, and np.float32 is what we want to test >> I suppose here instead. >> >> > >> > Expected, but I would like to see it change because it is kind of >> > frustrating. Fixing it probably involves setting a function pointer in >> > the type definition but I am not sure about that. >> >> Hm, it took me a while to get this, but print np.float32(value) can be >> controlled through tp_print. Still, it does not work in all cases: >> >> print np.float32(a) -> call the tp_print >> print '%f' % np.float32(a) -> does not call the tp_print (nor >> tp_str/tp_repr). I have no idea what going on there. > > I'll bet it's calling a conversion to python float, i.e., double, because of > the %f. Yes, I meant that I did not understand the code path in that case. I realize that I don't know how to get the (C) call graph between two code points in python, that would be useful. Where are you dtrace on linux when I need you :) > > In [1]: '%s' % np.float32(1) > Out[1]: '1.0' > > In [2]: '%f' % np.float32(1) > Out[2]: '1.000000' > > I don't see any way to work around that without changing the way the python > formatting works. Yes, I think you're right. Specially since python itself is not consistent. On python 2.6, windows: a = complex('inf') print a # -> print inf print '%s' % a # -> print inf print '%f' % a # -> print 1.#INF Which suggests that in that case, it gets directly to stdio without much formatting work from python. Maybe it is an oversight ? Anyway, I think it would be useful to override the tp_print member ( to avoid 'print a' printing 1.#INF). cheers, David From charlesr.harris at gmail.com Mon Dec 29 23:26:52 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 29 Dec 2008 21:26:52 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> Message-ID: On Mon, Dec 29, 2008 at 8:12 PM, David Cournapeau wrote: > On Mon, Dec 29, 2008 at 4:36 PM, Charles R Harris > wrote: > > > > > > On Sun, Dec 28, 2008 at 10:35 PM, David Cournapeau > > wrote: > >> > >> Charles R Harris wrote: > >> > > >> > > >> > > >> > I put my yesterday work in the fix_float_format branch: > >> > - it fixes the locale issue > >> > - it fixes the long double issue on windows. > >> > - it also fixes some tests (we were not testing single precision > >> > formatting but twice double precision instead - the single > precision > >> > test fails on the trunk BTW). > >> > > >> > > >> > Curious, I don't see any test failures here. Were the tests actually > >> > being run or is something else different in your test setup? Or do you > >> > mean the fixed up test fails. 
> >> > >> The later: if you look at numpy/core/tests/test_print, you will see that > >> the types tested are np.float, np.double and np.longdouble, but at least > >> on linux, np.float == np.double, and np.float32 is what we want to test > >> I suppose here instead. > >> > >> > > >> > Expected, but I would like to see it change because it is kind of > >> > frustrating. Fixing it probably involves setting a function pointer in > >> > the type definition but I am not sure about that. > >> > >> Hm, it took me a while to get this, but print np.float32(value) can be > >> controlled through tp_print. Still, it does not work in all cases: > >> > >> print np.float32(a) -> call the tp_print > >> print '%f' % np.float32(a) -> does not call the tp_print (nor > >> tp_str/tp_repr). I have no idea what going on there. > > > > I'll bet it's calling a conversion to python float, i.e., double, because > of > > the %f. > > Yes, I meant that I did not understand the code path in that case. I > realize that I don't know how to get the (C) call graph between two > code points in python, that would be useful. Where are you dtrace on > linux when I need you :) > I'm not sure we are quite on the same page here. The float32 object has a "convert to python float" method, (which I don't recall at the moment and I don't have the source to hand). So when %f appears in the format string that method is called and the resulting python float is formatted in the python way. Same with %s, only __str__ is called instead. > > > > > In [1]: '%s' % np.float32(1) > > Out[1]: '1.0' > > > > In [2]: '%f' % np.float32(1) > > Out[2]: '1.000000' > > > > I don't see any way to work around that without changing the way the > python > > formatting works. > > Yes, I think you're right. Specially since python itself is not > consistent. On python 2.6, windows: > > a = complex('inf') > print a # -> print inf > print '%s' % a # -> print inf > print '%f' % a # -> print 1.#INF > How does a python inf display on windows? > > Which suggests that in that case, it gets directly to stdio without > much formatting work from python. Maybe it is an oversight ? Anyway, I > think it would be useful to override the tp_print member ( to avoid > 'print a' printing 1.#INF). > Sounds like the sort of thing the python folks would want to clean up, just as you have for numpy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon Dec 29 23:46:30 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 30 Dec 2008 13:46:30 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> Message-ID: <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > > Yes, I meant that I did not understand the code path in that case. I > realize that I don't know how to get the (C) call graph between two > code points in python, that would be useful. Where are you dtrace on > linux when I need you :) > > > I'm not sure we are quite on the same page here. Yep, indeed. I think my bogus example did not help :) The right test script use float('inf'), not complex('inf'). 
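For reference, the corrected snippet (a sketch of the check being discussed, in the Python 2 syntax used throughout this thread; the output, and on some platforms even the float('inf') construction itself, depends on the Python version, which is exactly the issue at hand) is simply:

a = float('inf')
print a          # goes through tp_print
print '%s' % a   # goes through tp_str
print '%f' % a   # formatted by the %-operator: the code path being tracked down here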
> The float32 object has a "convert to python float" method, (which I > don't recall at the moment and I don't have the source to hand). So > when %f appears in the format string that method is called and the > resulting python float is formatted in the python way. I think that's not the case for '%f', because the 'python' way is to print 'inf', not '1.#INF' (at least on 2.6 - on 2.5, it is always '1.#INF' on windows). If you use a pure C program on windows, you will get '1.#INF', etc... instead of 'inf'. repr, str, print all call the C format_float function, which takes care of fomatting 'inf' and co the 'python' way. So getting '1.#INF' from python suggests me that python does not format it in the '%f' case - and I don't know the code path at that point. For '%s', it goes through tp_str, for print a, it goes through tp_print, but for '%f' ? > > > a = complex('inf') > print a # -> print inf > print '%s' % a # -> print inf > print '%f' % a # -> print 1.#INF > > > How does a python inf display on windows? As stated: it depends. 'inf' or '1.#INF', the later being the same as the formatting done within the MS runtime. > > > > Which suggests that in that case, it gets directly to stdio without > much formatting work from python. Maybe it is an oversight ? Anyway, I > think it would be useful to override the tp_print member ( to avoid > 'print a' printing 1.#INF). > > > Sounds like the sort of thing the python folks would want to clean up, > just as you have for numpy. The thing is since I don't understand what happens in the print '%f' case, I don't know how to clean it up, if it is at all possible. But in anyway, it means that with my changes, we are not worse than python itself, and I think we are better than before, cheers, David From faltet at pytables.org Tue Dec 30 10:34:27 2008 From: faltet at pytables.org (Francesc Alted) Date: Tue, 30 Dec 2008 16:34:27 +0100 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <200812300000.10365.faltet@pytables.org> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> <200812300000.10365.faltet@pytables.org> Message-ID: <200812301634.28218.faltet@pytables.org> A Tuesday 30 December 2008, Francesc Alted escrigu?: > A Monday 29 December 2008, Jean-Baptiste Rudant escrigu?: [snip] > > The difference for both approaches is that the row-wise arrangement > is more efficient when data is iterated by field, while the > column-wise one is more efficient when data is iterated by column. > This is why you are seeing the increase of 4x in performance > --incidentally, by looking at both data arrangements, I'd expect an > increase of just 2x (the stride count is 2 in this case), but I > suspect that there are hidden copies during the increment operation > for the record array case. As I was mystified about this difference in speed, I kept investigating and I think I have an answer for the difference in the expected speed-up in the unary increment operator over a recarray field. After looking at the numpy code, it turns out that the next statement: data.ages += 1 is more or less equivalent to: a = data.ages a[:] = a + 1 i.e. a temporary is created (for keeping the result of 'a + 1') and then assigned to the 'ages' column. As it happens that, in this sort of operations, the memory copies are the bottleneck, the creation of the first temporary introduced a slowdown of 2x (due to the strided column) and the assignment represents the additional 2x (4x in total). 
However, the next idiom: a = data.ages a += 1 effectively removes the need for the temporary copy and is 2x faster than the original "data.ages += 1". This can be seen in the next simple benchmark: --------------------------- import numpy, timeit count = 10e6 ages = numpy.random.randint(0,100,count) weights = numpy.random.randint(1,200,count) data = numpy.rec.fromarrays((ages,weights),names='ages,weights') timer = timeit.Timer('data.ages += 1','from __main__ import data') print "v0-->", timer.timeit(number=10) timer = timeit.Timer('a=data.ages; a[:] = a + 1','from __main__ import data') print "v1-->", timer.timeit(number=10) timer = timeit.Timer('a=data.ages; a += 1','from __main__ import data') print "v2-->", timer.timeit(number=10) timer = timeit.Timer('ages += 1','from __main__ import ages') print "v3-->", timer.timeit(number=10) --------------------------- which produces the next output on my laptop: v0--> 2.98340201378 v1--> 3.22748112679 v2--> 1.5474319458 v3--> 0.809724807739 As a final comment, I suppose that unary operators (+=, -=...) can be optimized in the context of recarray columns in numpy, but I don't think it is worth the effort: when really high performance is needed for operating with columns in the context of recarrays, a column-wise approach is best. Cheers, -- Francesc Alted From nick at matsakis.net Tue Dec 30 12:44:47 2008 From: nick at matsakis.net (Nicholas Matsakis) Date: Tue, 30 Dec 2008 12:44:47 -0500 (EST) Subject: [Numpy-discussion] numpy.test() failures (1.2.1) on Mac OS X Message-ID: I just installed what I believe to be a completely vanilla installation of numpy on an Intel Mac OS X 10.5.6. Python 2.5 pkg from Python.org, numpy 1.2.1 pkg from scipy.org, nose installed through setup tools. Running "import numpy; numpy.test()" results in the following errors and failures: ERROR: Failure: TypeError (can't multiply sequence by non-int of type 'float') ERROR: test_definition (test_helper.TestFFTShift) ERROR: test_inverse (test_helper.TestFFTShift) ERROR: Test of inplace division FAIL: test_division_int (test_umath.TestDivision) FAIL: test_basic (test_index_tricks.TestUnravelIndex) FAIL: Test of inplace division FAIL: test_inplace_division_misc (test_core.TestMaskedArrayInPlaceArithmetics) FAIL: Test of inplace operations and rich comparisons Is this expected? Should I file a ticket? A complete dump of the test run can be found at: http://nick.matsakis.net/tmp/numpy-1.2.1-tests-12-30-2008.txt Nick Matsakis From lists.20.chth at xoxy.net Tue Dec 30 13:10:33 2008 From: lists.20.chth at xoxy.net (ctw) Date: Tue, 30 Dec 2008 13:10:33 -0500 Subject: [Numpy-discussion] combining recarrays Message-ID: Hi! I'm a bit stumped by the following: suppose I have several recarrays with identical dtypes (identical field names, etc.) and would like to combine them into one rec array, what would be the best way to do that? I tried using np.rec.fromrecords, but that doesn't produce the desired result. As a minimal example consider the following code: desc = np.dtype({'names':['a','b'],'formats':[np.float,np.int]}) rec1 = np.zeros(3,desc) rec2 = np.zeros(3,desc) Now I have two recarrays of shape (3,) that both look like this: array([(0.0, 0), (0.0, 0), (0.0, 0)], dtype=[('a', ' References: Message-ID: On Tue, Dec 30, 2008 at 10:10 AM, ctw wrote: > Hi! > > I'm a bit stumped by the following: suppose I have several recarrays > with identical dtypes (identical field names, etc.) and would like to > combine them into one rec array, what would be the best way to do > that? 
I tried using np.rec.fromrecords, but that doesn't produce the > desired result. As a minimal example consider the following code: > > desc = np.dtype({'names':['a','b'],'formats':[np.float,np.int]}) > rec1 = np.zeros(3,desc) > rec2 = np.zeros(3,desc) > > Now I have two recarrays of shape (3,) that both look like this: > array([(0.0, 0), (0.0, 0), (0.0, 0)], > dtype=[('a', ' > I would like to turn them into one new recarray of shape (6,) that > looks like this: > array([(0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0)], > dtype=[('a', ' > Any ideas? I'm not familiar with rec arrays, but this should work for any array: >> np.r_[rec1, rec2] array([(0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0)], dtype=[('a', '> np.concatenate((rec1, rec2), 0) array([(0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0)], dtype=[('a', ' References: Message-ID: On Tue, 30 Dec 2008, Nicholas Matsakis wrote: > I just installed what I believe to be a completely vanilla installation of > numpy on an Intel Mac OS X 10.5.6. Python 2.5 pkg from Python.org, numpy > 1.2.1 pkg from scipy.org, nose installed through setup tools. Running > "import numpy; numpy.test()" results in the following errors and > failures... I've determined these failures were a result of running the python interpreter with the "new" division semantics. Without those all the tests pass save one known failure. Nick Matsakis From len-l at telus.net Tue Dec 30 13:41:17 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Tue, 30 Dec 2008 10:41:17 -0800 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> Message-ID: <495A6B4D.70009@telus.net> David Cournapeau wrote: > The thing is since I don't understand what happens in the print '%f' > case, I don't know how to clean it up, if it is at all possible. But in > anyway, it means that with my changes, we are not worse than python > itself, and I think we are better than before, > > Just a quick look in SVN, trunk/Objects/stringobject.c, shows that the call path for a "%f" format is string_mod -> PyString_Format -> formatfloat -> PyOS_ascii_formatd. -- Lenard Lindstrom From Chris.Barker at noaa.gov Tue Dec 30 14:59:43 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 30 Dec 2008 11:59:43 -0800 Subject: [Numpy-discussion] "False" and "True" not singletons? Message-ID: <495A7DAF.9080504@noaa.gov> Hi all, I've just discovered that "False" is not a singleton: >>> import numpy as N >>> f = N.all((1,2,0)) >>> print f False >>> id(f) 17316364 >>> f is False False >>> id(False) 3294768 Should it be? This arose for me in some tests I'm using that check if a result is False: self.failUnless ( (self.B1 == self.B3) is False ) I'm doing it this way, because I want to make sure that the __eq__ method really is returning "True" or "False", rather than a value that happens to evaluate to true or false, like 1, or an empty list, or whatever. This is interesting to me because back when Python first introduced Booleans, I had thought they should be kept pure and not be subclasses of integers, and, in fact, "if" should only except boolean values. 
However this opinion was really a matter of my sense of purity, and this is the fist time I've run into a case that matters. I suppose I'm not being pythonic -- I should really only care if the result evaluates true or false, but I don't feel like I'm testing right if values can slip though that shouldn't. It does reinforce my opinion though -- whether zero, or an empty sequence or string should evaluate false really is a matter of specific application, not universal. Anyway, should I just give up? Or should numpy return the same "True" and "False" (like None), or is there another solution? x == False is close, but zero still slips through, even though an empty list does not: >>> 0 == False True >>> [] == False False -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Tue Dec 30 16:22:26 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Dec 2008 16:22:26 -0500 Subject: [Numpy-discussion] "False" and "True" not singletons? In-Reply-To: <495A7DAF.9080504@noaa.gov> References: <495A7DAF.9080504@noaa.gov> Message-ID: <1cd32cbb0812301322n2595a5ecv2b4b27283154ba18@mail.gmail.com> On Tue, Dec 30, 2008 at 2:59 PM, Christopher Barker wrote: > Hi all, > > I've just discovered that "False" is not a singleton: > > >>> import numpy as N > > >>> f = N.all((1,2,0)) > >>> print f > False > >>> id(f) > 17316364 > >>> f is False > False > >>> id(False) > 3294768 > > > Should it be? > > > This arose for me in some tests I'm using that check if a result is False: > > self.failUnless ( (self.B1 == self.B3) is False ) > > I'm doing it this way, because I want to make sure that the __eq__ > method really is returning "True" or "False", rather than a value that > happens to evaluate to true or false, like 1, or an empty list, or whatever. > > This is interesting to me because back when Python first introduced > Booleans, I had thought they should be kept pure and not be subclasses > of integers, and, in fact, "if" should only except boolean values. > However this opinion was really a matter of my sense of purity, and this > is the fist time I've run into a case that matters. > > I suppose I'm not being pythonic -- I should really only care if the > result evaluates true or false, but I don't feel like I'm testing right > if values can slip though that shouldn't. > > It does reinforce my opinion though -- whether zero, or an empty > sequence or string should evaluate false really is a matter of specific > application, not universal. > > Anyway, should I just give up? Or should numpy return the same "True" > and "False" (like None), or is there another solution? > > x == False > > is close, but zero still slips through, even though an empty list does not: > > >>> 0 == False > True > >>> [] == False > False > > > -Chris > > > -- > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > If you really insist on getting a boolean type, then you can check with isinstance: >>> if not np.any(np.ones(3)==0) and isinstance(np.any(np.ones(3)==0),np.bool_):print 'ok' ok >>> if not (1==0) and isinstance((1==0),bool):print 'ok' ok >>> f = np.all((1,2,0)) >>> if not f and isinstance(f,np.bool_):print 'ok' ok zero is not an instance of boolean: >>> f=0 >>> if not f and isinstance(f,np.bool_):print 'ok' >>> if not f and isinstance(f,bool):print 'ok' >>> Josef From robert.kern at gmail.com Tue Dec 30 16:33:52 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Dec 2008 16:33:52 -0500 Subject: [Numpy-discussion] "False" and "True" not singletons? In-Reply-To: <495A7DAF.9080504@noaa.gov> References: <495A7DAF.9080504@noaa.gov> Message-ID: <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> On Tue, Dec 30, 2008 at 14:59, Christopher Barker wrote: > Hi all, > > I've just discovered that "False" is not a singleton: > > >>> import numpy as N > > >>> f = N.all((1,2,0)) > >>> print f > False > >>> id(f) > 17316364 > >>> f is False > False > >>> id(False) > 3294768 > > > Should it be? Well, True and False are singletons, but numpy.any() and numpy.all() don't return bools. They return numpy.bool_s. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Tue Dec 30 17:17:30 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 30 Dec 2008 14:17:30 -0800 Subject: [Numpy-discussion] "False" and "True" not singletons? In-Reply-To: <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> References: <495A7DAF.9080504@noaa.gov> <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> Message-ID: <495A9DFA.8000308@noaa.gov> Robert Kern wrote: > Well, True and False are singletons, I thought so. > but numpy.any() and numpy.all() > don't return bools. They return numpy.bool_s. Is that a numpy scalar type? This also begs the question: why don't they return regular old True and False? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pav at iki.fi Tue Dec 30 17:27:58 2008 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 30 Dec 2008 22:27:58 +0000 (UTC) Subject: [Numpy-discussion] "False" and "True" not singletons? References: <495A7DAF.9080504@noaa.gov> <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> <495A9DFA.8000308@noaa.gov> Message-ID: Tue, 30 Dec 2008 14:17:30 -0800, Christopher Barker wrote: > Robert Kern wrote: >> Well, True and False are singletons, > > I thought so. > >> but numpy.any() and numpy.all() >> don't return bools. They return numpy.bool_s. > > Is that a numpy scalar type? > > This also begs the question: why don't they return regular old True and > False? Genericity. np.all and np.any take an axis parameter and usually return ndarrays; the default value, axis=None, is a corner case. 
Returning array scalars (which mostly quack like ndarrays) allows one to avoid the need for special-casing any subsequent code for axis=None. -- Pauli Virtanen From pav at iki.fi Tue Dec 30 21:28:29 2008 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 31 Dec 2008 02:28:29 +0000 (UTC) Subject: [Numpy-discussion] formatting issues, locale and co References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: Mon, 29 Dec 2008 13:38:12 +0900, David Cournapeau wrote: [clip] > I put my yesterday work in the fix_float_format branch: > - it fixes the locale issue > - it fixes the long double issue on windows. > - it also fixes some tests (we were not testing single precision > formatting but twice double precision instead > - the single precision test fails on the trunk BTW). > - it handles inf and nan more consistently across platforms (e.g. > str(np.log(0)) will be '-inf' on all platforms; on windows, it used > to be '-1.#INF' > - I was afraid it would broke converting back the > string to float, but it is broken anyway before my change, e.g. > float('-1.#INF') does not work on windows). [clip] I did some work on the fix_float_format branch from the opposite direction, making fromfile and fromstring properly locale-independent. (cf. #884) Works currently on POSIX systems, but some tests fail on Windows because float('inf') does not work [neither does float('-1.#INF')...]. (cf. #510) A bit more work must be done on NumPyOS_ascii_strtod to make inf/nan work as intended. Also, roundtrip tests for repr would be nice to add, if they aren't there yet, and possibly for str <-> fromstring roundtrip, too. I'll be almost offline for 1.5 weeks starting now, so if you want to finish this, go ahead. -- Pauli Virtanen From cournape at gmail.com Tue Dec 30 23:06:59 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 31 Dec 2008 13:06:59 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <495A6B4D.70009@telus.net> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> <495A6B4D.70009@telus.net> Message-ID: <5b8d13220812302006qb0d6c96m64a2b9ea9cbad4e7@mail.gmail.com> On Wed, Dec 31, 2008 at 3:41 AM, Lenard Lindstrom wrote: > David Cournapeau wrote: >> The thing is since I don't understand what happens in the print '%f' >> case, I don't know how to clean it up, if it is at all possible. But in >> anyway, it means that with my changes, we are not worse than python >> itself, and I think we are better than before, >> >> > Just a quick look in SVN, trunk/Objects/stringobject.c, shows that the > call path for a "%f" format is string_mod -> PyString_Format -> > formatfloat -> PyOS_ascii_formatd. Thanks, I did not think about looking into stringobject. 
I now have to understand why it does print differently, as going through format_float should avoid the inconsistencies cheers, David > -- > Lenard Lindstrom > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Tue Dec 30 23:11:02 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 31 Dec 2008 13:11:02 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: <5b8d13220812302011i5cfb76acma1527bdcbf7a83ee@mail.gmail.com> On Wed, Dec 31, 2008 at 11:28 AM, Pauli Virtanen wrote: > Mon, 29 Dec 2008 13:38:12 +0900, David Cournapeau wrote: > [clip] >> I put my yesterday work in the fix_float_format branch: >> - it fixes the locale issue >> - it fixes the long double issue on windows. >> - it also fixes some tests (we were not testing single precision >> formatting but twice double precision instead >> - the single precision test fails on the trunk BTW). >> - it handles inf and nan more consistently across platforms (e.g. >> str(np.log(0)) will be '-inf' on all platforms; on windows, it used >> to be '-1.#INF' >> - I was afraid it would broke converting back the >> string to float, but it is broken anyway before my change, e.g. >> float('-1.#INF') does not work on windows). > [clip] > > I did some work on the fix_float_format branch from the opposite > direction, making fromfile and fromstring properly locale-independent. > (cf. #884) > > Works currently on POSIX systems, but some tests fail on Windows because > float('inf') does not work [neither does float('-1.#INF')...]. (cf. #510) > A bit more work must be done on NumPyOS_ascii_strtod to make inf/nan work > as intended. Also, roundtrip tests for repr would be nice to add, if they > aren't there yet, and possibly for str <-> fromstring roundtrip, too. > I'll be almost offline for 1.5 weeks starting now, so if you want to > finish this, go ahead. Thank you for working on this, Pauli. The problem on windows may not be specific to windows: the difference really is whether the formatting is done by python or the C runtime. It just happens that on Linux and Mac OS X, the strings are the same - but it could be different on other OS. I have not looked into C99, whether this is standardized or not (the size of exponent is, but I don't know about nan and inf). We should also change pretty print of arrays, I think - although it is a change and may break things. Since that's how python represents the numbers, I guess we will have to change at some point. 
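A sketch of the roundtrip tests Pauli suggested (hypothetical test code, not what is in the branch; nan is left out because nan != nan would need a separate check):

import numpy as np
from numpy.testing import assert_equal

def test_repr_roundtrip():
    # repr of a float scalar should parse back to the same value
    for value in [1.0, 1e10, 1e-5, np.inf, -np.inf]:
        x = np.float64(value)
        assert_equal(np.float64(repr(x)), x)

def test_str_fromstring_roundtrip():
    # str -> fromstring should recover the original array
    a = np.array([1.5, -2.25, 1e10])
    s = ' '.join([str(v) for v in a])
    assert_equal(np.fromstring(s, sep=' '), a)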
David From alan.mcintyre at gmail.com Wed Dec 31 05:53:27 2008 From: alan.mcintyre at gmail.com (Alan McIntyre) Date: Wed, 31 Dec 2008 02:53:27 -0800 Subject: [Numpy-discussion] Removal of deprecated test framework stuff Message-ID: <1d36917a0812310253t4d03b365u874a9634eb5d2e30@mail.gmail.com> Hi all, Unless somebody objects, I'd like to remove from NumPy 1.3 the following numpy.testing items that were deprecated in NumPy 1.2 (since the warnings promise we'll do so ;): - ParametricTestCase (also removing the entire file numpy/testing/parametric.py) - The following arguments from numpy.testing.Tester.test() (which is used for module test functions): level, verbosity, all, sys_argv, testcase_pattern - Path manipulation functions: set_package_path, set_local_path, restore_path - NumpyTestCase, NumpyTest Thanks, Alan From millman at berkeley.edu Wed Dec 31 07:49:31 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 31 Dec 2008 04:49:31 -0800 Subject: [Numpy-discussion] Removal of deprecated test framework stuff In-Reply-To: <1d36917a0812310253t4d03b365u874a9634eb5d2e30@mail.gmail.com> References: <1d36917a0812310253t4d03b365u874a9634eb5d2e30@mail.gmail.com> Message-ID: On Wed, Dec 31, 2008 at 2:53 AM, Alan McIntyre wrote: > Unless somebody objects, I'd like to remove from NumPy 1.3 the > following numpy.testing items that were deprecated in NumPy 1.2 (since > the warnings promise we'll do so ;): > > - ParametricTestCase (also removing the entire file numpy/testing/parametric.py) > - The following arguments from numpy.testing.Tester.test() (which is > used for module test functions): level, verbosity, all, sys_argv, > testcase_pattern > - Path manipulation functions: set_package_path, set_local_path, restore_path > - NumpyTestCase, NumpyTest Thanks for taking care of this. Jarrod From nwagner at iam.uni-stuttgart.de Wed Dec 31 10:06:24 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 31 Dec 2008 16:06:24 +0100 Subject: [Numpy-discussion] numpy.test() failures Message-ID: >>> numpy.__version__ '1.3.0.dev6283' ====================================================================== FAIL: Check formatting. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 74, in check_complex_type err_msg='Failed str formatting for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed str formatting for type ACTUAL: '(1e+10+0j)' DESIRED: '1e+10' ====================================================================== FAIL: Check formatting when using print ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 108, in check_float_type_print _test_redirected_print(float(x), tp) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 104, in _test_redirected_print err_msg='print failed for type%s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: print failed for type ACTUAL: '10000000000.0\n' DESIRED: '1e+10\n' ====================================================================== FAIL: Check formatting when using print ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 115, in check_complex_type_print _test_redirected_print(complex(x), tp) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 104, in _test_redirected_print err_msg='print failed for type%s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: print failed for type ACTUAL: '(10000000000+0j)\n' DESIRED: '(1e+10+0j)\n' ====================================================================== FAIL: test_print.test_locale_single ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/decorators.py", line 82, in skipper return f(*args, **kwargs) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 164, in test_locale_single return _test_locale_independance(np.float32) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 157, in _test_locale_independance err_msg='Failed locale test for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed locale test for type ACTUAL: '1,2' DESIRED: '1.2' ====================================================================== FAIL: test_print.test_locale_double 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/decorators.py", line 82, in skipper return f(*args, **kwargs) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 169, in test_locale_double return _test_locale_independance(np.double) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 157, in _test_locale_independance err_msg='Failed locale test for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed locale test for type ACTUAL: '1,2' DESIRED: '1.2' ====================================================================== FAIL: test_print.test_locale_longdouble ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/decorators.py", line 82, in skipper return f(*args, **kwargs) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 174, in test_locale_longdouble return _test_locale_independance(np.longdouble) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 157, in _test_locale_independance err_msg='Failed locale test for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed locale test for type ACTUAL: '1,2' DESIRED: '1.2' ---------------------------------------------------------------------- Ran 1797 tests in 16.303s FAILED (KNOWNFAIL=1, failures=6)