From gael.varoquaux at normalesup.org Mon Dec 1 03:12:20 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 1 Dec 2008 09:12:20 +0100 Subject: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing In-Reply-To: <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> References: <492D8FD3.8050601@enthought.com> <492DC9B0.1030300@gmail.com> <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> Message-ID: <20081201081220.GC18450@phare.normalesup.org> On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote: > On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale wrote: > > I tried installing 4.0.300x on a machine running 64-bit windows vista home > > edition and ran into problems with PyQt and some related packages. So I > > uninstalled all the python-related software, EPD took over 30 minutes to > > uninstall, and tried to install EPD 4.1 beta. > My guess is that EPD is only 32 bits installer, so that you run it on > WOW (Windows in Windows) on windows 64, which is kind of slow (but > usable for most tasks). On top of that, Vista is not supported with EPD. I had a chat with the EPD guys about that, and they say it does work with Vista... most of the time. They don't really understand the failures, and haven't had time to investigate much, because so far professionals and labs are simply avoiding Vista. Hopefully someone from the EPD team will give a more accurate answer soon. Ga?l From timmichelsen at gmx-topmail.de Mon Dec 1 05:22:10 2008 From: timmichelsen at gmx-topmail.de (Timmie) Date: Mon, 1 Dec 2008 10:22:10 +0000 (UTC) Subject: [Numpy-discussion] optimising single value functions for array calculations Message-ID: Hello, I am developing a module which bases its calculations on another specialised module. My module uses numpy arrays a lot. The problem is that the other module I am building upon, does not work with (whole) arrays but with single values. Therefore, I am currently forces to loop over the array: ### a = numpy.arange(100) b = numpy.arange(100,200) for i in range(0,a.size): a[i] = myfunc(a[i])* b[i] ### The results come out well. But the problem is that this way of calculation is very ineffiecent and takes time. May anyone give me a hint on how I can improve my code without having to modify the package I am building upon. I do not want to change it a lot because I would always have to run behind the chnages in the other package. To summarise: How to I make a calculation function array-aware? Thanks in advance, Timmie From emmanuelle.gouillart at normalesup.org Mon Dec 1 05:28:46 2008 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Mon, 1 Dec 2008 11:28:46 +0100 (CET) Subject: [Numpy-discussion] optimising single value functions for array calculations In-Reply-To: References: Message-ID: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> Hello Timmie, numpy.vectorize(myfunc) should do what you want. Cheers, Emmanuelle > Hello, > I am developing a module which bases its calculations > on another specialised module. > My module uses numpy arrays a lot. > The problem is that the other module I am building > upon, does not work with (whole) arrays but with > single values. > Therefore, I am currently forces to loop over the > array: > > ### > a = numpy.arange(100) > b = numpy.arange(100,200) > for i in range(0,a.size): > a[i] = myfunc(a[i])* b[i] > > ### > > The results come out well. But the problem is that this > way of calculation is very ineffiecent and takes time. 
> > May anyone give me a hint on how I can improve my > code without having to modify the package I am > building upon. I do not want to change it a lot because > I would always have to run behind the chnages in the > other package. > > To summarise: > How to I make a calculation function array-aware? > > Thanks in advance, > Timmie > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From matthieu.brucher at gmail.com Mon Dec 1 05:33:56 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 1 Dec 2008 11:33:56 +0100 Subject: [Numpy-discussion] optimising single value functions for array calculations In-Reply-To: References: Message-ID: 2008/12/1 Timmie : > Hello, > I am developing a module which bases its calculations > on another specialised module. > My module uses numpy arrays a lot. > The problem is that the other module I am building > upon, does not work with (whole) arrays but with > single values. > Therefore, I am currently forces to loop over the > array: > > ### > a = numpy.arange(100) > b = numpy.arange(100,200) > for i in range(0,a.size): > a[i] = myfunc(a[i])* b[i] > > ### Hi, Safe from using numpy functions inside myfunc(), numpy has no way of optimizing your computation. vectorize() will help you to have a clean interface, but it will not enhance speed. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From nadavh at visionsense.com Mon Dec 1 05:37:25 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Mon, 1 Dec 2008 12:37:25 +0200 Subject: [Numpy-discussion] optimising single value functions for array calculations References: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> Message-ID: <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> I does not solve the slowness problem. I think I read on the list about an experimental code for fast vectorization. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Emmanuelle Gouillart ????: ? 01-?????-08 12:28 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] optimising single value functions for array calculations Hello Timmie, numpy.vectorize(myfunc) should do what you want. Cheers, Emmanuelle > Hello, > I am developing a module which bases its calculations > on another specialised module. > My module uses numpy arrays a lot. > The problem is that the other module I am building > upon, does not work with (whole) arrays but with > single values. > Therefore, I am currently forces to loop over the > array: > > ### > a = numpy.arange(100) > b = numpy.arange(100,200) > for i in range(0,a.size): > a[i] = myfunc(a[i])* b[i] > > ### > > The results come out well. But the problem is that this > way of calculation is very ineffiecent and takes time. > > May anyone give me a hint on how I can improve my > code without having to modify the package I am > building upon. I do not want to change it a lot because > I would always have to run behind the chnages in the > other package. > > To summarise: > How to I make a calculation function array-aware? 
> > Thanks in advance, > Timmie > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3856 bytes Desc: not available URL: From bakker at itc.nl Mon Dec 1 06:31:59 2008 From: bakker at itc.nl (Wim Bakker) Date: Mon, 1 Dec 2008 11:31:59 +0000 (UTC) Subject: [Numpy-discussion] memmap & dtype issue Message-ID: For a long time now, numpy's memmap has me puzzled by its behavior. When I use memmap straightforward on a file it seems to work fine, but whenever I try to do a memmap using a dtype it seems to gobble up the whole file into memory. This, of course, makes the use of memmap futile. I would expect that the result of such an operation would give me a true memmap and that the data would be converted to dtype on the fly. I've seen this behavior in version version 1.04, 1.1.1 and still in 1.2.1. I'm working on Windows haven't tried it on Linux. Am I doing something wrong? Are my expectations wrong? Or is this an issue somewhere deeper in numpy? I looked at the memmap.py and it seems to me that most of the work is delegated to numpy.ndarray.__new__. Something wrong there maybe? Can somebody help please? Thanks! Regards, Wim Bakker From stefan at sun.ac.za Mon Dec 1 07:14:47 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 14:14:47 +0200 Subject: [Numpy-discussion] optimising single value functions for array calculations In-Reply-To: <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> References: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> Message-ID: <9457e7c80812010414u2fb5f93as3e5536eb1a53fa7d@mail.gmail.com> 2008/12/1 Nadav Horesh : > I does not solve the slowness problem. I think I read on the list about an > experimental code for fast vectorization. The choices are basically weave, fast_vectorize (http://projects.scipy.org/scipy/scipy/ticket/727), ctypes, cython or f2py. Any I left out? Ilan's fast_vectorize should have been included in SciPy a while ago already. Volunteers for patch review? Cheers St?fan From timmichelsen at gmx-topmail.de Mon Dec 1 08:38:23 2008 From: timmichelsen at gmx-topmail.de (Timmie) Date: Mon, 1 Dec 2008 13:38:23 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?optimising_single_value_functions_fo?= =?utf-8?q?r_array=09calculations?= References: <12998.195.68.31.231.1228127326.squirrel@www.normalesup.org> <710F2847B0018641891D9A216027636029C359@ex3.envision.co.il> Message-ID: Hi, > thanks for all your answers. I will certainly test it. > numpy.vectorize(myfunc) should do what you want. Just to add a better example based on a recent discussion here on this list [1]: myfunc(x): res = math.sin(x) return res a = numpy.arange(1,20) => myfunc(a) will not work. => myfunc need to have a possibility to pass single values to math.sin either through interation (see my inital email) or through other options. (I know that numpy has a array aware sinus but wanted to use it as an example here.) My orriginal problem evolves from here timeseries computing [2]. Well, I will test and report back further. 
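For reference, here is the minimal sketch I intend to try first (assuming, as in the toy example
above, that myfunc only does scalar math; the vectorize call is just the convenience wrapper
suggested earlier in the thread, not a speed-up):

###
import math
import numpy

def myfunc(x):
    # scalar-only function, standing in for the external package
    return math.sin(x)

a = numpy.arange(1, 20)
b = numpy.arange(101, 120)

# numpy.vectorize makes the scalar function accept whole arrays;
# as Matthieu noted, it still loops in Python, so it cleans up the
# interface without making the computation faster
vmyfunc = numpy.vectorize(myfunc)
result = vmyfunc(a) * b
###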
Thanks again and until soon, Timmie [1]: http://thread.gmane.org/gmane.comp.python.numeric.general/26417/focus=26418 [2]: http://thread.gmane.org/gmane.comp.python.scientific.user/18253 From oliphant at enthought.com Mon Dec 1 09:30:02 2008 From: oliphant at enthought.com (Travis E. Oliphant) Date: Mon, 01 Dec 2008 08:30:02 -0600 Subject: [Numpy-discussion] memmap & dtype issue In-Reply-To: References: Message-ID: <4933F4EA.9040709@enthought.com> Wim Bakker wrote: > For a long time now, numpy's memmap has me puzzled by its behavior. When I use > memmap straightforward on a file it seems to work fine, but whenever I try to > do a memmap using a dtype it seems to gobble up the whole file into memory. > I don't understand your question. From my experience, the memmap is working fine. Please post and example that illustrates your point. > This, of course, makes the use of memmap futile. I would expect that the > result of such an operation would give me a true memmap and that the data > would be converted to dtype on the fly. > There is no conversion on the fly when you use memmap. You construct an array of the same data-type as is in the file and then manipulate portions of it as needed. > Am I doing something wrong? Are my expectations wrong? My guess is that your expectations are not accurate, but example code would help sort it out. Best regards, -Travis From dsdale24 at gmail.com Mon Dec 1 10:30:40 2008 From: dsdale24 at gmail.com (Darren Dale) Date: Mon, 1 Dec 2008 10:30:40 -0500 Subject: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing In-Reply-To: <20081201081220.GC18450@phare.normalesup.org> References: <492D8FD3.8050601@enthought.com> <492DC9B0.1030300@gmail.com> <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> <20081201081220.GC18450@phare.normalesup.org> Message-ID: On Mon, Dec 1, 2008 at 3:12 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote: > > On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale wrote: > > > I tried installing 4.0.300x on a machine running 64-bit windows vista > home > > > edition and ran into problems with PyQt and some related packages. So I > > > uninstalled all the python-related software, EPD took over 30 minutes > to > > > uninstall, and tried to install EPD 4.1 beta. > > > My guess is that EPD is only 32 bits installer, so that you run it on > > WOW (Windows in Windows) on windows 64, which is kind of slow (but > > usable for most tasks). > > On top of that, Vista is not supported with EPD. I had a chat with the > EPD guys about that, and they say it does work with Vista... most of the > time. They don't really understand the failures, and haven't had time to > investigate much, because so far professionals and labs are simply > avoiding Vista. Hopefully someone from the EPD team will give a more > accurate answer > soon. Thanks Gael and David. I would avoid windows altogether if I could. When I bought a new laptop I had the option to pay extra to downgrade to XP pro, I should have done some more research before I settled for Vista. In the meantime I'll borrow an XP machine when I need to build python package installers for windows. Hopefully a solution can be found at some point for python and Vista. Losing compatibility on such a major platform will become increasingly problematic. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pgmdevlist at gmail.com Mon Dec 1 12:49:01 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 12:49:01 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... Message-ID: All, Please find attached to this message another implementation of np.loadtxt, which focuses on missing values. It's basically a combination of John Hunter's et al mlab.csv2rec, Ryan May's patches and pieces of code I'd been working on over the last few weeks. Besides some helper classes (StringConverter to convert a string into something else, NameValidator to check names..._), you'll find 3 functions: * `genloadtxt` is the base function that makes all the work. It outputs 2 arrays, one for the data (missing values being substituted by the appropriate default) and one for the mask. It would go in np.lib.io * `loadtxt` would replace the current np.loadtxt. It outputs a ndarray, where missing data being filled. It would also go in np.lib.io * `mloadtxt` would go into np.ma.io (to be created) and renamed `loadtxt`. Right now, I needed a different name to avoid conflicts. It combines the outputs of `genloadtxt` into a single masked array. You'll also several series of tests, that you can use as examples. Please give it a try and send me some feedback (bugs, wishes, suggestions). I'd like it to make the 1.3.0 release (I need some of the functionalities to improve the corresponding function in scikits.timeseries, currently fubar...) P. From pgmdevlist at gmail.com Mon Dec 1 13:21:32 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 13:21:32 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <7267494B-A9AB-4649-B13D-BB00508954C9@gmail.com> And now for the tests: -------------- next part -------------- A non-text attachment was scrubbed... Name: genload_proposal_tests.py Type: text/x-python-script Size: 15192 bytes Desc: not available URL: From stefan at sun.ac.za Mon Dec 1 13:22:15 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 20:22:15 +0200 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <9457e7c80812011022t26bae211lfd317a1d314b7e3e@mail.gmail.com> 2008/12/1 Pierre GM : > Please find attached to this message another implementation of Struggling to comply! Cheers St?fan From pgmdevlist at gmail.com Mon Dec 1 13:21:08 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 13:21:08 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message. -------------- next part -------------- A non-text attachment was scrubbed... Name: genload_proposal.py Type: text/x-python-script Size: 27313 bytes Desc: not available URL: -------------- next part -------------- From jdh2358 at gmail.com Mon Dec 1 13:54:27 2008 From: jdh2358 at gmail.com (John Hunter) Date: Mon, 1 Dec 2008 12:54:27 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: <88e473830812011054o5b9c184aib1f41fec0faff6b7@mail.gmail.com> On Mon, Dec 1, 2008 at 12:21 PM, Pierre GM wrote: > Well, looks like the attachment is too big, so here's the implementation. 
> The tests will come in another message.\ It looks like I am doing something wrong -- trying to parse a CSV file with dates formatted like '2008-10-14', with:: import datetime, sys import dateutil.parser StringConverter.upgrade_mapper(dateutil.parser.parse, default=datetime.date(1900,1,1)) r = loadtxt(sys.argv[1], delimiter=',', names=True) print r.dtype I get the following:: Traceback (most recent call last): File "genload_proposal.py", line 734, in ? r = loadtxt(sys.argv[1], delimiter=',', names=True) File "genload_proposal.py", line 711, in loadtxt (output, _) = genloadtxt(fname, **kwargs) File "genload_proposal.py", line 646, in genloadtxt rows[i] = tuple([conv(val) for (conv, val) in zip(converters, vals)]) File "genload_proposal.py", line 385, in __call__ raise ValueError("Cannot convert string '%s'" % value) ValueError: Cannot convert string '2008-10-14' In debug mode, I see the following where the error occurs ipdb> vals ('2008-10-14', '116.26', '116.40', '103.14', '104.08', '70749800', '104.08') ipdb> converters [<__main__.StringConverter instance at 0xa35fa6c>, <__main__.StringConverter instance at 0xa35ff2c>, <__main__.StringConverter instance at 0xa35ff8c>, <__main__.StringConverter instance at 0xa35ffec>, <__main__.StringConverter instance at 0xa15406c>, <__main__.StringConverter instance at 0xa1540cc>, <__main__.StringConverter instance at 0xa15412c>] It looks like my registry of a custom converter isn't working. Here is what the _mapper looks like:: In [23]: StringConverter._mapper Out[23]: [(, , None), (, , -1), (, , -NaN), (, , (-NaN+0j)), (, , datetime.date(1900, 1, 1)), (, , '???')] From pgmdevlist at gmail.com Mon Dec 1 14:14:19 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 14:14:19 -0500 Subject: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation... References: <7B106C20-EB72-4126-956C-02866D204A3E@gmail.com> Message-ID: <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> (Sorry about that, I pressed "Reply" instead of "Reply all". Not my day for emails...) > On Dec 1, 2008, at 1:54 PM, John Hunter wrote: >> >> It looks like I am doing something wrong -- trying to parse a CSV >> file >> with dates formatted like '2008-10-14', with:: >> >> import datetime, sys >> import dateutil.parser >> StringConverter.upgrade_mapper(dateutil.parser.parse, >> default=datetime.date(1900,1,1)) >> r = loadtxt(sys.argv[1], delimiter=',', names=True) > > John, > The problem you have is that the default dtype is 'float' (for > backwards compatibility w/ the original np.loadtxt). What you want > is to automatically change the dtype according to the content of > your file: you should use dtype=None > > r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None) > > As you'll want a recarray, we could make a np.records.loadtxt > function where dtype=None would be the default... From jdh2358 at gmail.com Mon Dec 1 14:26:40 2008 From: jdh2358 at gmail.com (John Hunter) Date: Mon, 1 Dec 2008 13:26:40 -0600 Subject: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation... In-Reply-To: <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> References: <7B106C20-EB72-4126-956C-02866D204A3E@gmail.com> <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> Message-ID: <88e473830812011126i2ec9ad37t80b9c9712c49f19e@mail.gmail.com> On Mon, Dec 1, 2008 at 1:14 PM, Pierre GM wrote: >> The problem you have is that the default dtype is 'float' (for >> backwards compatibility w/ the original np.loadtxt). 
What you want >> is to automatically change the dtype according to the content of >> your file: you should use dtype=None >> >> r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None) >> >> As you'll want a recarray, we could make a np.records.loadtxt >> function where dtype=None would be the default... > As you'll want a recarray, we could make a np.records.loadtxt function where > dtype=None would be the default... OK, that worked great. I do think some a default impl in np.rec which returned a recarray would be nice. It might also be nice to have a method like np.rec.fromcsv which defaults to a delimiter=',', names=True and dtype=None. Since csv is one of the most common data interchange format in the world, it would be nice to have some obvious function that works with it with little or no customization required. Fernando and I have taught a scientific computing course on a number of occasions, and on the last round we taught to undergrads. Most of these students have little or no programming, for many the concept of an array is something they struggle with, dtypes are a difficult concept, but we found that they responded very well to our csv2rec example, because with no syntactic cruft they were able to load a file and do some stats on the columns, and I would like to see that ease of use preserved. JDH From pgmdevlist at gmail.com Mon Dec 1 14:42:27 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 14:42:27 -0500 Subject: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation... In-Reply-To: <88e473830812011126i2ec9ad37t80b9c9712c49f19e@mail.gmail.com> References: <7B106C20-EB72-4126-956C-02866D204A3E@gmail.com> <8A781DA2-C341-41D1-9E0F-81B922EB5B87@gmail.com> <88e473830812011126i2ec9ad37t80b9c9712c49f19e@mail.gmail.com> Message-ID: <70C80B0B-96F2-4667-BFF9-7D7FCB3958D6@gmail.com> On Dec 1, 2008, at 2:26 PM, John Hunter wrote > > OK, that worked great. I do think some a default impl in np.rec which > returned a recarray would be nice. It might also be nice to have a > method like np.rec.fromcsv which defaults to a delimiter=',', > names=True and dtype=None. Since csv is one of the most common data > interchange format in the world, it would be nice to have some > obvious function that works with it with little or no customization > required. Quite agreed. Personally, I'd ditch the default dtype=float in favor of dtype=None, but compatibility is an issue. However, if we all agree on genloadtxt, we can use tailored-made version in different modules, like you suggest. There's an extra issue for which we have an solution I'm not completely satisfied with: names=True. It might be simpler for basic user not to set names=True, and have the first header recognized as names or not if needed (by processing the first line after the others, and using it as header if it's found to be a list of names, or inserting it back at the beginning otherwise)... From ndbecker2 at gmail.com Mon Dec 1 14:43:11 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 01 Dec 2008 14:43:11 -0500 Subject: [Numpy-discussion] fromiter typo? Message-ID: Says it takes a default dtype arg, but doesn't act like it's an optional arg: fromiter (iterator or generator, dtype=None) Construct an array from an iterator or a generator. Only handles 1-dimensional cases. By default the data-type is determined from the objects returned from the iterator. 
---> 20 z = fromiter (y) TypeError: function takes at least 2 arguments (1 given) From pav at iki.fi Mon Dec 1 14:56:51 2008 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 1 Dec 2008 19:56:51 +0000 (UTC) Subject: [Numpy-discussion] fromiter typo? References: Message-ID: Mon, 01 Dec 2008 14:43:11 -0500, Neal Becker wrote: > Says it takes a default dtype arg, but doesn't act like it's an optional > arg: > > fromiter (iterator or generator, dtype=None) Construct an array from an > iterator or a generator. Only handles 1-dimensional cases. By default > the data-type is determined from the objects returned from the iterator. > > ---> 20 z = fromiter (y) > > TypeError: function takes at least 2 arguments (1 given) The docstring is correct in 1.2.1 and in the documentation; I suppose you have an older version of Numpy. -- Pauli Virtanen From stefan at sun.ac.za Mon Dec 1 15:47:06 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 22:47:06 +0200 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: Message-ID: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> Hi Pierre 2008/12/1 Pierre GM : > * `genloadtxt` is the base function that makes all the work. It > outputs 2 arrays, one for the data (missing values being substituted > by the appropriate default) and one for the mask. It would go in > np.lib.io I see the code length increased from 200 lines to 800. This made me wonder about the execution time: initial benchmarks suggest a 3x slow-down. Could this be a problem for loading large text files? If so, should we consider keeping both versions around, or by default bypassing all the extra hooks? Regards St?fan From rmay31 at gmail.com Mon Dec 1 16:23:18 2008 From: rmay31 at gmail.com (Ryan May) Date: Mon, 01 Dec 2008 15:23:18 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> Message-ID: <493455C6.7010701@gmail.com> St?fan van der Walt wrote: > Hi Pierre > > 2008/12/1 Pierre GM : >> * `genloadtxt` is the base function that makes all the work. It >> outputs 2 arrays, one for the data (missing values being substituted >> by the appropriate default) and one for the mask. It would go in >> np.lib.io > > I see the code length increased from 200 lines to 800. This made me > wonder about the execution time: initial benchmarks suggest a 3x > slow-down. Could this be a problem for loading large text files? If > so, should we consider keeping both versions around, or by default > bypassing all the extra hooks? I've wondered about this being an issue. On one hand, you hate to make existing code noticeably slower. On the other hand, if speed is important to you, why are you using ascii I/O? I personally am not entirely against having two versions of loadtxt-like functions. However, the idea seems a little odd, seeing as how loadtxt was already supposed to be the "swiss army knife" of text reading. I'm seeing a similar slowdown with Pierre's version of the code. The version of loadtxt that I cobbled together with the StringConverter class (and no missing value support) shows about a 50% slowdown, so clearly there's a performance penalty for trying to make a generic function that can be all things to all people. On the other hand, this approach reduces code duplication. I'm not really opinionated on what the right approach is here. 
My only opinion is that this functionality *really* needs to be in numpy in some fashion. For my own use case, with the old version, I could read a text file and by hand separate out columns and mask values. Now, I open a file and get a structured array with an automatically detected dtype (names and types!) plus masked values. My $0.02. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From stefan at sun.ac.za Mon Dec 1 16:47:00 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 1 Dec 2008 23:47:00 +0200 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <493455C6.7010701@gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> Message-ID: <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> 2008/12/1 Ryan May : > I've wondered about this being an issue. On one hand, you hate to make > existing code noticeably slower. On the other hand, if speed is > important to you, why are you using ascii I/O? More "I" than "O"! But I think numpy.fromfile, once fixed up, could fill this niche nicely. > I personally am not entirely against having two versions of loadtxt-like > functions. However, the idea seems a little odd, seeing as how loadtxt > was already supposed to be the "swiss army knife" of text reading. I haven't investigated the code in too much detail, but wouldn't it be possible to implement the current set of functionality in a base-class, which is then specialised to add the rest? That way, one could always instantiate TextReader yourself for some added speed. > I'm not really opinionated on what the right approach is here. My only > opinion is that this functionality *really* needs to be in numpy in some > fashion. For my own use case, with the old version, I could read a text > file and by hand separate out columns and mask values. Now, I open a > file and get a structured array with an automatically detected dtype > (names and types!) plus masked values. That's neat! Cheers St?fan From pgmdevlist at gmail.com Mon Dec 1 17:55:43 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 17:55:43 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: I agree, genloadtxt is a bit blotted, and it's not a surprise it's slower than the initial one. I think that in order to be fair, comparisons must be performed with matplotlib.mlab.csv2rec, that implements as well the autodetection of the dtype. I'm quite in favor of keeping a lite version around. On Dec 1, 2008, at 4:47 PM, St?fan van der Walt wrote: >> > I haven't investigated the code in too much detail, but wouldn't it be > possible to implement the current set of functionality in a > base-class, which is then specialised to add the rest? That way, one > could always instantiate TextReader yourself for some added speed. Well, one of the issues is that we need to keep the function compatible w/ urllib.urlretrieve (Ryan, am I right?), which means not being able to go back to the beginning of a file (no call to .seek). Another issue comes from the possibility to define the dtype automatically: you need to keep track of the converters, then have to do a second loop on the data. 
Those converters are likely the bottleneck, as you need to check whether each value can be interpreted as missing or not and respond appropriately. I thought about creating a base class, with a specific subclass taking care of the missing values. I found out it would have duplicated a lot of code In any case, I think that's secondary: we can always optimize pieces of the code afterwards. I'd like more feedback on corner cases and usage... From efiring at hawaii.edu Mon Dec 1 18:09:59 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 01 Dec 2008 13:09:59 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? Message-ID: <49346EC7.6020109@hawaii.edu> Pierre, ma.masked_all does not seem to work with fancy dtypes and more then one dimension: In [1]:import numpy as np In [2]:dt = np.dtype({'names': ['a', 'b'], 'formats': ['f', 'f']}) In [3]:x = np.ma.masked_all((2,), dtype=dt) In [4]:x Out[4]: masked_array(data = [(--, --) (--, --)], mask = [(True, True) (True, True)], fill_value=(1.0000000200408773e+20, 1.0000000200408773e+20)) In [5]:x = np.ma.masked_all((2,2), dtype=dt) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/efiring/ in () /usr/local/lib/python2.5/site-packages/numpy/ma/extras.pyc in masked_all(shape, dtype) 78 """ 79 a = masked_array(np.empty(shape, dtype), ---> 80 mask=np.ones(shape, bool)) 81 return a 82 /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __new__(cls, data, mask, dtype, copy, subok, ndmin, fill_value, keep_mask, hard_mask, flag, shrink, **options) 1304 except TypeError: 1305 mask = np.array([tuple([m]*len(mdtype)) for m in mask], -> 1306 dtype=mdtype) 1307 # Make sure the mask and the data have the same shape 1308 if mask.shape != _data.shape: TypeError: expected a readable buffer object ----------------- Eric From Chris.Barker at noaa.gov Mon Dec 1 18:19:24 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 01 Dec 2008 15:19:24 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: <493470FC.6020004@noaa.gov> St?fan van der Walt wrote: >> important to you, why are you using ascii I/O? ascii I/O is slow, so that's a reason in itself to want it not to be slower! > More "I" than "O"! But I think numpy.fromfile, once fixed up, could > fill this niche nicely. I agree -- for the simple cases, fromfile() could work very well -- perhaps it could even be used to speed up some special cases of loadtxt. But is anyone working on fromfile()? By the way, I think overloading fromfile() for text files is a bit misleading for users -- I propose we have a fromtextfile() or something instead. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Mon Dec 1 18:21:07 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 01 Dec 2008 15:21:07 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... 
In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: <49347163.8070207@noaa.gov> Pierre GM wrote: > Another issue comes from the possibility to define the dtype > automatically: Does all that get bypassed if the dtype(s) is specified? Is it still slow in that case? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Dec 1 18:28:38 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 18:28:38 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <49347163.8070207@noaa.gov> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> Message-ID: <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> On Dec 1, 2008, at 6:21 PM, Christopher Barker wrote: > Pierre GM wrote: >> Another issue comes from the possibility to define the dtype >> automatically: > > Does all that get bypassed if the dtype(s) is specified? Is it still > slow in that case? Good question. Having a dtype != None does skip a secondary loop. Once again, I;m sure there's plenty of room for optimization (eg, different loops whether the dtype is defined or not, whether missing values have to be taken into account or not, etc...). I just want to make sure that we're not missing any functionality and/or corner cases and that the usage is intuitive enough before spending some time optimizing... From f.yw at hotmail.com Mon Dec 1 20:38:11 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 1 Dec 2008 18:38:11 -0700 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> Message-ID: Hi, I need to convolve a 1d filter with 8 coefficients with a 2d array of the shape (6,7). I can use convolve to perform the operation for each row. This will involve a for loop with a counter 6. I wonder there is an fast way to do this in numpy without using for loop. Does anyone know how to do it? Thanks Frank _________________________________________________________________ Access your email online and on the go with Windows Live Hotmail. http://windowslive.com/Explore/Hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_access_112008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From h5py at alfven.org Mon Dec 1 20:53:46 2008 From: h5py at alfven.org (Andrew Collette) Date: Mon, 01 Dec 2008 17:53:46 -0800 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 Message-ID: <1228182826.24243.1.camel@tachyon-laptop> ===================================== Announcing HDF5 for Python (h5py) 1.0 ===================================== What is h5py? ------------- HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a versatile, mature scientific software library designed for the fast, flexible storage of enormous amounts of data. 
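A minimal sketch of the kind of usage described in the next few paragraphs (the file, group
and dataset names here are purely illustrative, not taken from any shipped example):

import numpy as np
import h5py

f = h5py.File('example.h5', 'w')                  # create a new HDF5 file
dset = f.create_dataset('mydata', data=np.arange(100))
grp = f.create_group('subgroup')                  # groups act like containers
grp.create_dataset('more', data=np.ones((10, 10)))

print f['mydata'][:10]                            # datasets support slicing
print f['subgroup/more'].shape, f['subgroup/more'].dtype
f.close()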
>From a Python programmer's perspective, HDF5 provides a robust way to store data, organized by name in a tree-like fashion. You can create datasets (arrays on disk) hundreds of gigabytes in size, and perform random-access I/O on desired sections. Datasets are organized in a filesystem-like hierarchy using containers called "groups", and accesed using the tradional POSIX /path/to/resource syntax. This is the fourth major release of h5py, and represents the end of the "unstable" (0.X.X) design phase. Why should I use it? -------------------- H5py provides a simple, robust read/write interface to HDF5 data from Python. Existing Python and NumPy concepts are used for the interface; for example, datasets on disk are represented by a proxy class that supports slicing, and has dtype and shape attributes. HDF5 groups are are presented using a dictionary metaphor, indexed by name. A major design goal of h5py is interoperability; you can read your existing data in HDF5 format, and create new files that any HDF5- aware program can understand. No Python-specific extensions are used; you're free to implement whatever file structure your application desires. Almost all HDF5 features are available from Python, including things like compound datatypes (as used with NumPy recarray types), HDF5 attributes, hyperslab and point-based I/O, and more recent features in HDF 1.8 like resizable datasets and recursive iteration over entire files. The foundation of h5py is a near-complete wrapping of the HDF5 C API. HDF5 identifiers are first-class objects which participate in Python reference counting, and expose the C API via methods. This low-level interface is also made available to Python programmers, and is exhaustively documented. See the Quick-Start Guide for a longer introduction with code examples: http://h5py.alfven.org/docs/guide/quick.html Where to get it --------------- * Main website, documentation: http://h5py.alfven.org * Downloads, bug tracker: http://h5py.googlecode.com * The HDF group website also contains a good introduction: http://www.hdfgroup.org/HDF5/doc/H5.intro.html Requires -------- * UNIX-like platform (Linux or Mac OS-X); Windows version is in progress. * Python 2.5 or 2.6 * NumPy 1.0.3 or later (1.1.0 or later recommended) * HDF5 1.6.5 or later, including 1.8. Some features only available when compiled against HDF5 1.8. * Optionally, Cython (see cython.org) if you want to use custom install options. You'll need version 0.9.8.1.1 or later. About this version ------------------ Version 1.0 follows version 0.3.1 as the latest public release. The major design phase (which began in May of 2008) is now over; the design of the high-level API will be supported as-is for the rest of the 1.X series, with minor enhancements. This is the first version to support Python 2.6, and the first to use Cython for the low-level interface. The license remains 3-clause BSD. ** This project is NOT affiliated with The HDF Group. ** Thanks ------ Thanks to D. Dale, E. Lawrence and other for their continued support and comments. Also thanks to the PyTables project, for inspiration and generously providing their code to the community, and to everyone at the HDF Group for creating such a useful piece of software. From pgmdevlist at gmail.com Mon Dec 1 21:42:53 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 1 Dec 2008 21:42:53 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? 
In-Reply-To: <49346EC7.6020109@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> Message-ID: <2B254DF1-C3BA-4580-9D0C-9D65660D288E@gmail.com> On Dec 1, 2008, at 6:09 PM, Eric Firing wrote: > Pierre, > > ma.masked_all does not seem to work with fancy dtypes and more then > one dimension: Eric, Should be fixed in SVN (r6130). There were indeed problems with nested dtypes. Tricky beasts they are. Thanks for reporting! From josef.pktd at gmail.com Mon Dec 1 21:53:01 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 1 Dec 2008 21:53:01 -0500 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <1228182826.24243.1.camel@tachyon-laptop> References: <1228182826.24243.1.camel@tachyon-laptop> Message-ID: <1cd32cbb0812011853r89e582dl2d65ac953c3983dc@mail.gmail.com> >Requires >-------- > >* UNIX-like platform (Linux or Mac OS-X); >Windows version is in progress I installed version 0.3.0 back in August on WindowsXP, and as far as I remember there were no problems at all with the install, and all tests pass. I thought the interface was really easy to use. But after trying it out I realized that my matlab is too old to understand the generated hdf5 files in an easy-to-use way, and I had to go back to csv-files. Josef From stefan at sun.ac.za Tue Dec 2 00:42:27 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 2 Dec 2008 07:42:27 +0200 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> Message-ID: <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> Hi Frank 2008/12/2 frank wang : > I need to convolve a 1d filter with 8 coefficients with a 2d array of the > shape (6,7). I can use convolve to perform the operation for each row. This > will involve a for loop with a counter 6. I wonder there is > an fast way to do this in numpy without using for loop. Does anyone know how > to do it? Since 6x7 is quite small, so you can afford this trick: a) Pad the 6,7 array to 6,14. b) Flatten the array c) Perform convolution d) Unflatten array e) Take out valid values Cheers St?fan From f.yw at hotmail.com Tue Dec 2 01:14:09 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 1 Dec 2008 23:14:09 -0700 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> Message-ID: This is what I thought to do. However, I am not sure whether this is a fast way to do it and also I want to find a more generous way to do it. I thought there may be a more elegant way to do it. Thanks Frank> Date: Tue, 2 Dec 2008 07:42:27 +0200> From: stefan at sun.ac.za> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] fast way to convolve a 2d array with 1d filter> > Hi Frank> > > 2008/12/2 frank wang :> > I need to convolve a 1d filter with 8 coefficients with a 2d array of the> > shape (6,7). I can use convolve to perform the operation for each row. 
This> > will involve a for loop with a counter 6. I wonder there is> > an fast way to do this in numpy without using for loop. Does anyone know how> > to do it?> > Since 6x7 is quite small, so you can afford this trick:> > a) Pad the 6,7 array to 6,14.> b) Flatten the array> c) Perform convolution> d) Unflatten array> e) Take out valid values> > Cheers> St?fan> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ Get more done, have more fun, and stay more connected with Windows Mobile?. http://clk.atdmt.com/MRT/go/119642556/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Tue Dec 2 01:59:01 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 01 Dec 2008 20:59:01 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <49346EC7.6020109@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> Message-ID: <4934DCB5.9000602@hawaii.edu> Pierre, Your change fixed masked_all for the example I gave, but I think it introduced a new failure in zeros: dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), ' in () /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __call__(self, a, *args, **params) 4533 # 4534 def __call__(self, a, *args, **params): -> 4535 return self._func.__call__(a, *args, **params).view(MaskedArray) 4536 4537 arange = _convert2ma('arange') /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __array_finalize__(self, obj) 1548 odtype = obj.dtype 1549 if odtype.names: -> 1550 _mask = getattr(obj, '_mask', make_mask_none(obj.shape, odtype)) 1551 else: 1552 _mask = getattr(obj, '_mask', nomask) /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in make_mask_none(newshape, dtype) 921 result = np.zeros(newshape, dtype=MaskType) 922 else: --> 923 result = np.zeros(newshape, dtype=make_mask_descr(dtype)) 924 return result 925 /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in make_mask_descr(ndtype) 819 if not isinstance(ndtype, np.dtype): 820 ndtype = np.dtype(ndtype) --> 821 return np.dtype(_make_descr(ndtype)) 822 823 def get_mask(a): /usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in _make_descr(datatype) 806 descr = [] 807 for name in names: --> 808 (ndtype, _) = datatype.fields[name] 809 descr.append((name, _make_descr(ndtype))) 810 return descr ValueError: too many values to unpack From charlesr.harris at gmail.com Tue Dec 2 02:05:04 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 2 Dec 2008 00:05:04 -0700 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> <49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> <9457e7c80812012142o39db6141h8b6a18d5b4de52af@mail.gmail.com> Message-ID: On Mon, Dec 1, 2008 at 11:14 PM, frank wang wrote: > This is what I thought to do. However, I am not sure whether this is a > fast way to do it and also I want to find a more generous way to do it. I > thought there may be a more elegant way to do it. > > Thanks > > Frank > Well, for just the one matrix not much will speed it up. 
If you have lots of matrices and the coefficients are fixed, then you can set up a "convolution" matrix whose columns are the coefficients shifted appropriately. Then just do a matrix multiply. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Dec 2 03:16:02 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 03:16:02 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <4934DCB5.9000602@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> Message-ID: Eric, That's quite a handful you have with this dtype... So yes, the fix I gave works with nested dtypes and flexible dtypes with a simple name (string, not tuple). I'm a bit surprised with numpy, here. Consider: >>> dt.names ('P', 'D', 'T', 'w', 'S', 'sigtheta', 'theta') So we lose the tuple and get a single string instead, corresponding to the right-hand element of the name.. But this single string is one of the keys of dt.fields, whereas the tuple is not. Puzzling. I'm sure there must be some reference in the numpy book, but I can't look for it now. Anyway: Prior to version 6127, make_mask_descr was substituting the 2nd element of each tuple of a dtype.descr by a bool. Which failed for nested dtypes. Now, we check the field corresponding to a name, which fails in our particular case. I'll be working on it... On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), ' Depth [salt water, m]', 'D'), ' C]', 'T'), ' Salinity [PSU]', 'S'), ' 'sigtheta'), ' 'theta'), ' > np.ma.zeros((2,2), dt) From efiring at hawaii.edu Tue Dec 2 04:26:35 2008 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 01 Dec 2008 23:26:35 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> Message-ID: <4934FF4B.2040801@hawaii.edu> Pierre GM wrote: > Eric, > That's quite a handful you have with this dtype... Here is a simplified example of how I made it: dt = np.dtype({'names': ['a','b'], 'formats': ['f', 'f'], 'titles': ['aaa', 'bbb']}) From page 132 in the numpy book: The fields dictionary is indexed by keys that are the names of the fields. Each entry in the dictionary is a tuple fully describing the field: (dtype, offset[,title]). If present, the optional title can actually be any object (if it is string or unicode then it will also be a key in the fields dictionary, otherwise it?s meta-data). -------- I put the titles in as a sort of additional documentation, and thinking that they might be useful for labeling plots; but it is rather hard to get the titles back out, since they are not directly accessible as an attribute, like names. Probably I should just omit them. Eric > So yes, the fix I gave works with nested dtypes and flexible dtypes > with a simple name (string, not tuple). I'm a bit surprised with > numpy, here. > Consider: > > >>> dt.names > ('P', 'D', 'T', 'w', 'S', 'sigtheta', 'theta') > > So we lose the tuple and get a single string instead, corresponding to > the right-hand element of the name.. > But this single string is one of the keys of dt.fields, whereas the > tuple is not. Puzzling. I'm sure there must be some reference in the > numpy book, but I can't look for it now. > > Anyway: > Prior to version 6127, make_mask_descr was substituting the 2nd > element of each tuple of a dtype.descr by a bool. Which failed for > nested dtypes. 
Now, we check the field corresponding to a name, which > fails in our particular case. > > > I'll be working on it... > > > > On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > >> dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), '> Depth [salt water, m]', 'D'), '> C]', 'T'), '> Salinity [PSU]', 'S'), '> 'sigtheta'), '> 'theta'), '> >> np.ma.zeros((2,2), dt) > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Tue Dec 2 04:42:21 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 04:42:21 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <4934FF4B.2040801@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> <4934FF4B.2040801@hawaii.edu> Message-ID: <517BA3D7-C785-4EEC-81BE-3C356B8C1145@gmail.com> On Dec 2, 2008, at 4:26 AM, Eric Firing wrote: > From page 132 in the numpy book: > > The fields dictionary is indexed by keys that are the names of the > fields. Each entry in the dictionary is a tuple fully describing the > field: (dtype, offset[,title]). If present, the optional title can > actually be any object (if it is string or unicode then it will also > be > a key in the fields dictionary, otherwise it?s meta-data). I should read it more often... > > I put the titles in as a sort of additional documentation, and > thinking > that they might be useful for labeling plots; That's actually quite a good idea... > but it is rather hard to > get the titles back out, since they are not directly accessible as an > attribute, like names. Probably I should just omit them. We could perhaps try a function: def gettitle(dtype, name): try: field = dtype.fields[name] except (TypeError, KeyError): return None else: if len(field) > 2: return field[-1] return None From Joris.DeRidder at ster.kuleuven.be Tue Dec 2 07:21:49 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Tue, 2 Dec 2008 13:21:49 +0100 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> Message-ID: <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> On 1 Dec 2008, at 21:47 , St?fan van der Walt wrote: > Hi Pierre > > 2008/12/1 Pierre GM : >> * `genloadtxt` is the base function that makes all the work. It >> outputs 2 arrays, one for the data (missing values being substituted >> by the appropriate default) and one for the mask. It would go in >> np.lib.io > > I see the code length increased from 200 lines to 800. This made me > wonder about the execution time: initial benchmarks suggest a 3x > slow-down. Could this be a problem for loading large text files? If > so, should we consider keeping both versions around, or by default > bypassing all the extra hooks? > > Regards > St?fan As a historical note, we used to have scipy.io.read_array which at the time was considered by Travis too slow and too "grandiose" to be put in Numpy. As a consequence, numpy.loadtxt() was created which was simple and fast. Now it looks like we're going back to something grandiose. But perhaps it can be made grandiose *and* reasonably fast ;-). Cheers, Joris P.S. 
As a reference: http://article.gmane.org/gmane.comp.python.numeric.general/5556/ Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From nadavh at visionsense.com Tue Dec 2 07:36:55 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 2 Dec 2008 14:36:55 +0200 Subject: [Numpy-discussion] fast way to convolve a 2d array with 1d filter References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com><493455C6.7010701@gmail.com><9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com><49347163.8070207@noaa.gov> <0706E291-BB4A-4AD7-9220-C6FD4F287483@gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C35F@ex3.envision.co.il> You can use 2D convolution routines either in scipy.signal or numpy.numarray.nd_image Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? frank wang ????: ? 02-?????-08 03:38 ??: numpy-discussion at scipy.org ????: [Numpy-discussion] fast way to convolve a 2d array with 1d filter Hi, I need to convolve a 1d filter with 8 coefficients with a 2d array of the shape (6,7). I can use convolve to perform the operation for each row. This will involve a for loop with a counter 6. I wonder there is an fast way to do this in numpy without using for loop. Does anyone know how to do it? Thanks Frank _________________________________________________________________ Access your email online and on the go with Windows Live Hotmail. http://windowslive.com/Explore/Hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_access_112008 -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3494 bytes Desc: not available URL: From aisaac at american.edu Tue Dec 2 08:12:25 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 02 Dec 2008 08:12:25 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> Message-ID: <49353439.8030804@american.edu> On 12/2/2008 7:21 AM Joris De Ridder apparently wrote: > As a historical note, we used to have scipy.io.read_array which at the > time was considered by Travis too slow and too "grandiose" to be put > in Numpy. As a consequence, numpy.loadtxt() was created which was > simple and fast. Now it looks like we're going back to something > grandiose. But perhaps it can be made grandiose *and* reasonably > fast ;-). I hope this consideration remains prominent in this thread. Is the disappearance or read_array the reason for this change? What happened to it? Note that read_array_demo1.py is still in scipy.io despite the loss of read_array. Alan Isaac From aisaac at american.edu Tue Dec 2 08:46:29 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 02 Dec 2008 08:46:29 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <49353439.8030804@american.edu> References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <81E7FF4F-5D57-456F-9328-C89B4B4F78EE@ster.kuleuven.be> <49353439.8030804@american.edu> Message-ID: <49353C35.8050909@american.edu> On 12/2/2008 8:12 AM Alan G Isaac apparently wrote: > I hope this consideration remains prominent > in this thread. Is the disappearance or > read_array the reason for this change? > What happened to it? Apologies; it is only deprecated, not gone. 
Alan Isaac From Christophe.Chappet at onera.fr Tue Dec 2 09:26:15 2008 From: Christophe.Chappet at onera.fr (Christophe Chappet) Date: Tue, 02 Dec 2008 15:26:15 +0100 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter Message-ID: <49354587.30906@onera.fr> Hi all, I compile the followinq code using "f2py -c --fcompiler=gnu95 --compiler=mingw32" -m hello subroutine AfficheMessage(szText) character szText*100 write (*,*) szText return end Using python console : >>>import hello >>>hello.affichemessage(" Hello") works fine ! I do the same in the program window of IDLE and : - no message is displayed. - the shell restart (or IDLE crah if launched with -n) Same problem with PyScripter IDE. (crash). Any suggestion ? Regards, Christophe -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Tue Dec 2 10:24:16 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 2 Dec 2008 09:24:16 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <9457e7c80812011247s2ea3b7a8x2807e29900946e9a@mail.gmail.com> <493455C6.7010701@gmail.com> <9457e7c80812011347s7a60c2e3x9d5e462056e2dd5a@mail.gmail.com> Message-ID: On Mon, Dec 1, 2008 at 4:55 PM, Pierre GM wrote: > On Dec 1, 2008, at 4:47 PM, St?fan van der Walt wrote: > >> > > I haven't investigated the code in too much detail, but wouldn't it be > > possible to implement the current set of functionality in a > > base-class, which is then specialised to add the rest? That way, one > > could always instantiate TextReader yourself for some added speed. > > Well, one of the issues is that we need to keep the function > compatible w/ urllib.urlretrieve (Ryan, am I right?), which means not > being able to go back to the beginning of a file (no call to .seek). > Well, the original version of loadtxt() checked for seek but didn't need it (fixed now), which kept me from using a urllib2.urlopen() object. If actually using seek() would speed up the new version of loadtxt(), feel free to use it. I'm more than capable of wrapping the urlopen() object within a StringIO. However, I am unconvinced that removing the 2nd loop and instead redoing the reading from the file will be much (if any) of a speed win. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Dec 2 10:58:46 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 2 Dec 2008 10:58:46 -0500 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter In-Reply-To: <49354587.30906@onera.fr> References: <49354587.30906@onera.fr> Message-ID: <1cd32cbb0812020758v5a56a7ebsf80640d07aaebb71@mail.gmail.com> On Tue, Dec 2, 2008 at 9:26 AM, Christophe Chappet wrote: > Hi all, > I compile the followinq code using "f2py -c --fcompiler=gnu95 > --compiler=mingw32" -m hello > subroutine AfficheMessage(szText) > character szText*100 > write (*,*) szText > return > end > > Using python console : >>>>import hello >>>>hello.affichemessage(" > Hello") > works fine ! > > I do the same in the program window of IDLE and : > - no message is displayed. > - the shell restart (or IDLE crah if launched with -n) > > Same problem with PyScripter IDE. (crash). > > Any suggestion ? 
> Regards, > Christophe > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > Is this a write to standard output "write (*,*) szText" ? Robert Kern mentioned several times that mingw is broken for writing to stdout but I only know about it for stdout in c. I always get a crash when a test compiles a write to stdout in c with mingw on my WindowsXP. But then my impression is that it shouldn't work on the command line either. Since I don't know much about f2py, I'm not sure whether fortran has the same problem as c with mingw. Josef From pgmdevlist at gmail.com Tue Dec 2 12:57:06 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 12:57:06 -0500 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <4934DCB5.9000602@hawaii.edu> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> Message-ID: <6402833F-EFCA-41F1-9794-291ABB78F0AA@gmail.com> On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > Pierre, > > Your change fixed masked_all for the example I gave, but I think it > introduced a new failure in zeros: Eric, Would you mind giving r6131 a try ? It's rather ugly but looks like it works... From efiring at hawaii.edu Tue Dec 2 14:44:36 2008 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 02 Dec 2008 09:44:36 -1000 Subject: [Numpy-discussion] bug in ma.masked_all()? In-Reply-To: <6402833F-EFCA-41F1-9794-291ABB78F0AA@gmail.com> References: <49346EC7.6020109@hawaii.edu> <4934DCB5.9000602@hawaii.edu> <6402833F-EFCA-41F1-9794-291ABB78F0AA@gmail.com> Message-ID: <49359024.8060703@hawaii.edu> Pierre GM wrote: > > On Dec 2, 2008, at 1:59 AM, Eric Firing wrote: > >> Pierre, >> >> Your change fixed masked_all for the example I gave, but I think it >> introduced a new failure in zeros: > > Eric, > Would you mind giving r6131 a try ? It's rather ugly but looks like it > works... So far, so good. Thanks very much. Eric From zachary.pincus at yale.edu Tue Dec 2 14:47:38 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 2 Dec 2008 14:47:38 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: Hi Pierre, I've tested the new loadtxt briefly. Looks good, except that there's a minor bug when trying to use a specific white-space delimiter (e.g. \t) while still allowing other white-space to be allowed in fields (e.g. spaces). Specifically, on line 115 in LineSplitter, we have: self.delimiter = delimiter.strip() or None so if I pass in, say, '\t' as the delimiter, self.delimiter gets set to None, which then causes the default behavior of any-whitespace-is- delimiter to be used. This makes lines like "Gene Name\tPubMed ID \tStarting Position" get split wrong, even when I explicitly pass in '\t' as the delimiter! Similarly, I believe that some of the tests are formulated wrong: def test_nodelimiter(self): "Test LineSplitter w/o delimiter" strg = " 1 2 3 4 5 # test" test = LineSplitter(' ')(strg) assert_equal(test, ['1', '2', '3', '4', '5']) I think that treating an explicitly-passed-in ' ' delimiter as identical to 'no delimiter' is a bad idea. If I say that ' ' is the delimiter, or '\t' is the delimiter, this should be treated *just* like ',' being the delimiter, where the expected output is: ['1', '2', '3', '4', '', '5'] At least, that's what I would expect. 
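As a quick illustration of the distinction (plain Python string splitting,
not the posted LineSplitter code):

>>> " 1 2 3 4  5 ".split()          # no delimiter: runs of whitespace collapse
['1', '2', '3', '4', '5']
>>> "1\t2\t3\t4\t\t5".split('\t')   # explicit '\t': the empty field is kept
['1', '2', '3', '4', '', '5']
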
Treating contiguous blocks of whitespace as single delimiters is perfectly reasonable when None is provided as the delimiter, but when an explicit delimiter has been provided, it strikes me that the code shouldn't try to further- interpret it... Does anyone else have any opinion here? Zach On Dec 1, 2008, at 1:21 PM, Pierre GM wrote: > Well, looks like the attachment is too big, so here's the > implementation. The tests will come in another message. > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From rmay31 at gmail.com Tue Dec 2 14:56:26 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 02 Dec 2008 13:56:26 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: <493592EA.8050205@gmail.com> Zachary Pincus wrote: > Specifically, on line 115 in LineSplitter, we have: > self.delimiter = delimiter.strip() or None > so if I pass in, say, '\t' as the delimiter, self.delimiter gets set > to None, which then causes the default behavior of any-whitespace-is- > delimiter to be used. This makes lines like "Gene Name\tPubMed ID > \tStarting Position" get split wrong, even when I explicitly pass in > '\t' as the delimiter! > > Similarly, I believe that some of the tests are formulated wrong: > def test_nodelimiter(self): > "Test LineSplitter w/o delimiter" > strg = " 1 2 3 4 5 # test" > test = LineSplitter(' ')(strg) > assert_equal(test, ['1', '2', '3', '4', '5']) > > I think that treating an explicitly-passed-in ' ' delimiter as > identical to 'no delimiter' is a bad idea. If I say that ' ' is the > delimiter, or '\t' is the delimiter, this should be treated *just* > like ',' being the delimiter, where the expected output is: > ['1', '2', '3', '4', '', '5'] > > At least, that's what I would expect. Treating contiguous blocks of > whitespace as single delimiters is perfectly reasonable when None is > provided as the delimiter, but when an explicit delimiter has been > provided, it strikes me that the code shouldn't try to further- > interpret it... > > Does anyone else have any opinion here? I agree. If the user explicity passes something as a delimiter, we should use it and not try to be too smart. +1 Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From robert.kern at gmail.com Tue Dec 2 15:01:01 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 2 Dec 2008 14:01:01 -0600 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter In-Reply-To: <49354587.30906@onera.fr> References: <49354587.30906@onera.fr> Message-ID: <3d375d730812021201s3a0115d3mbefbe89664c6ef8e@mail.gmail.com> On Tue, Dec 2, 2008 at 08:26, Christophe Chappet wrote: > Hi all, > I compile the followinq code using "f2py -c --fcompiler=gnu95 > --compiler=mingw32" -m hello > subroutine AfficheMessage(szText) > character szText*100 > write (*,*) szText > return > end > > Using python console : >>>>import hello >>>>hello.affichemessage(" > Hello") > works fine ! > > I do the same in the program window of IDLE and : > - no message is displayed. > - the shell restart (or IDLE crah if launched with -n) > > Same problem with PyScripter IDE. (crash). What version of gfortran are you using (i.e. exactly which binary did you download)? 
I'm not sure about the crash, but I can say that you will never get the output from a write statement inside the Fortran code to go to the IDLE prompt or PyScripter's window. They are not real terminals and do not capture text going to the process's real STDOUT file pointer. They simply change the sys.stdout object to capture text printed from Python. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmay31 at gmail.com Tue Dec 2 15:12:04 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 02 Dec 2008 14:12:04 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> Message-ID: <49359694.8080605@gmail.com> Pierre GM wrote: > Well, looks like the attachment is too big, so here's the > implementation. The tests will come in another message. A couple of quick nitpicks: 1) On line 186 (in the NameValidator class), you use excludelist.append() to append a list to the end of a list. I think you meant to use excludelist.extend() 2) When validating a list of names, why do you insist on lower casing them? (I'm referring to the call to lower() on line 207). On one hand, this would seem nicer than all upper case, but on the other hand this can cause confusion for someone who sees certain casing of names in the file and expects that data to be laid out the same. Other than those, it's working fine for me here. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From h5py at alfven.org Tue Dec 2 16:30:47 2008 From: h5py at alfven.org (Andrew Collette) Date: Tue, 02 Dec 2008 13:30:47 -0800 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <1cd32cbb0812011853r89e582dl2d65ac953c3983dc@mail.gmail.com> References: <1228182826.24243.1.camel@tachyon-laptop> <1cd32cbb0812011853r89e582dl2d65ac953c3983dc@mail.gmail.com> Message-ID: <1228253447.6348.12.camel@tachyon-laptop> Just FYI, the Windows installer for 1.0 is now posted at h5py.googlecode.com after undergoing some final testing. Thanks for trying 0.3.0... too bad about matlab. Andrew On Mon, 2008-12-01 at 21:53 -0500, josef.pktd at gmail.com wrote: > >Requires > >-------- > > > >* UNIX-like platform (Linux or Mac OS-X); > >Windows version is in progress > > > I installed version 0.3.0 back in August on WindowsXP, and as far as I > remember there were no problems at all with the install, and all tests > pass. > > I thought the interface was really easy to use. > But after trying it out I realized that my matlab is too old to > understand the generated hdf5 files in an easy-to-use way, and I had > to go back to csv-files. > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Tue Dec 2 16:48:26 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 16:48:26 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <49359694.8080605@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> Message-ID: On Dec 2, 2008, at 3:12 PM, Ryan May wrote: > Pierre GM wrote: >> Well, looks like the attachment is too big, so here's the >> implementation. 
The tests will come in another message. > > A couple of quick nitpicks: > > 1) On line 186 (in the NameValidator class), you use > excludelist.append() to append a list to the end of a list. I think > you > meant to use excludelist.extend() Good call. > 2) When validating a list of names, why do you insist on lower casing > them? (I'm referring to the call to lower() on line 207). On one > hand, > this would seem nicer than all upper case, but on the other hand this > can cause confusion for someone who sees certain casing of names in > the > file and expects that data to be laid out the same. I recall a life where names were case-insensitives, so 'dates' and 'Dates' and 'DATES' were the same field. It should be easy enough to get rid of that limitations, or add a parameter for case-sensitivity On Dec 2, 2008, at 2:47 PM, Zachary Pincus wrote: > Specifically, on line 115 in LineSplitter, we have: > self.delimiter = delimiter.strip() or None > so if I pass in, say, '\t' as the delimiter, self.delimiter gets set > to None, which then causes the default behavior of any-whitespace-is- > delimiter to be used. This makes lines like "Gene Name\tPubMed ID > \tStarting Position" get split wrong, even when I explicitly pass in > '\t' as the delimiter! OK, I'll check that. > > I think that treating an explicitly-passed-in ' ' delimiter as > identical to 'no delimiter' is a bad idea. If I say that ' ' is the > delimiter, or '\t' is the delimiter, this should be treated *just* > like ',' being the delimiter, where the expected output is: > ['1', '2', '3', '4', '', '5'] > Valid point. Well, all, stay tuned for yet another "yet another implementation..." > > Other than those, it's working fine for me here. > > Ryan From mail at stevesimmons.com Tue Dec 2 16:53:15 2008 From: mail at stevesimmons.com (Stephen Simmons) Date: Tue, 02 Dec 2008 22:53:15 +0100 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <1228182826.24243.1.camel@tachyon-laptop> References: <1228182826.24243.1.camel@tachyon-laptop> Message-ID: <4935AE4B.6050606@stevesimmons.com> Do you have any plans to add lzo compression support, in addition to gzip? This is a feature I used a lot in PyTables. Andrew Collette wrote: > ===================================== > Announcing HDF5 for Python (h5py) 1.0 > ===================================== > > What is h5py? > ------------- > > HDF5 for Python (h5py) is a general-purpose Python interface to the > Hierarchical Data Format library, version 5. HDF5 is a versatile, > mature scientific software library designed for the fast, flexible > storage of enormous amounts of data. > > > From Chris.Barker at noaa.gov Tue Dec 2 17:36:10 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 02 Dec 2008 14:36:10 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> Message-ID: <4935B85A.6030701@noaa.gov> Pierre GM wrote: >> I think that treating an explicitly-passed-in ' ' delimiter as >> identical to 'no delimiter' is a bad idea. If I say that ' ' is the >> delimiter, or '\t' is the delimiter, this should be treated *just* >> like ',' being the delimiter, where the expected output is: >> ['1', '2', '3', '4', '', '5'] >> > > Valid point. > Well, all, stay tuned for yet another "yet another implementation..." While we're at it, it might be nice to be able to pass in more than one delimiter: ('\t',' '). 
though maybe that only combination that I'd really want would be something and '\n', which I think is being treated specially already. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Dec 2 17:46:15 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Dec 2008 17:46:15 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4935B85A.6030701@noaa.gov> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> Message-ID: <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> Chris, I can try, but in that case, please write me a unittest, so that I have a clear and unambiguous idea of what you expect. ANFSCD, have you tried the missing_values option ? On Dec 2, 2008, at 5:36 PM, Christopher Barker wrote: > Pierre GM wrote: >>> I think that treating an explicitly-passed-in ' ' delimiter as >>> identical to 'no delimiter' is a bad idea. If I say that ' ' is the >>> delimiter, or '\t' is the delimiter, this should be treated *just* >>> like ',' being the delimiter, where the expected output is: >>> ['1', '2', '3', '4', '', '5'] >>> >> >> Valid point. >> Well, all, stay tuned for yet another "yet another implementation..." > > While we're at it, it might be nice to be able to pass in more than > one > delimiter: ('\t',' '). though maybe that only combination that I'd > really want would be something and '\n', which I think is being > treated > specially already. > > -Chris > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From h5py at alfven.org Tue Dec 2 18:12:20 2008 From: h5py at alfven.org (Andrew Collette) Date: Tue, 02 Dec 2008 15:12:20 -0800 Subject: [Numpy-discussion] ANN: HDF5 for Python 1.0 In-Reply-To: <4935AE4B.6050606@stevesimmons.com> References: <1228182826.24243.1.camel@tachyon-laptop> <4935AE4B.6050606@stevesimmons.com> Message-ID: <1228259541.14190.8.camel@tachyon-laptop> If it's a feature people want, I certainly wouldn't mind looking in to it. I believe PyTables supports bzip2 as well. Adding filters to HDF5 takes a bit of work but is well supported by the library. Andrew On Tue, 2008-12-02 at 22:53 +0100, Stephen Simmons wrote: > Do you have any plans to add lzo compression support, in addition to > gzip? This is a feature I used a lot in PyTables. > > Andrew Collette wrote: > > ===================================== > > Announcing HDF5 for Python (h5py) 1.0 > > ===================================== > > > > What is h5py? > > ------------- > > > > HDF5 for Python (h5py) is a general-purpose Python interface to the > > Hierarchical Data Format library, version 5. HDF5 is a versatile, > > mature scientific software library designed for the fast, flexible > > storage of enormous amounts of data. 
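For what it's worth, here is a minimal sketch of how a compressed dataset is
written through h5py's high-level interface -- gzip is the filter that ships
with HDF5 itself, which is why lzo/bzip2 support would take extra work. The
file name and shape below are made up purely for illustration:

import numpy as np
import h5py

arr = np.arange(1000000, dtype='f8').reshape(1000, 1000)
f = h5py.File('example.hdf5', 'w')
# 'compression' selects the HDF5 filter applied to the dataset's chunks
dset = f.create_dataset('data', data=arr, compression='gzip')
f.close()
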
> > From ggellner at uoguelph.ca Tue Dec 2 22:57:07 2008 From: ggellner at uoguelph.ca (Gabriel Gellner) Date: Tue, 2 Dec 2008 22:57:07 -0500 Subject: [Numpy-discussion] PyArray_EMPTY and Cython Message-ID: <20081203035707.GA28913@encolpuis> After some discussion on the Cython lists I thought I would try my hand at writing some Cython accelerators for empty and zeros. This will involve using PyArray_EMPTY, I have a simple prototype I would like to get working, but currently it segfaults. Any tips on what I might be missing? import numpy as np cimport numpy as np cdef extern from "numpy/arrayobject.h": PyArray_EMPTY(int ndims, np.npy_intp* dims, int type, bint fortran) cdef np.ndarray empty(np.npy_intp length): cdef np.ndarray[np.double_t, ndim=1] ret cdef int type = np.NPY_DOUBLE cdef int ndims = 1 cdef np.npy_intp* dims dims = &length print dims[0] print type ret = PyArray_EMPTY(ndims, dims, type, False) return ret def test(): cdef np.ndarray[np.double_t, ndim=1] y = empty(10) return y The code seems to print out the correct dims and type info but segfaults when the PyArray_EMPTY call is made. Thanks, Gabriel From Christophe.Chappet at onera.fr Wed Dec 3 04:10:52 2008 From: Christophe.Chappet at onera.fr (Christophe Chappet) Date: Wed, 03 Dec 2008 10:10:52 +0100 Subject: [Numpy-discussion] [F2PY] Fortran call fails in IDLE / PyScripter Message-ID: <49364D1C.8060609@onera.fr> >What version of gfortran are you using (i.e. exactly which binary did >you download)? GNU Fortran (GCC) 4.4.0 20080603 (experimental) [trunk revision 136333] >Is this a write to standard output "write (*,*) szText" ? yes, it is. I forgot to say that it also works with pydev in Eclipse but I'm looking for a simple interactive python shell that can execute some programs. IPython does the job but is less friendly than IDLE in term of program editing. Anyway, I think I will use it for now. Thanks for your reply. Regards, Christophe On Tue, Dec 2, 2008 at 08:26, Christophe Chappet > wrote: >/ Hi all, />/ I compile the followinq code using "f2py -c --fcompiler=gnu95 />/ --compiler=mingw32" -m hello />/ subroutine AfficheMessage(szText) />/ character szText*100 />/ write (*,*) szText />/ return />/ end />/ />/ Using python console : />>>>/import hello />>>>/hello.affichemessage(" />/ Hello") />/ works fine ! />/ />/ I do the same in the program window of IDLE and : />/ - no message is displayed. />/ - the shell restart (or IDLE crah if launched with -n) />/ />/ Same problem with PyScripter IDE. (crash)./ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barthelemy at crans.org Wed Dec 3 09:19:29 2008 From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9lemy?=) Date: Wed, 3 Dec 2008 15:19:29 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray Message-ID: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> Hello, I'm trying to write a small library of differential geometry, and I have some trouble subclassing ndarray. I'd like an HomogeneousMatrix class that subclasse ndarray and overloads some methods, such as inv(). Here is my first try, the inv() function and the inv_v1() method work as expected, but the inv_v2() and inv_v3() methods do not change the object at all. Can somebody explain me what is happening here ? 
import numpy as np def inv(H): """ inverse of an homogeneous matrix """ R = H[0:3,0:3] p = H[0:3,3:4] return np.vstack( (np.hstack((R.T,-np.dot(R.T,p))), [0,0,0,1])) class HomogeneousMatrix(np.ndarray): def __new__(subtype, data=np.eye(4)): subarr = np.array(data) if htr.ishomogeneousmatrix(subarr): return subarr.view(subtype) else: raise ValueError def inv_v1(self): self[0:4,0:4] = htr.inv(self) def inv_v2(self): data = htr.inv(self) self = HomogeneousMatrix(data) def inv_v3(self): self = htr.inv(self) Thank you ! -- S?bastien From silva at lma.cnrs-mrs.fr Wed Dec 3 10:24:43 2008 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Wed, 03 Dec 2008 16:24:43 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> Message-ID: <1228317884.2947.7.camel@Portable-s2m.cnrs-mrs.fr> Le mercredi 03 d?cembre 2008, S?bastien Barth?lemy a ?crit : > Hello, Hi Sebastien! > I'm trying to write a small library of differential geometry, and I > have some trouble subclassing ndarray. > I'd like an HomogeneousMatrix class that subclasse ndarray and > overloads some methods, such as inv(). > Here is my first try, the inv() function and the inv_v1() method work > as expected, but the inv_v2() and inv_v3() methods do not change the > object at all. Can somebody explain me what is happening here ? > > import numpy as np > def inv(H): > """ > inverse of an homogeneous matrix > """ > R = H[0:3,0:3] > p = H[0:3,3:4] > return np.vstack( (np.hstack((R.T,-np.dot(R.T,p))), [0,0,0,1])) > > class HomogeneousMatrix(np.ndarray): > def __new__(subtype, data=np.eye(4)): > subarr = np.array(data) > if htr.ishomogeneousmatrix(subarr): > return subarr.view(subtype) > else: > raise ValueError > def inv_v1(self): > self[0:4,0:4] = htr.inv(self) > def inv_v2(self): > data = htr.inv(self) > self = HomogeneousMatrix(data) > def inv_v3(self): > self = htr.inv(self) There is something I missed: what is htr? I guess htr.inv is the inv function defined before the class. Another point: it seems weird to me that, in the class' methods inv_v2 and inv_v3, you 'unref' the previous instance of HomogeneousMatrix and link the 'self' label to a new instance... In inv_v1, you just modify the coefficient of the Homogeneous Matrix with the coefficient of htr.inv(self) -- Fabrice Silva LMA UPR CNRS 7051 - ?quipe S2M From bioinformed at gmail.com Wed Dec 3 10:32:19 2008 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 3 Dec 2008 10:32:19 -0500 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> Message-ID: <2e1434c10812030732mac096d2x331ec2e989c42d@mail.gmail.com> On Wed, Dec 3, 2008 at 9:19 AM, S?bastien Barth?lemy wrote: > def inv_v1(self): > self[0:4,0:4] = htr.inv(self) > def inv_v2(self): > data = htr.inv(self) > self = HomogeneousMatrix(data) > def inv_v3(self): > self = htr.inv(self) > self is a reference, so you're just overwriting it with references to new values in v2 and v3. The original object is unchanged. Only v1 changes self. You may want to use "self[:] = ....". -Kevin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barthelemy at crans.org Wed Dec 3 10:43:59 2008 From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9lemy?=) Date: Wed, 3 Dec 2008 16:43:59 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <2e1434c10812030732mac096d2x331ec2e989c42d@mail.gmail.com> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> <2e1434c10812030732mac096d2x331ec2e989c42d@mail.gmail.com> Message-ID: <78f7ab620812030743o707d822p679242ef25828046@mail.gmail.com> 2008/12/3 Kevin Jacobs : > On Wed, Dec 3, 2008 at 9:19 AM, S?bastien Barth?lemy > wrote: >> >> def inv_v1(self): >> self[0:4,0:4] = htr.inv(self) >> def inv_v2(self): >> data = htr.inv(self) >> self = HomogeneousMatrix(data) >> def inv_v3(self): >> self = htr.inv(self) > > self is a reference, so you're just overwriting it with references to new > values in v2 and v3. The original object is unchanged. Only v1 changes > self. You may want to use "self[:] = ....". okay, it seems obvious now. I definitely spent to much time with matlab. Thanks From barthelemy at crans.org Wed Dec 3 10:56:42 2008 From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9lemy?=) Date: Wed, 3 Dec 2008 16:56:42 +0100 Subject: [Numpy-discussion] trouble subclassing ndarray In-Reply-To: <1228317884.2947.7.camel@Portable-s2m.cnrs-mrs.fr> References: <78f7ab620812030619r3050eb7bue54f2a0e91a8ce3e@mail.gmail.com> <1228317884.2947.7.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <78f7ab620812030756p7a5df60bj22d8ef72a519a6c@mail.gmail.com> 2008/12/3 Fabrice Silva : > Le mercredi 03 d?cembre 2008, S?bastien Barth?lemy a ?crit : >> Hello, > Hi Sebastien! Hello Fabrice > There is something I missed: what is htr? I guess htr.inv is the inv > function defined before the class. yes, I cut-n-pasted the function definition from the htr module and forgot to tell it, sorry Thank you From rmay31 at gmail.com Wed Dec 3 11:41:56 2008 From: rmay31 at gmail.com (Ryan May) Date: Wed, 03 Dec 2008 10:41:56 -0600 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> Message-ID: <4936B6D4.2060405@gmail.com> Pierre GM wrote: >> I think that treating an explicitly-passed-in ' ' delimiter as >> identical to 'no delimiter' is a bad idea. If I say that ' ' is the >> delimiter, or '\t' is the delimiter, this should be treated *just* >> like ',' being the delimiter, where the expected output is: >> ['1', '2', '3', '4', '', '5'] >> > > Valid point. > Well, all, stay tuned for yet another "yet another implementation..." > Found a problem. If you read the names from the file and specify usecols, you end up with the first N names read from the file as the fields in your output (where N is the number of entries in usecols), instead of having the names of the columns you asked for. For instance: >>>from StringIO import StringIO >>>from genload_proposal import loadtxt >>>f = StringIO('stid stnm relh tair\nnrmn 121 45 9.1') >>>loadtxt(f, usecols=('stid', 'relh', 'tair'), names=True, dtype=None) array(('nrmn', 45, 9.0999999999999996), dtype=[('stid', '|S4'), ('stnm', ' From pgmdevlist at gmail.com Wed Dec 3 12:08:15 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 12:08:15 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... 
In-Reply-To: <4936B6D4.2060405@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> Message-ID: On Dec 3, 2008, at 11:41 AM, Ryan May wrote: > > Found a problem. If you read the names from the file and specify > usecols, you end up with the first N names read from the file as the > fields in your output (where N is the number of entries in usecols), > instead of having the names of the columns you asked for. > > <..> > > I've attached a version that fixes this by setting a flag internally > if the names are read from the file. If this flag is true, at the > end the names are filtered down to only the ones that are given in > usecols. OK, thx. I'll take that into account and post a new version by the end of the day. > > I also have one other thought. Is there any way we can make this > handle object arrays, or rather, a field containing objects, > specifically datetime objects? Right now, this does not work > because calling view does not work for object arrays. I'm just > looking for a simple way to store date/time in my record array > (currently a string field). It does already: you can upgrade the mapper of StringConverter to support datetime object. Check an earlier post by JDH and my answer. I'll add an example in the test suite. From aisaac at american.edu Wed Dec 3 12:32:01 2008 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 03 Dec 2008 12:32:01 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> Message-ID: <4936C291.5090306@american.edu> If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off and the special handling in order to retain the old load speed? Alan Isaac From Chris.Barker at noaa.gov Wed Dec 3 12:48:16 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Dec 2008 09:48:16 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> Message-ID: <4936C660.5040906@noaa.gov> Pierre GM wrote: > I can try, but in that case, please write me a unittest, so that I > have a clear and unambiguous idea of what you expect. fair enough, though I'm not sure when I'll have time to do it. I do wonder if anyone else thinks it would be useful to have multiple delimiters as an option. I got the idea because with fromfile(), if you specify, say ',' as the delimiter, it won't use '\n', only a comma, so there is no way to quickly read a whole bunch of comma delimited data like: 1,2,3,4 5,6,7,8 .... so I'd like to be able to say to use either ',' or '\n' as the delimiter. However, if I understand loadtxt() correctly, it's handling the new lines separately anyway (to get a 2-d array), so this use case isn't an issue. So how likely is it that someone would have: 1 2 3, 4, 5 6 7 8, 8, 9 and want to read that into a single 2-d array? I'm not sure I've seen it. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Wed Dec 3 12:58:30 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 12:58:30 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C660.5040906@noaa.gov> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> <4936C660.5040906@noaa.gov> Message-ID: <6DBB8759-CEFD-41B7-8D9E-1EF7ACB5C859@gmail.com> On Dec 3, 2008, at 12:48 PM, Christopher Barker wrote: > Pierre GM wrote: >> I can try, but in that case, please write me a unittest, so that I >> have a clear and unambiguous idea of what you expect. > > fair enough, though I'm not sure when I'll have time to do it. Oh, don;t worry, nothing too fancy: give me a couple lines of input data and a line with what you expect. Using Ryan's recent example: >>>f = StringIO('stid stnm relh tair\nnrmn 121 45 9.1') >>> test = loadtxt(f, usecols=('stid', 'relh', 'tair'), names=True, dtype=None) >>> control=array(('nrmn', 45, 9.0999999999999996), dtype=[('stid', '|S4'), ('relh', ' I do wonder if anyone else thinks it would be useful to have multiple > delimiters as an option. I got the idea because with fromfile(), if > you > specify, say ',' as the delimiter, it won't use '\n', only a comma, > so > there is no way to quickly read a whole bunch of comma delimited > data like: > > 1,2,3,4 > 5,6,7,8 > .... > > so I'd like to be able to say to use either ',' or '\n' as the > delimiter. I'm not quite sure I follow you. Do you want to delimiters, one for the field of a record (','), one for the records ("\n") ? > > However, if I understand loadtxt() correctly, it's handling the new > lines separately anyway (to get a 2-d array), so this use case isn't > an > issue. So how likely is it that someone would have: > > 1 2 3, 4, 5 > 6 7 8, 8, 9 > > and want to read that into a single 2-d array? With the current behaviour, you gonna have [("1 2 3", 4, 5), ("6 7 8", 8, 9)] if you use "," as a delimiter, [(1,2,"3,","4,",5),(6,7,"8,","8,",9)] if you use " " as a delimiter. Mixing delimiter is doable, but I don't think it's that a good idea. I'm in favor of sticking to one and only field delimiter, and the default line spearator for record delimiter. In other terms, not changing anythng. From pgmdevlist at gmail.com Wed Dec 3 12:59:38 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 12:59:38 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C291.5090306@american.edu> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> Message-ID: <59E58280-9EFC-4AC9-A2F4-DC9B7B50FF17@gmail.com> On Dec 3, 2008, at 12:32 PM, Alan G Isaac wrote: > If I know my data is already clean > and is handled nicely by the > old loadtxt, will I be able to turn > off and the special handling in > order to retain the old load speed? Hopefully. I'm looking for the best way to do it. Do you have an example you could send me off-list so that I can play with timers ? Thx in advance. P. 
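(A rough way to generate a clean test file and time the current np.loadtxt
on it -- the file name and sizes are arbitrary, purely for illustration, and
this is only a quick sketch rather than a proper benchmark:)

import numpy as np
import timeit

# 20000 rows x 3 columns of clean, comma-separated floats
np.savetxt('clean.txt', np.random.random((20000, 3)), delimiter=',')

t = timeit.Timer("np.loadtxt('clean.txt', delimiter=',')",
                 "import numpy as np")
print min(t.repeat(repeat=3, number=10))
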
From Chris.Barker at noaa.gov Wed Dec 3 13:00:58 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 03 Dec 2008 10:00:58 -0800
Subject: [Numpy-discussion] np.loadtxt : yet a new implementation...
In-Reply-To: <4936C660.5040906@noaa.gov>
References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com>
	<49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov>
	<6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com>
	<4936C660.5040906@noaa.gov>
Message-ID: <4936C95A.3070706@noaa.gov>

by the way, should this work:

io.loadtxt('junk.dat', delimiter=' ')

for more than one space between numbers, like:

1  2  3  4  5
6  7  8  9  10

I get:

io.loadtxt('junk.dat', delimiter=' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/io.py", line 403, in loadtxt
    X.append(tuple([conv(val) for (conv, val) in zip(converters, vals)]))
ValueError: empty string for float()

with the current version.

>>> io.loadtxt('junk.dat', delimiter=None)
array([[  1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.]])

does work.

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Wed Dec 3 13:14:02 2008
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 03 Dec 2008 10:14:02 -0800
Subject: [Numpy-discussion] np.loadtxt : yet a new implementation...
In-Reply-To: <6DBB8759-CEFD-41B7-8D9E-1EF7ACB5C859@gmail.com>
References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com>
	<49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov>
	<6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com>
	<4936C660.5040906@noaa.gov>
	<6DBB8759-CEFD-41B7-8D9E-1EF7ACB5C859@gmail.com>
Message-ID: <4936CC6A.4010409@noaa.gov>

Pierre GM wrote:
> Oh, don;t worry, nothing too fancy: give me a couple lines of input
> data and a line with what you expect.

I just went and looked at the existing tests, and you're right, it's
very easy -- my first foray into the new nose tests -- very nice!

>> specify, say ',' as the delimiter, it won't use '\n', only a comma, so
>> there is no way to quickly read a whole bunch of comma delimited
>> data like:
>>
>> 1,2,3,4
>> 5,6,7,8
>> ....
>>
>> so I'd like to be able to say to use either ',' or '\n' as the
>> delimiter.
>
> I'm not quite sure I follow you.
> Do you want to delimiters, one for the field of a record (','), one
> for the records ("\n") ?

well, in the case of fromfile(), it doesn't "do" records -- it will
only give you a 1-d array, so I want it all as a flat array, and you
can re-size it yourself later. Clearly this is more work (and requires
more knowledge of your data) than using loadtxt, but sometimes I really
want FAST data reading of simple formats.

However, this isn't fromfile() we are talking about now, it's loadtxt()...

>> So how likely is it that someone would have:
>>
>> 1 2 3, 4, 5
>> 6 7 8, 8, 9
>>
>> and want to read that into a single 2-d array?
>
> With the current behaviour, you gonna have
> [("1 2 3", 4, 5), ("6 7 8", 8, 9)] if you use "," as a delimiter,
> [(1,2,"3,","4,",5),(6,7,"8,","8,",9)] if you use " " as a delimiter.

right.

> Mixing delimiter is doable, but I don't think it's that a good idea.

I can't come up with a use case at this point, so..

> I'm in favor of sticking to one and only field delimiter, and the
> default line spearator for record delimiter. In other terms, not
> changing anything.
I agree -- sorry for the noise! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Dec 3 13:19:58 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Dec 2008 10:19:58 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C291.5090306@american.edu> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> Message-ID: <4936CDCE.6020305@noaa.gov> Alan G Isaac wrote: > If I know my data is already clean > and is handled nicely by the > old loadtxt, will I be able to turn > off and the special handling in > order to retain the old load speed? what I'd like to see is a version of loadtxt built on a slightly enhanced fromfile() -- that would be blazingly fast for the easy cases (simple tabular data of one dtype). I don't know if the special-casing should be automatic, or just have it be a separate function. Also, fromfile() needs some work, and it needs to be done in C, which is less fun, so who knows when it will get done. As I think about it, maybe what I really want is a simple version of loadtxt written in C: It would only handle one data type at a time. It would support simple comment lines. It would only support one delimiter (plus newline). It would create a 2-d array from normal, tabular data. You could specify: how many numbers you wanted, or how many rows, or read 'till EOF Actually, this is a lot like matlab's fscanf() someday.... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Wed Dec 3 13:52:30 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 13:52:30 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C95A.3070706@noaa.gov> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> <4936C660.5040906@noaa.gov> <4936C95A.3070706@noaa.gov> Message-ID: On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote: > by the way, should this work: > > io.loadtxt('junk.dat', delimiter=' ') > > for more than one space between numbers, like: > > 1 2 3 4 5 > 6 7 8 9 10 On the version I'm working on, both delimiter='' and delimiter=None (default) would give you the expected output. delimiter=' ' would fail, delimiter=' ' would work. From mmetz at astro.uni-bonn.de Wed Dec 3 14:08:04 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Wed, 03 Dec 2008 20:08:04 +0100 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936C291.5090306@american.edu> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> Message-ID: <4936D914.4020100@astro.uni-bonn.de> Alan G Isaac wrote: > If I know my data is already clean > and is handled nicely by the > old loadtxt, will I be able to turn > off and the special handling in > order to retain the old load speed? > > Alan Isaac > Hi all, that's going in the same direction I was thinking about. 
When I thought about an improved version of loadtxt, I wished it was fault tolerant without loosing too much performance. So my solution was much simpler than the very nice genloadtxt function -- and it works for me. My ansatz is to leave the existing loadtxt function unchanged. I only replaced the default converter calls by a fault tolerant converter class. I attached a patch against io.py in numpy 1.2.1 The nice thing is that it not only handles missing values, but for example also columns/fields with non-number characters. It just returns nan in these cases. This is of practical importance for many datafiles of astronomical catalogues, for example the Hipparcos catalogue data. Regarding the performance, it is a little bit slower than the original loadtxt, but not much: on my machine, 10x reading in a clean testfile with 3 columns and 20000 rows I get the following results: original loadtxt: ~1.3s modified loadtxt: ~1.7s new genloadtxt : ~2.7s So you see, there is some loss of performance, but not as much as with the new converter class. I hope this solution is of interest ... Manuel -------------- next part -------------- A non-text attachment was scrubbed... Name: io.diff Type: text/x-patch Size: 678 bytes Desc: not available URL: From Chris.Barker at noaa.gov Wed Dec 3 14:12:49 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Dec 2008 11:12:49 -0800 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4935B85A.6030701@noaa.gov> <6EF7939F-36BD-4064-9C4B-FC6553E79DE4@gmail.com> <4936C660.5040906@noaa.gov> <4936C95A.3070706@noaa.gov> Message-ID: <4936DA31.4070608@noaa.gov> Pierre GM wrote: > On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote: >> for more than one space between numbers, like: >> >> 1 2 3 4 5 >> 6 7 8 9 10 > > > On the version I'm working on, both delimiter='' and delimiter=None > (default) would give you the expected output. so empty string and None both mean "any white space"? also tabs, etc? > delimiter=' ' would fail, s only exactly that delimiter. Is that so things like '\t' will work right? but what about: 4, 5, 34,123, .... In that case, ',' is the delimiter, but whitespace is ignored. or 4\t 5\t 34\t 123. we're ignoring extra whitespace there, too, so I'm not sure why we shouldn't ignore it in the ' ' case also. delimiter=' ' would work. but in my example, there were sometimes two spaces, sometimes three -- so I think it would fail, no? >>> "1 2 3 4 5".split(' ') ['1', '2', '3', '4', ' 5'] actually, that would work, but four spaces wouldn't. >>> "1 2 3 4 5".split(' ') ['1', '2', '3', '4', '', '5'] I guess the solution is to use delimiter=None in that case, and is does make sense that you can't have ' ' mean "one or more spaces", but "\t" mean "only one tab". -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From mmetz at astro.uni-bonn.de Wed Dec 3 14:12:16 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Wed, 03 Dec 2008 20:12:16 +0100 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... 
In-Reply-To: <4936D914.4020100@astro.uni-bonn.de> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> <4936D914.4020100@astro.uni-bonn.de> Message-ID: <4936DA10.6020202@astro.uni-bonn.de> Manuel Metz wrote: > Alan G Isaac wrote: >> If I know my data is already clean >> and is handled nicely by the >> old loadtxt, will I be able to turn >> off and the special handling in >> order to retain the old load speed? >> >> Alan Isaac >> > > Hi all, > that's going in the same direction I was thinking about. > When I thought about an improved version of loadtxt, I wished it was > fault tolerant without loosing too much performance. > So my solution was much simpler than the very nice genloadtxt function > -- and it works for me. > > My ansatz is to leave the existing loadtxt function unchanged. I only > replaced the default converter calls by a fault tolerant converter > class. I attached a patch against io.py in numpy 1.2.1 > > The nice thing is that it not only handles missing values, but for > example also columns/fields with non-number characters. It just returns > nan in these cases. This is of practical importance for many datafiles > of astronomical catalogues, for example the Hipparcos catalogue data. > > Regarding the performance, it is a little bit slower than the original > loadtxt, but not much: on my machine, 10x reading in a clean testfile > with 3 columns and 20000 rows I get the following results: > > original loadtxt: ~1.3s > modified loadtxt: ~1.7s > new genloadtxt : ~2.7s > > So you see, there is some loss of performance, but not as much as with > the new converter class. > > I hope this solution is of interest ... > > Manuel > Oops, wrong version of the diff file. Wanted to name the class "_faulttolerantconv" ... -------------- next part -------------- A non-text attachment was scrubbed... Name: io.diff Type: text/x-patch Size: 628 bytes Desc: not available URL: From pgmdevlist at gmail.com Wed Dec 3 14:21:05 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 3 Dec 2008 14:21:05 -0500 Subject: [Numpy-discussion] np.loadtxt : yet a new implementation... In-Reply-To: <4936DA10.6020202@astro.uni-bonn.de> References: <36CE5691-9274-4360-AF86-DB74600D5166@gmail.com> <49359694.8080605@gmail.com> <4936B6D4.2060405@gmail.com> <4936C291.5090306@american.edu> <4936D914.4020100@astro.uni-bonn.de> <4936DA10.6020202@astro.uni-bonn.de> Message-ID: <62459D5C-D2D8-4325-B82F-65BF222B5F0B@gmail.com> Manuel, Looks nice, I gonna try to see how I can incorporate yours. Note that returning np.nan by default will not work w/ Python 2.6 if you want an int... From elfnor at gmail.com Wed Dec 3 19:06:28 2008 From: elfnor at gmail.com (Elfnor) Date: Wed, 3 Dec 2008 16:06:28 -0800 (PST) Subject: [Numpy-discussion] Apply a function to an array elementwise Message-ID: <20823768.post@talk.nabble.com> Hi I want to apply a function (myfunc which takes and returns a scalar) to each element in a multi-dimensioned array (data): I can do this: newdata = numpy.array([myfunc(d) for d in data.flat]).reshape(data.shape) But I'm wondering if there's a faster more numpy way. I've looked at the vectorize function but can't work it out. thanks Eleanor -- View this message in context: http://www.nabble.com/Apply-a-function-to-an-array-elementwise-tp20823768p20823768.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
From oliphant at enthought.com Wed Dec 3 21:22:01 2008 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 03 Dec 2008 20:22:01 -0600 Subject: [Numpy-discussion] Apply a function to an array elementwise In-Reply-To: <20823768.post@talk.nabble.com> References: <20823768.post@talk.nabble.com> Message-ID: <49373EC9.2000501@enthought.com> Elfnor wrote: > Hi > > I want to apply a function (myfunc which takes and returns a scalar) to each > element in a multi-dimensioned array (data): > > I can do this: > > newdata = numpy.array([myfunc(d) for d in data.flat]).reshape(data.shape) > > But I'm wondering if there's a faster more numpy way. I've looked at the > vectorize function but can't work it out. > > from numpy import vectorize new_func = vectorize(myfunc) newdata = new_func(data) Should work. -Travis From cournape at gmail.com Wed Dec 3 22:19:17 2008 From: cournape at gmail.com (David Cournapeau) Date: Thu, 4 Dec 2008 12:19:17 +0900 Subject: [Numpy-discussion] Compiler options for mingw? In-Reply-To: <49322C6C.6070400@ar.media.kyoto-u.ac.jp> References: <96BABBCF-EF9D-4AF7-8BE4-03685EB080B2@yale.edu> <5b8d13220811281302n756a3b95ka1c6e7287cb23ae0@mail.gmail.com> <3AE60785-BEC5-4AC8-A914-E63A940225A9@yale.edu> <49322C6C.6070400@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812031919y714ed1c7la1cec4eba12ee980@mail.gmail.com> On Sun, Nov 30, 2008 at 3:02 PM, David Cournapeau wrote: > > No at the moment, but you can easily decompress the .exe content to get > the internal .exe (which are straight installers built by python > setup.py setup.py bdist_wininst). It should be possible to force an > architecture at install time using a command line option, but I don't > have the time ATM to support this. I needed it to help me fixing a couple of bugs for old CPU, so it ended up being implemented in the nsis script for scipy now (I will add it to numpy installers too). So from now, any newly releases of both numpy and scipy installers could be overriden: installer-name.exe /arch native -> default behavior installer-name.exe /arch nosse -> Force installation wo sse, even if SSE-cpu is detected. It does not check that the option is valid, so you can end up requesting SSE3 installer on a SSE2 CPU. But well... David From erik.tollerud at gmail.com Thu Dec 4 03:20:56 2008 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Thu, 4 Dec 2008 00:20:56 -0800 Subject: [Numpy-discussion] Py3k and numpy Message-ID: I noticed that the Python 3000 final was released today... is there any sense of how long it will take to get numpy working under 3k? I would imagine it'll be a lot to adapt given the low-level change, but is the work already in progress? From pgmdevlist at gmail.com Thu Dec 4 06:51:53 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 06:51:53 -0500 Subject: [Numpy-discussion] genloadtxt: second serving Message-ID: <842491BB-9946-4221-8646-57638104623C@gmail.com> All, Here's the second round of genloadtxt. That's a tad cleaner version than the previous one, where I tried to take into account the different comments and suggestions that were posted. So, tabs should be supported and explicit whitespaces are not collapsed. FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit comparison: same input, no missing data, one with genloadtxt, one with np.loadtxt and a last one with matplotlib.mlab.csv2rec. As you'll see, genloadtxt is roughly twice slower than np.loadtxt, but twice faster than csv2rec. 
One of the explanation for the slowness is indeed the use of classes for splitting lines and converting values. Instead of a basic function, we use the __call__ method of the class, which itself calls another function depending on the attribute values. I'd like to reduce this overhead, any suggestion is more than welcome, as usual. Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in numpy.ma, with an alias recfromcsv for John, using his defaults. Unless somebody comes with a brilliant optimization. Let me know how it goes, Cheers, P. -------------- next part -------------- A non-text attachment was scrubbed... Name: _preview.py Type: text/x-python-script Size: 31694 bytes Desc: not available URL: -------------- next part -------------- From pgmdevlist at gmail.com Thu Dec 4 06:52:32 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 06:52:32 -0500 Subject: [Numpy-discussion] genloadtxt: second serving (tests) Message-ID: <317DDE01-99E6-4C1E-9675-5299CAC39CF3@gmail.com> And now for the tests: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_preview.py Type: text/x-python-script Size: 15545 bytes Desc: not available URL: -------------- next part -------------- From mmetz at astro.uni-bonn.de Thu Dec 4 07:22:33 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Thu, 04 Dec 2008 13:22:33 +0100 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <842491BB-9946-4221-8646-57638104623C@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> Message-ID: <4937CB89.9070405@astro.uni-bonn.de> Pierre GM wrote: > All, > Here's the second round of genloadtxt. That's a tad cleaner version than > the previous one, where I tried to take into account the different > comments and suggestions that were posted. So, tabs should be supported > and explicit whitespaces are not collapsed. > FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit > comparison: same input, no missing data, one with genloadtxt, one with > np.loadtxt and a last one with matplotlib.mlab.csv2rec. > > As you'll see, genloadtxt is roughly twice slower than np.loadtxt, but > twice faster than csv2rec. One of the explanation for the slowness is > indeed the use of classes for splitting lines and converting values. > Instead of a basic function, we use the __call__ method of the class, > which itself calls another function depending on the attribute values. > I'd like to reduce this overhead, any suggestion is more than welcome, > as usual. > > Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in > numpy.ma, with an alias recfromcsv for John, using his defaults. Unless > somebody comes with a brilliant optimization. Will loadtxt in that case remain as is? Or will the _faulttolerantconv class be used? mm From olivier.grisel at ensta.org Thu Dec 4 10:26:37 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 4 Dec 2008 16:26:37 +0100 Subject: [Numpy-discussion] Broadcasting question Message-ID: Hi list, Suppose I have array a with dimensions (d1, d3) and array b with dimensions (d2, d3). I want to compute array c with dimensions (d1, d2) holding the squared euclidian norms of vectors in a and b with size d3. My first take was to use a python level loop: >>> from numpy import * >>> c = array([sum((a_i - b) ** 2, axis=1) for a_i in a]) But this is too slow and allocate a useless temporary list of python references. 
To avoid the python level loop I then tried to use broadcasting as follows: >>> c = sum((a[:,newaxis,:] - b) ** 2, axis=2) But this build a useless and huge (d1, d2, d3) temporary array that does not fit in memory for large values of d1, d2 and d3... Do you have any better idea? I would like to simulate a runtime behavior similar to: >>> c = dot(a, b.T) but for for squared euclidian norms instead of dotproducts. I can always write a the code in C and wrap it with ctypes but I wondered whether this is possible only with numpy. -- Olivier From stefan at sun.ac.za Thu Dec 4 10:53:01 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 4 Dec 2008 17:53:01 +0200 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: References: Message-ID: <9457e7c80812040753g1640ddc7l4f15965b55bc5973@mail.gmail.com> Hi Olivier 2008/12/4 Olivier Grisel : > To avoid the python level loop I then tried to use broadcasting as follows: > >>>> c = sum((a[:,newaxis,:] - b) ** 2, axis=2) > > But this build a useless and huge (d1, d2, d3) temporary array that > does not fit in memory for large values of d1, d2 and d3... Does numpy.lib.broadcast_arrays do what you need? Regards St?fan From zachary.pincus at yale.edu Thu Dec 4 11:24:23 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 4 Dec 2008 11:24:23 -0500 Subject: [Numpy-discussion] Compiler options for mingw? In-Reply-To: <5b8d13220812031919y714ed1c7la1cec4eba12ee980@mail.gmail.com> References: <96BABBCF-EF9D-4AF7-8BE4-03685EB080B2@yale.edu> <5b8d13220811281302n756a3b95ka1c6e7287cb23ae0@mail.gmail.com> <3AE60785-BEC5-4AC8-A914-E63A940225A9@yale.edu> <49322C6C.6070400@ar.media.kyoto-u.ac.jp> <5b8d13220812031919y714ed1c7la1cec4eba12ee980@mail.gmail.com> Message-ID: <530E6E8F-5375-4320-A692-6F49DA7D9B6A@yale.edu> > I needed it to help me fixing a couple of bugs for old CPU, so it > ended up being implemented in the nsis script for scipy now (I will > add it to numpy installers too). So from now, any newly releases of > both numpy and scipy installers could be overriden: > > installer-name.exe /arch native -> default behavior > installer-name.exe /arch nosse -> Force installation wo sse, even if > SSE-cpu is detected. > > It does not check that the option is valid, so you can end up > requesting SSE3 installer on a SSE2 CPU. But well... Cool! Thanks! This will be really useful... Zach From charlesr.harris at gmail.com Thu Dec 4 11:39:24 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Dec 2008 09:39:24 -0700 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud wrote: > I noticed that the Python 3000 final was released today... is there > any sense of how long it will take to get numpy working under 3k? I > would imagine it'll be a lot to adapt given the low-level change, but > is the work already in progress? I read that announcement too. My feeling is that we can only support one branch at a time, i.e., the python 2.x or python 3.x series. So the easiest path to 3.x looked to be waiting until python 2.6 was widely distributed, making it the required version, doing the needed updates to numpy, and then using the automatic conversion to python 3.x. I expect f2py, nose, and other tools will also need fixups. 
Guido suggests an approach like this for those needing to support both series and I really don't see an alternative unless someone wants to fork numpy ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Thu Dec 4 11:53:36 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 4 Dec 2008 17:53:36 +0100 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: <9457e7c80812040753g1640ddc7l4f15965b55bc5973@mail.gmail.com> References: <9457e7c80812040753g1640ddc7l4f15965b55bc5973@mail.gmail.com> Message-ID: 2008/12/4 St?fan van der Walt : > Hi Olivier > > 2008/12/4 Olivier Grisel : >> To avoid the python level loop I then tried to use broadcasting as follows: >> >>>>> c = sum((a[:,newaxis,:] - b) ** 2, axis=2) >> >> But this build a useless and huge (d1, d2, d3) temporary array that >> does not fit in memory for large values of d1, d2 and d3... > > Does numpy.lib.broadcast_arrays do what you need? That looks exactly what I am looking for. Apparently this is new in 1.2 since I cannot find it in the 1.1 version of my system. Thanks, -- Olivier From charlesr.harris at gmail.com Thu Dec 4 11:55:26 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Dec 2008 09:55:26 -0700 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: References: Message-ID: On Thu, Dec 4, 2008 at 8:26 AM, Olivier Grisel wrote: > Hi list, > > Suppose I have array a with dimensions (d1, d3) and array b with > dimensions (d2, d3). I want to compute array c with dimensions (d1, > d2) holding the squared euclidian norms of vectors in a and b with > size d3. > Just to clarify the problem a bit, it looks like you want to compute the squared euclidean distance between every vector in a and every vector in b, i.e., a distance matrix. Is that correct? Also, how big are d1,d2,d3? If you *are* looking to compute the distance matrix I suspect your end goal is something beyond that. Could you describe what you are trying to do? I could be that scipy.spatial or scipy.cluster are what you should look at. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Thu Dec 4 12:58:52 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 12:58:52 -0500 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <4937CB89.9070405@astro.uni-bonn.de> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> <4937CB89.9070405@astro.uni-bonn.de> Message-ID: <272FAE9E-DC29-42A5-B44C-8EB16E05DF89@gmail.com> On Dec 4, 2008, at 7:22 AM, Manuel Metz wrote: > > Will loadtxt in that case remain as is? Or will the _faulttolerantconv > class be used? No idea, we need to discuss it. There's a problem with _faulttolerantconv: using np.nan as default value will not work in Python2.6 if the output is to be int, as an exception will be raised. Therefore, we'd need to change the default to something else when defining _faulttolerantconv. The easiest would be to define a class and set the argument at instantiation, but then we're going back dangerously close to StringConverter... 
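Returning to the broadcasting thread above: the (d1, d2) matrix of squared euclidean distances can be computed without ever materialising a (d1, d2, d3) temporary, by using the identity ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i.b_j. The sketch below only illustrates that idea (the function name is invented here, and the final clipping line guards against round-off); newer scipy also ships ready-made distance routines in scipy.spatial, as mentioned above.

import numpy as np

def squared_distances(a, b):
    # a has shape (d1, d3), b has shape (d2, d3); the result has shape (d1, d2).
    # Only (d1, d2)-sized temporaries are created along the way.
    aa = (a * a).sum(axis=1)[:, np.newaxis]    # ||a_i||^2 as a (d1, 1) column
    bb = (b * b).sum(axis=1)[np.newaxis, :]    # ||b_j||^2 as a (1, d2) row
    c = aa + bb - 2.0 * np.dot(a, b.T)         # cross terms from a single matrix product
    c[c < 0] = 0.0                             # round-off can leave tiny negative values
    return c

Any per-pair function can then be applied elementwise to the (d1, d2) result, for instance np.exp(-c / (2.0 * sigma)) for a gaussian/RBF style kernel.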
From olivier.grisel at ensta.org Thu Dec 4 12:59:19 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 4 Dec 2008 18:59:19 +0100 Subject: [Numpy-discussion] Broadcasting question In-Reply-To: References: Message-ID: 2008/12/4 Charles R Harris : > > > On Thu, Dec 4, 2008 at 8:26 AM, Olivier Grisel > wrote: >> >> Hi list, >> >> Suppose I have array a with dimensions (d1, d3) and array b with >> dimensions (d2, d3). I want to compute array c with dimensions (d1, >> d2) holding the squared euclidian norms of vectors in a and b with >> size d3. > > Just to clarify the problem a bit, it looks like you want to compute the > squared euclidean distance between every vector in a and every vector in b, > i.e., a distance matrix. Is that correct? Also, how big are d1,d2,d3? I would target d1 >> d2 ~ d3 with d1 as large as possible to fit in memory and d2 and d3 in the order of a couple hundreds or thousands for a start. > If you *are* looking to compute the distance matrix I suspect your end goal > is something beyond that. Could you describe what you are trying to do? My end goal it to compute the activation of an array of Radial Basis Function units where the activation of unit with center b_j for data vector a_i is given by: f(a_i, b_j) = exp(-||a_i - bj|| ** 2 / (2 * sigma)) The end goal is to have building blocks of various parameterized array of homogeneous units (linear, sigmoid and RBF) along with their gradient in parameter space so as too build various machine learning algorithms such as multi layer perceptrons with various training strategies such as Stochastic Gradient Descent. That code might be integrated into the Modular Data Processing (MPD toolkit) project [1] at some point. The current stat of the python code is here: http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/scalar.py You can find an SSE optimized C implementation wrapped with ctypes here: http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/sse.py http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/sse.c > It could be that scipy.spatial or scipy.cluster are what you should look at. I'll have a look at those, thanks for the pointer. [1] http://mdp-toolkit.sourceforge.net/ -- Olivier From charlesr.harris at gmail.com Thu Dec 4 13:57:23 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Dec 2008 11:57:23 -0700 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: On Thu, Dec 4, 2008 at 9:39 AM, Charles R Harris wrote: > > > On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud wrote: > >> I noticed that the Python 3000 final was released today... is there >> any sense of how long it will take to get numpy working under 3k? I >> would imagine it'll be a lot to adapt given the low-level change, but >> is the work already in progress? > > > I read that announcement too. My feeling is that we can only support one > branch at a time, i.e., the python 2.x or python 3.x series. So the easiest > path to 3.x looked to be waiting until python 2.6 was widely distributed, > making it the required version, doing the needed updates to numpy, and then > using the automatic conversion to python 3.x. I expect f2py, nose, and other > tools will also need fixups. 
Guido suggests an approach like this for those > needing to support both series and I really don't see an alternative unless > someone wants to fork numpy ;) > Looks like python 2.6 just went into Fedora rawhide, so it should be in the May Fedora 11 release. I expect Ubuntu and other leading edge Linux distros to have it about the same time. This probably means numpy needs to be running on python 2.6 by early Spring. Dropping support for earlier versions of python might be something to look at for next Fall. So I'm guessing about a year will be the earliest we might have Python 3.0 support. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Dec 4 14:03:01 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 13:03:01 -0600 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> On Thu, Dec 4, 2008 at 12:57, Charles R Harris wrote: > > > On Thu, Dec 4, 2008 at 9:39 AM, Charles R Harris > wrote: >> >> >> On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud >> wrote: >>> >>> I noticed that the Python 3000 final was released today... is there >>> any sense of how long it will take to get numpy working under 3k? I >>> would imagine it'll be a lot to adapt given the low-level change, but >>> is the work already in progress? >> >> I read that announcement too. My feeling is that we can only support one >> branch at a time, i.e., the python 2.x or python 3.x series. So the easiest >> path to 3.x looked to be waiting until python 2.6 was widely distributed, >> making it the required version, doing the needed updates to numpy, and then >> using the automatic conversion to python 3.x. I expect f2py, nose, and other >> tools will also need fixups. Guido suggests an approach like this for those >> needing to support both series and I really don't see an alternative unless >> someone wants to fork numpy ;) > > Looks like python 2.6 just went into Fedora rawhide, so it should be in the > May Fedora 11 release. I expect Ubuntu and other leading edge Linux distros > to have it about the same time. This probably means numpy needs to be > running on python 2.6 by early Spring. It does. What problems are people seeing? Is it just the Windows build that causes people to say "numpy doesn't work with Python 2.6"? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at gmail.com Thu Dec 4 14:14:52 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 14:14:52 -0500 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> Message-ID: <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> On Nov 25, 2008, at 12:23 PM, Pierre GM wrote: > All, > Sorry to bump my own post, and I was kinda threadjacking anyway: > > Some functions of numy.ma (eg, ma.max, ma.min...) accept explicit > outputs that may not be MaskedArrays. > When such an explicit output is not a MaskedArray, a value that > should have been masked is transformed into np.nan. > > That worked great in 2.5, with np.nan automatically transformed to 0 > when the explicit output had a int dtype. With Python 2.6, a > ValueError is raised instead, as np.nan can no longer be casted to > int. 
> > What should be the recommended behavior in this case ? Raise a > ValueError or some other exception, to follow the new Python2.6 > convention, or silently replace np.nan by some value acceptable by > int dtype (0, or something else) ? Second bump, sorry. Any consensus on what the behavior should be ? Raise a ValueError (even in 2.5, therefore risking to break something) or just go with the flow and switch np.nan to an acceptable value (like 0), under the hood ? I'd like to close the corresponding ticket... From millman at berkeley.edu Thu Dec 4 14:40:56 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 4 Dec 2008 11:40:56 -0800 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> Message-ID: On Thu, Dec 4, 2008 at 11:14 AM, Pierre GM wrote: > Raise a ValueError (even in 2.5, therefore risking to break something) +1 -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From tgrav at mac.com Thu Dec 4 15:15:56 2008 From: tgrav at mac.com (Tommy Grav) Date: Thu, 04 Dec 2008 15:15:56 -0500 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: <63B78B1C-C6A4-4575-A242-463A932FCBE9@me.com> References: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> <63B78B1C-C6A4-4575-A242-463A932FCBE9@me.com> Message-ID: On Dec 4, 2008, at 2:03 PM, Robert Kern wrote: > It does. What problems are people seeing? Is it just the Windows build > that causes people to say "numpy doesn't work with Python 2.6"? There is currently no official Mac OSX binary for numpy for python 2.6, but you can build it from source. Is there any time table for generating a 2.6 Mac OS X binary? Cheers Tommy From josef.pktd at gmail.com Thu Dec 4 15:24:21 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 15:24:21 -0500 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> Message-ID: <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> On Thu, Dec 4, 2008 at 2:40 PM, Jarrod Millman wrote: > On Thu, Dec 4, 2008 at 11:14 AM, Pierre GM wrote: >> Raise a ValueError (even in 2.5, therefore risking to break something) > > +1 > +1 I'm not yet a serious user of numpy/scipy, but when debugging the discrete distributions, it took me a while to figure out that some mysteriously appearing zeros were nans that were silently converted during casting to int. In matlab, I encode different types of missing values (in the data) by numbers that I know are not in my dataset, e.g -2**20, -2**21,... but that depends on the dataset. (hand made nan handling, before data is cleaned). When I see then a "weird" number, I know that there is a problem, if it the nan is zero, I wouldn't know if it's a missing value or really a zero. Josef From millman at berkeley.edu Thu Dec 4 15:29:55 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 4 Dec 2008 12:29:55 -0800 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> <63B78B1C-C6A4-4575-A242-463A932FCBE9@me.com> Message-ID: On Thu, Dec 4, 2008 at 12:15 PM, Tommy Grav wrote: > On Dec 4, 2008, at 2:03 PM, Robert Kern wrote: >> It does. What problems are people seeing? 
Is it just the Windows build >> that causes people to say "numpy doesn't work with Python 2.6"? > > There is currently no official Mac OSX binary for numpy for python 2.6, > but you can build it from source. Is there any time table for generating > a 2.6 Mac OS X binary? My intention was to make 2.6 Mac binaries for the NumPy 1.3 release. We haven't finalized a timetable for the 1.3 release yet, but the current plan was to try and get the release out near the end of December. Once SciPy 0.7 is out, I will turn my attention to the next NumPy release. -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From pgmdevlist at gmail.com Thu Dec 4 15:27:15 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 4 Dec 2008 15:27:15 -0500 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> Message-ID: On Dec 4, 2008, at 3:24 PM, josef.pktd at gmail.com wrote: > On Thu, Dec 4, 2008 at 2:40 PM, Jarrod Millman > wrote: >> On Thu, Dec 4, 2008 at 11:14 AM, Pierre GM >> wrote: >>> Raise a ValueError (even in 2.5, therefore risking to break >>> something) >> >> +1 >> > > +1 OK then, I'll do that and update the SVN later tonight or early tmw... From rmay31 at gmail.com Thu Dec 4 15:54:28 2008 From: rmay31 at gmail.com (Ryan May) Date: Thu, 04 Dec 2008 14:54:28 -0600 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <842491BB-9946-4221-8646-57638104623C@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> Message-ID: <49384384.3000105@gmail.com> Pierre GM wrote: > All, > Here's the second round of genloadtxt. That's a tad cleaner version than > the previous one, where I tried to take into account the different > comments and suggestions that were posted. So, tabs should be supported > and explicit whitespaces are not collapsed. Looks pretty good, but there's one breakage against what I had working with my local copy (with mods). When adding the filtering of names read from the file using usecols, there's a reason I set a flag and fixed it later: converters specified by name. If we have usecols and converters specified by name, and we read the names from a file, we have the following sequence: 1) Read names 2) Convert usecols names to column numbers. 3) Filter name list using usecols. Indices of names list no longer map to column numbers. 4) Change converters from mapping names->funcs to mapping col#->func using indices from names....OOPS. It's an admittedly complex combination, but it allows flexibly reading text files since you're only basing on field names, no column numbers. Here's a test case: def test_autonames_usecols_and_converter(self): "Tests names and usecols" data = StringIO.StringIO('A B C D\n aaaa 121 45 9.1') test = loadtxt(data, usecols=('A', 'C', 'D'), names=True, dtype=None, converters={'C':lambda s: 2 * int(s)}) control = np.array(('aaaa', 90, 9.1), dtype=[('A', '|S4'), ('C', int), ('D', float)]) assert_equal(test, control) This fails with your current implementation, but works for me when: 1) Set a flag when reading names from header line in file 2) Filter names from file using usecols (if the flag is true) *after* remapping the converters. There may be a better approach, but this is the simplest I've come up with so far. 
> FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit > comparison: same input, no missing data, one with genloadtxt, one with > np.loadtxt and a last one with matplotlib.mlab.csv2rec. > > As you'll see, genloadtxt is roughly twice slower than np.loadtxt, but > twice faster than csv2rec. One of the explanation for the slowness is > indeed the use of classes for splitting lines and converting values. > Instead of a basic function, we use the __call__ method of the class, > which itself calls another function depending on the attribute values. > I'd like to reduce this overhead, any suggestion is more than welcome, > as usual. > > Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in > numpy.ma, with an alias recfromcsv for John, using his defaults. Unless > somebody comes with a brilliant optimization. Why only in numpy.ma and not somewhere in core numpy itself (missing values aside)? You have a pretty good masked array agnostic wrapper that IMO could go in numpy, though maybe not as loadtxt. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From Chris.Barker at noaa.gov Thu Dec 4 16:17:50 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 04 Dec 2008 13:17:50 -0800 Subject: [Numpy-discussion] in(np.nan) on python 2.6 In-Reply-To: <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> References: <0866E45B-F0D1-4076-8D91-697F0A5D99D3@gmail.com> <8DDAEA99-A33C-45D1-845D-0CF343C4DB62@gmail.com> <1cd32cbb0812041224n6d8fcc58l518854ca79d0f58@mail.gmail.com> Message-ID: <493848FE.4030101@noaa.gov> josef.pktd at gmail.com wrote: >>> Raise a ValueError (even in 2.5, therefore risking to break something) +1 as well > it took me a while to figure out that some > mysteriously appearing zeros were nans that were silently converted > during casting to int. and this is why -- a zero is a perfectly valid and useful number, NaN should never get cast to a zero (or any other valid number) unless the user explicitly asks it to be. I think the right choice was made for python 2.6 here. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From timmichelsen at gmx-topmail.de Thu Dec 4 17:19:04 2008 From: timmichelsen at gmx-topmail.de (Tim Michelsen) Date: Thu, 04 Dec 2008 23:19:04 +0100 Subject: [Numpy-discussion] Apply a function to an array elementwise In-Reply-To: <49373EC9.2000501@enthought.com> References: <20823768.post@talk.nabble.com> <49373EC9.2000501@enthought.com> Message-ID: >> I want to apply a function (myfunc which takes and returns a scalar) to each >> element in a multi-dimensioned array (data): >> >> I can do this: >> >> newdata = numpy.array([myfunc(d) for d in data.flat]).reshape(data.shape) >> >> But I'm wondering if there's a faster more numpy way. I've looked at the >> vectorize function but can't work it out. >> >> > > from numpy import vectorize > > new_func = vectorize(myfunc) > newdata = new_func(data) This seems be some sort of FAQ. Maybe the term vectorize is not known to all (newbie) users. At least finding its application in the docs doesn't seem easy. 
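For readers landing on this FAQ, a minimal usage sketch of the np.vectorize pattern Travis posted above. Here myfunc is a stand-in for any function that takes and returns a scalar, and the otypes argument (added in this sketch) pins the output dtype so it does not have to be guessed from the first element. Keep in mind that vectorize is essentially a convenience wrapper around a Python-level loop, so it cleans up the code more than it speeds it up.

import numpy as np

def myfunc(x):
    # stand-in scalar function; any scalar-in, scalar-out function works the same way
    if x > 0:
        return x ** 2 + 1.0
    return 0.0

vfunc = np.vectorize(myfunc, otypes=[np.float64])

data = np.linspace(-1.0, 1.0, 12).reshape(3, 4)
newdata = vfunc(data)     # same shape as data, myfunc applied to every element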
Here a more threads: * optimising single value functions for array calculations - http://article.gmane.org/gmane.comp.python.numeric.general/26543 * vectorized function inside a class - http://article.gmane.org/gmane.comp.python.numeric.general/16438 Most newcomers learn at some point to develop functions for single values (scalars) but to connect this with computation of full array and be efficient is another step. Some short note has been written on the cookbook: http://www.scipy.org/Cookbook/Autovectorize Regards, Timmie From kwmsmith at gmail.com Thu Dec 4 17:46:14 2008 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 4 Dec 2008 16:46:14 -0600 Subject: [Numpy-discussion] PyArray_EMPTY and Cython In-Reply-To: <20081203035707.GA28913@encolpuis> References: <20081203035707.GA28913@encolpuis> Message-ID: On Tue, Dec 2, 2008 at 9:57 PM, Gabriel Gellner wrote: > After some discussion on the Cython lists I thought I would try my hand at > writing some Cython accelerators for empty and zeros. This will involve > using > PyArray_EMPTY, I have a simple prototype I would like to get working, but > currently it segfaults. Any tips on what I might be missing? I took a look at this, but I'm admittedly a cython newbie, but will be using code like this in the future. Have you had any luck? Kurt > > > import numpy as np > cimport numpy as np > > cdef extern from "numpy/arrayobject.h": > PyArray_EMPTY(int ndims, np.npy_intp* dims, int type, bint fortran) > > cdef np.ndarray empty(np.npy_intp length): > cdef np.ndarray[np.double_t, ndim=1] ret > cdef int type = np.NPY_DOUBLE > cdef int ndims = 1 > > cdef np.npy_intp* dims > dims = &length > > print dims[0] > print type > > ret = PyArray_EMPTY(ndims, dims, type, False) > > return ret > > def test(): > cdef np.ndarray[np.double_t, ndim=1] y = empty(10) > > return y > > > The code seems to print out the correct dims and type info but segfaults > when > the PyArray_EMPTY call is made. > > Thanks, > > Gabriel > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brennan.williams at visualreservoir.com Thu Dec 4 18:17:54 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 12:17:54 +1300 Subject: [Numpy-discussion] checksum on numpy float array Message-ID: <49386522.70401@visualreservoir.com> My app reads in one or more float arrays from a binary file. Sometimes due to network timeouts etc the array is not read correctly. What would be the best way of checking the validity of the data? Would some sort of checksum approach be a good idea? Would that work with an array of floating point values? Or are checksums more for int,byte,string type data? From robert.kern at gmail.com Thu Dec 4 18:36:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 17:36:20 -0600 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49386522.70401@visualreservoir.com> References: <49386522.70401@visualreservoir.com> Message-ID: <3d375d730812041536s21627ae5xde71ff7d943c9740@mail.gmail.com> On Thu, Dec 4, 2008 at 17:17, Brennan Williams wrote: > My app reads in one or more float arrays from a binary file. > > Sometimes due to network timeouts etc the array is not read correctly. > > What would be the best way of checking the validity of the data? 
> > Would some sort of checksum approach be a good idea? > Would that work with an array of floating point values? > Or are checksums more for int,byte,string type data? Just use a generic hash on the file's bytes (ignoring their format). MD5 is sufficient for these purposes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Thu Dec 4 18:38:50 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 18:38:50 -0500 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49386522.70401@visualreservoir.com> References: <49386522.70401@visualreservoir.com> Message-ID: <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams wrote: > My app reads in one or more float arrays from a binary file. > > Sometimes due to network timeouts etc the array is not read correctly. > > What would be the best way of checking the validity of the data? > > Would some sort of checksum approach be a good idea? > Would that work with an array of floating point values? > Or are checksums more for int,byte,string type data? > If you want to verify the file itself, then python provides several more or less secure checksums, my experience was that zlib.crc32 was pretty fast on moderate file sizes. crc32 is common inside archive files and for binary newsgroups. If you have large files transported over the network, e.g. GB size, I would work with par2 repair files, which verifies and repairs at the same time. Josef From millman at berkeley.edu Thu Dec 4 18:41:40 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 4 Dec 2008 15:41:40 -0800 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <49384384.3000105@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> <49384384.3000105@gmail.com> Message-ID: I am not familiar with this, but it looks quite useful: http://www.stecf.org/software/PYTHONtools/astroasciidata/ or (http://www.scipy.org/AstroAsciiData) "Within the AstroAsciiData project we envision a module which can be used to work on all kinds of ASCII tables. The module provides a convenient tool such that the user easily can: * read in ASCII tables; * manipulate table elements; * save the modified ASCII table; * read and write meta data such as column names and units; * combine several tables; * delete/add rows and columns; * manage metadata in the table headers." Is anyone familiar with this package? Would make sense to investigate including this or adopting some of its interface/features? -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From brennan.williams at visualreservoir.com Thu Dec 4 18:43:58 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 12:43:58 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> Message-ID: <49386B3E.2020207@visualreservoir.com> josef.pktd at gmail.com wrote: > On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams > wrote: > >> My app reads in one or more float arrays from a binary file. 
>> >> Sometimes due to network timeouts etc the array is not read correctly. >> >> What would be the best way of checking the validity of the data? >> >> Would some sort of checksum approach be a good idea? >> Would that work with an array of floating point values? >> Or are checksums more for int,byte,string type data? >> >> > > If you want to verify the file itself, then python provides several > more or less secure checksums, my experience was that zlib.crc32 was > pretty fast on moderate file sizes. crc32 is common inside archive > files and for binary newsgroups. If you have large files transported > over the network, e.g. GB size, I would work with par2 repair files, > which verifies and repairs at the same time. > > The file has multiple arrays stored in it. So I want to have some sort of validity check on just the array that I'm reading. I will need to add a check on the file as well as of course network problems could affect writing to the file as well as reading from the file. > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From cournape at gmail.com Thu Dec 4 18:45:42 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 5 Dec 2008 08:45:42 +0900 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> References: <3d375d730812041103y130b60a9jc0dd016dc31a7e80@mail.gmail.com> Message-ID: <5b8d13220812041545u50886566n70861af77a2e4dca@mail.gmail.com> On Fri, Dec 5, 2008 at 4:03 AM, Robert Kern wrote: > On Thu, Dec 4, 2008 at 12:57, Charles R Harris > wrote: >> >> >> On Thu, Dec 4, 2008 at 9:39 AM, Charles R Harris >> wrote: >>> >>> >>> On Thu, Dec 4, 2008 at 1:20 AM, Erik Tollerud >>> wrote: >>>> >>>> I noticed that the Python 3000 final was released today... is there >>>> any sense of how long it will take to get numpy working under 3k? I >>>> would imagine it'll be a lot to adapt given the low-level change, but >>>> is the work already in progress? >>> >>> I read that announcement too. My feeling is that we can only support one >>> branch at a time, i.e., the python 2.x or python 3.x series. So the easiest >>> path to 3.x looked to be waiting until python 2.6 was widely distributed, >>> making it the required version, doing the needed updates to numpy, and then >>> using the automatic conversion to python 3.x. I expect f2py, nose, and other >>> tools will also need fixups. Guido suggests an approach like this for those >>> needing to support both series and I really don't see an alternative unless >>> someone wants to fork numpy ;) >> >> Looks like python 2.6 just went into Fedora rawhide, so it should be in the >> May Fedora 11 release. I expect Ubuntu and other leading edge Linux distros >> to have it about the same time. This probably means numpy needs to be >> running on python 2.6 by early Spring. > > It does. What problems are people seeing? Is it just the Windows build > that causes people to say "numpy doesn't work with Python 2.6"? Up to recently, numpy had some failures with python.org python 2.6 in x86 - but those are fixed now. The windows issues are mostly sorted out (and the missing information for reliable build has been integrated in python 2.6.1 I believe http://bugs.python.org/issue4365). F2py does not work, though - which is the main issue to make scipy work on 2.6, as far as I can see. 
David From robert.kern at gmail.com Thu Dec 4 18:52:42 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 17:52:42 -0600 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49386B3E.2020207@visualreservoir.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> Message-ID: <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> On Thu, Dec 4, 2008 at 17:43, Brennan Williams wrote: > josef.pktd at gmail.com wrote: >> On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams >> wrote: >> >>> My app reads in one or more float arrays from a binary file. >>> >>> Sometimes due to network timeouts etc the array is not read correctly. >>> >>> What would be the best way of checking the validity of the data? >>> >>> Would some sort of checksum approach be a good idea? >>> Would that work with an array of floating point values? >>> Or are checksums more for int,byte,string type data? >>> >>> >> >> If you want to verify the file itself, then python provides several >> more or less secure checksums, my experience was that zlib.crc32 was >> pretty fast on moderate file sizes. crc32 is common inside archive >> files and for binary newsgroups. If you have large files transported >> over the network, e.g. GB size, I would work with par2 repair files, >> which verifies and repairs at the same time. >> >> > The file has multiple arrays stored in it. > > So I want to have some sort of validity check on just the array that I'm > reading. So do it on the bytes of the individual arrays. Just don't bother implementing new type-specific checksums. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From josef.pktd at gmail.com Thu Dec 4 18:57:24 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 18:57:24 -0500 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> Message-ID: <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> I didn't check what this does behind the scenes, but try this m = hashlib.md5() m.update(np.array(range(100))) m.update(np.array(range(200))) m2 = hashlib.md5() m2.update(np.array(range(100))) m2.update(np.array(range(200))) print m.hexdigest() print m2.hexdigest() assert m.hexdigest() == m2.hexdigest() m3 = hashlib.md5() m3.update(np.array(range(100))) m3.update(np.array(range(199))) print m3.hexdigest() assert m.hexdigest() == m3.hexdigest() Josef From josef.pktd at gmail.com Thu Dec 4 18:59:26 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Dec 2008 18:59:26 -0500 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> Message-ID: <1cd32cbb0812041559w478b621enfca53c7da36e4914@mail.gmail.com> On Thu, Dec 4, 2008 at 6:57 PM, wrote: > I didn't check what this does behind the scenes, but try this > I forgot to paste: import hashlib #standard python library Josef From brennan.williams at visualreservoir.com Thu Dec 4 19:54:44 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 13:54:44 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> Message-ID: <49387BD4.3050501@visualreservoir.com> Thanks josef.pktd at gmail.com wrote: > I didn't check what this does behind the scenes, but try this > > import hashlib #standard python library import numpy as np > m = hashlib.md5() > m.update(np.array(range(100))) > m.update(np.array(range(200))) > > m2 = hashlib.md5() > m2.update(np.array(range(100))) > m2.update(np.array(range(200))) > > print m.hexdigest() > print m2.hexdigest() > > assert m.hexdigest() == m2.hexdigest() > > m3 = hashlib.md5() > m3.update(np.array(range(100))) > m3.update(np.array(range(199))) > > print m3.hexdigest() > > assert m.hexdigest() == m3.hexdigest() > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From robert.kern at gmail.com Thu Dec 4 20:11:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Dec 2008 19:11:20 -0600 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <49387BD4.3050501@visualreservoir.com> References: <49386522.70401@visualreservoir.com> 
<1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> <49387BD4.3050501@visualreservoir.com> Message-ID: <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> On Thu, Dec 4, 2008 at 18:54, Brennan Williams wrote: > Thanks > > josef.pktd at gmail.com wrote: >> I didn't check what this does behind the scenes, but try this >> >> > import hashlib #standard python library > import numpy as np >> m = hashlib.md5() >> m.update(np.array(range(100))) >> m.update(np.array(range(200))) I would recommend doing this on the strings before you make arrays from them. You don't know if the network cut out in the middle of an 8-byte double. Of course, sending the lengths and other metadata first, then the data would let you check without needing to do expensivish hashes or checksums. If truncation is your problem rather than corruption, then that would be sufficient. You may also consider using the NPY format in numpy 1.2 to implement that. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From brennan.williams at visualreservoir.com Thu Dec 4 21:29:08 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 05 Dec 2008 15:29:08 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> References: <49386522.70401@visualreservoir.com> <1cd32cbb0812041538q2bb3c927u9b09e26a2c7a09ad@mail.gmail.com> <49386B3E.2020207@visualreservoir.com> <3d375d730812041552j1d97a21xee82062d6de47efa@mail.gmail.com> <1cd32cbb0812041557w1fb18520ifa9ada87ae2bc193@mail.gmail.com> <49387BD4.3050501@visualreservoir.com> <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> Message-ID: <493891F4.4080001@visualreservoir.com> Robert Kern wrote: > On Thu, Dec 4, 2008 at 18:54, Brennan Williams > wrote: > >> Thanks >> >> josef.pktd at gmail.com wrote: >> >>> I didn't check what this does behind the scenes, but try this >>> >>> >>> >> import hashlib #standard python library >> import numpy as np >> >>> m = hashlib.md5() >>> m.update(np.array(range(100))) >>> m.update(np.array(range(200))) >>> > > I would recommend doing this on the strings before you make arrays > from them. You don't know if the network cut out in the middle of an > 8-byte double. > > Of course, sending the lengths and other metadata first, then the data > would let you check without needing to do expensivish hashes or > checksums. If truncation is your problem rather than corruption, then > that would be sufficient. You may also consider using the NPY format > in numpy 1.2 to implement that. > > Thanks for the ideas. I'm definitely going to add some more basic checks on lengths etc as well. Unfortunately the problem is happening at a client site so (a) I can't reproduce it and (b) most of the time they can't reproduce it either. This is a Windows Python app running on Citrix reading/writing data to a Linux networked drive. 
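A minimal sketch of the kind of wrapper Robert and Josef are suggesting in this thread: write the length and an md5 digest of the raw bytes ahead of each array, and verify both before reconstructing the array on the other end. The helper names and the little header layout are invented for the example, not an existing numpy API; zlib.crc32 would do as a cheaper checksum, and np.save/np.load (the NPY format mentioned above) already take care of the shape and dtype bookkeeping.

import hashlib
import struct
import numpy as np

def write_checked(fh, arr):
    # hypothetical helper: 8-byte length + 16-byte md5 digest, then the raw data
    raw = np.asarray(arr, dtype=np.float64).tostring()
    fh.write(struct.pack('<q', len(raw)))
    fh.write(hashlib.md5(raw).digest())
    fh.write(raw)

def read_checked(fh):
    (nbytes,) = struct.unpack('<q', fh.read(8))
    digest = fh.read(16)
    raw = fh.read(nbytes)
    if len(raw) != nbytes or hashlib.md5(raw).digest() != digest:
        raise IOError("array data truncated or corrupted")
    return np.fromstring(raw, dtype=np.float64)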
Brennan From dalcinl at gmail.com Thu Dec 4 21:31:37 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 4 Dec 2008 23:31:37 -0300 Subject: [Numpy-discussion] Py3k and numpy In-Reply-To: References: Message-ID: >From my experience working on my own projects and Cython: * the C code making Python C-API calls could be made to version-agnostic by using preprocessor macros, and even some compatibility header conditionally included. Perhaps the later would be the easiest for C-API calls (we have a lot already distilled in Cython sources). Preprocessor conditionals would still be needed when filling structs. * Regarding Python code, I believe the only sane way to go is to make the 2to3 tool to convert all the 2.x to 3.x code right. * The all-new buffer interface as implemented in core Py3.0 needs carefull review and fixes. * The now-all-strings-are-unicode is going to make some headaches ;-) * No idea how to deal with the now-all-integers-are-python-longs. On Thu, Dec 4, 2008 at 5:20 AM, Erik Tollerud wrote: > I noticed that the Python 3000 final was released today... is there > any sense of how long it will take to get numpy working under 3k? I > would imagine it'll be a lot to adapt given the low-level change, but > is the work already in progress? > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From mmetz at astro.uni-bonn.de Fri Dec 5 06:25:33 2008 From: mmetz at astro.uni-bonn.de (Manuel Metz) Date: Fri, 05 Dec 2008 12:25:33 +0100 Subject: [Numpy-discussion] genloadtxt: second serving In-Reply-To: <272FAE9E-DC29-42A5-B44C-8EB16E05DF89@gmail.com> References: <842491BB-9946-4221-8646-57638104623C@gmail.com> <4937CB89.9070405@astro.uni-bonn.de> <272FAE9E-DC29-42A5-B44C-8EB16E05DF89@gmail.com> Message-ID: <49390FAD.1020405@astro.uni-bonn.de> Pierre GM wrote: > On Dec 4, 2008, at 7:22 AM, Manuel Metz wrote: >> Will loadtxt in that case remain as is? Or will the _faulttolerantconv >> class be used? > > No idea, we need to discuss it. There's a problem with > _faulttolerantconv: using np.nan as default value will not work in > Python2.6 if the output is to be int, as an exception will be raised. Okay, that's something I did not check. If numpy.nan is converted to 0, it's basically useless -- 0 might be a valid number in the data and can not be distinguished from nan in that case. Here masked arrays is the only sensible approach. So the faulttolerantconv (ftc) class is applicable to floats and complex numbers only. It might nevertheless be useful to use the ftc class since (i) it results in almost no performance loss and (ii) at the same time you get at least a minimum fault tolerance, which can be very useful for many applications. I personally will switch to AstroAsciiData (thanks Jarrod for pointing this out), because that seems to be exactly what I need! Manuel > Therefore, we'd need to change the default to something else when > defining _faulttolerantconv. The easiest would be to define a class > and set the argument at instantiation, but then we're going back > dangerously close to StringConverter... 
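For anyone following the _faulttolerantconv discussion, here is a minimal standalone converter along those lines, usable with the converters= argument that loadtxt already accepts. As noted above it only makes sense for float (or complex) columns, since np.nan has no sensible int representation; the function name, the file name and the column index below are made up for the example.

import numpy as np

def tolerant_float(s, default=np.nan):
    # return `default` instead of raising when a field cannot be parsed
    try:
        return float(s)
    except ValueError:
        return default

# column 2 may contain junk such as 'N/A'; the other columns are parsed as usual
data = np.loadtxt('measurements.txt', converters={2: tolerant_float})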
From elcorto at gmx.net Fri Dec 5 10:10:41 2008 From: elcorto at gmx.net (Steve Schmerler) Date: Fri, 5 Dec 2008 16:10:41 +0100 Subject: [Numpy-discussion] subclassing ndarray Message-ID: <20081205151041.GA13217@ramrod.starsheriffs.de> Hi all I'm subclassing ndarray following [1] and I'd like to know if i'm doing it right. My goals are - ndarray subclass MyArray with additional methods - replacement for np.array, np.asarray on module level returning MyArray instances - expose new methods as functions on module level import numpy as np class MyArray(np.ndarray): def __new__(cls, arr, **kwargs): return np.asarray(arr, **kwargs).view(dtype=arr.dtype, type=cls) # define new methods here ... def print_shape(self): print self.shape # replace np.array() def array(*args, **kwargs): return MyArray(np.array(*args, **kwargs)) # replace np.asarray() def asarray(*args, **kwargs): return MyArray(*args, **kwargs) # expose array method as function def ps(a): asarray(a).print_shape() Would that work? PS: I found a little error in [1]: In section "__new__ and __init__", the class def should read class C(object): def __new__(cls, *args): + print 'cls is:", cls print 'Args in __new__:', args return object.__new__(cls, *args) def __init__(self, *args): + print 'self is:", self print 'Args in __init__:', args [1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html best, steve From faltet at pytables.org Fri Dec 5 12:42:00 2008 From: faltet at pytables.org (Francesc Alted) Date: Fri, 5 Dec 2008 18:42:00 +0100 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <493891F4.4080001@visualreservoir.com> References: <49386522.70401@visualreservoir.com> <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> <493891F4.4080001@visualreservoir.com> Message-ID: <200812051842.00471.faltet@pytables.org> A Friday 05 December 2008, Brennan Williams escrigu?: > Robert Kern wrote: > > On Thu, Dec 4, 2008 at 18:54, Brennan Williams > > > > wrote: > >> Thanks > >> > >> josef.pktd at gmail.com wrote: > >>> I didn't check what this does behind the scenes, but try this > >> > >> import hashlib #standard python library > >> import numpy as np > >> > >>> m = hashlib.md5() > >>> m.update(np.array(range(100))) > >>> m.update(np.array(range(200))) > > > > I would recommend doing this on the strings before you make arrays > > from them. You don't know if the network cut out in the middle of > > an 8-byte double. > > > > Of course, sending the lengths and other metadata first, then the > > data would let you check without needing to do expensivish hashes > > or checksums. If truncation is your problem rather than corruption, > > then that would be sufficient. You may also consider using the NPY > > format in numpy 1.2 to implement that. > > Thanks for the ideas. I'm definitely going to add some more basic > checks on lengths etc as well. > Unfortunately the problem is happening at a client site so (a) I > can't reproduce it and (b) most of the > time they can't reproduce it either. This is a Windows Python app > running on Citrix reading/writing data > to a Linux networked drive. Another possibility would be to use HDF5 as a data container. It supports the fletcher32 filter [1] which basically computes a chuksum for evey data chunk written to disk and then always check that the data read satifies the checksum kept on-disk. So, if the HDF5 layer doesn't complain, you are basically safe. There are at least two usable HDF5 interfaces for Python and NumPy: PyTables[2] and h5py [3]. 
PyTables does have support for that right out-of-the-box. Not sure about h5py though (a quick search in docs doesn't reveal nothing). [1] http://rfc.sunsite.dk/rfc/rfc1071.html [2] http://www.pytables.org [3] http://h5py.alfven.org Hope it helps, -- Francesc Alted From h5py at alfven.org Fri Dec 5 15:28:43 2008 From: h5py at alfven.org (Andrew Collette) Date: Fri, 05 Dec 2008 12:28:43 -0800 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <200812051842.00471.faltet@pytables.org> References: <49386522.70401@visualreservoir.com> <3d375d730812041711x22f44f1fga4949c1c305e1868@mail.gmail.com> <493891F4.4080001@visualreservoir.com> <200812051842.00471.faltet@pytables.org> Message-ID: <1228508923.7424.11.camel@tachyon-laptop> > Another possibility would be to use HDF5 as a data container. It > supports the fletcher32 filter [1] which basically computes a chuksum > for evey data chunk written to disk and then always check that the data > read satifies the checksum kept on-disk. So, if the HDF5 layer doesn't > complain, you are basically safe. > > There are at least two usable HDF5 interfaces for Python and NumPy: > PyTables[2] and h5py [3]. PyTables does have support for that right > out-of-the-box. Not sure about h5py though (a quick search in docs > doesn't reveal nothing). > > [1] http://rfc.sunsite.dk/rfc/rfc1071.html > [2] http://www.pytables.org > [3] http://h5py.alfven.org > > Hope it helps, > Just to confirm that h5py does in fact have fletcher32; it's one of the options you can specify when creating a dataset, although it could use better documentation: http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset Like other checksums, fletcher32 provides error-detection but not error-correction. You'll still need to throw away data which can't be read. However, I believe that you can still read sections of the dataset which aren't corrupted. Andrew Collette From pgmdevlist at gmail.com Fri Dec 5 18:59:25 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 5 Dec 2008 18:59:25 -0500 Subject: [Numpy-discussion] genloadtxt : last call Message-ID: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> All, Here's the latest version of genloadtxt, with some recent corrections. With just a couple of tweaking, we end up with some decent speed: it's still slower than np.loadtxt, but only 15% so according to the test at the end of the package. And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ? Thx for any comment and suggestions. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: _preview.py Type: text/x-python-script Size: 32751 bytes Desc: not available URL: -------------- next part -------------- From dsdale24 at gmail.com Sat Dec 6 00:17:58 2008 From: dsdale24 at gmail.com (Darren Dale) Date: Sat, 6 Dec 2008 00:17:58 -0500 Subject: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing In-Reply-To: References: <492D8FD3.8050601@enthought.com> <492DC9B0.1030300@gmail.com> <5b8d13220811301944k7807d3a2w4fcc821255269053@mail.gmail.com> <20081201081220.GC18450@phare.normalesup.org> Message-ID: On Mon, Dec 1, 2008 at 10:30 AM, Darren Dale wrote: > > > On Mon, Dec 1, 2008 at 3:12 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote: >> > On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale wrote: >> > > I tried installing 4.0.300x on a machine running 64-bit windows vista >> home >> > > edition and ran into problems with PyQt and some related packages. So >> I >> > > uninstalled all the python-related software, EPD took over 30 minutes >> to >> > > uninstall, and tried to install EPD 4.1 beta. >> >> > My guess is that EPD is only 32 bits installer, so that you run it on >> > WOW (Windows in Windows) on windows 64, which is kind of slow (but >> > usable for most tasks). >> >> On top of that, Vista is not supported with EPD. I had a chat with the >> EPD guys about that, and they say it does work with Vista... most of the >> time. They don't really understand the failures, and haven't had time to >> investigate much, because so far professionals and labs are simply >> avoiding Vista. Hopefully someone from the EPD team will give a more >> accurate answer >> soon. > > > Thanks Gael and David. I would avoid windows altogether if I could. When I > bought a new laptop I had the option to pay extra to downgrade to XP pro, I > should have done some more research before I settled for Vista. In the > meantime I'll borrow an XP machine when I need to build python package > installers for windows. > > Hopefully a solution can be found at some point for python and Vista. > Losing compatibility on such a major platform will become increasingly > problematic. > I just wanted to follow up, it looks like the Vista installation issues have been ironed out with the release of python-2.6.1. I was able to install 32-bit python-2.6.1 from the msi file distributed at python.org in a straight-forward manner, no need to mess around with user account controls or other such nonsense. I even have setuptools working with python 2.6, I built and installed a setuptools msi without much trouble (distutils just doesnt like setuptools version numbering). One pleasant surprise: python-2.6 is built with visual C++ 2008, which has a free express edition available so building python extension modules might be a little more convenient than it was in the past. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Sat Dec 6 05:41:16 2008 From: faltet at pytables.org (Francesc Alted) Date: Sat, 6 Dec 2008 11:41:16 +0100 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <1228508923.7424.11.camel@tachyon-laptop> References: <49386522.70401@visualreservoir.com> <200812051842.00471.faltet@pytables.org> <1228508923.7424.11.camel@tachyon-laptop> Message-ID: <200812061141.16983.faltet@pytables.org> A Friday 05 December 2008, Andrew Collette escrigu?: > > Another possibility would be to use HDF5 as a data container. 
It > > supports the fletcher32 filter [1] which basically computes a > > chuksum for evey data chunk written to disk and then always check > > that the data read satifies the checksum kept on-disk. So, if the > > HDF5 layer doesn't complain, you are basically safe. > > > > There are at least two usable HDF5 interfaces for Python and NumPy: > > PyTables[2] and h5py [3]. PyTables does have support for that > > right out-of-the-box. Not sure about h5py though (a quick search > > in docs doesn't reveal nothing). > > > > [1] http://rfc.sunsite.dk/rfc/rfc1071.html > > [2] http://www.pytables.org > > [3] http://h5py.alfven.org > > > > Hope it helps, > > Just to confirm that h5py does in fact have fletcher32; it's one of > the options you can specify when creating a dataset, although it > could use better documentation: > > http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create >_dataset My bad. I've searched for 'fletcher' instead of 'fletcher32'. I naively thought that the search tool in Sphinx allowed for partial name finding. In fact, it is a pity it does not. Cheers, -- Francesc Alted From gael.varoquaux at normalesup.org Sat Dec 6 06:35:16 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 6 Dec 2008 12:35:16 +0100 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <20081206113516.GC12839@phare.normalesup.org> On Fri, Dec 05, 2008 at 06:59:25PM -0500, Pierre GM wrote: > Here's the latest version of genloadtxt, with some recent corrections. With > just a couple of tweaking, we end up with some decent speed: it's still > slower than np.loadtxt, but only 15% so according to the test at the end of > the package. 15% slow-down is acceptable, IMHO. There is fromfile for the fast and well understood usecase. Thanks for doing all this work. Ga?l From brennan.williams at visualreservoir.com Sat Dec 6 20:15:25 2008 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Sun, 07 Dec 2008 14:15:25 +1300 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <200812061141.16983.faltet@pytables.org> References: <49386522.70401@visualreservoir.com> <200812051842.00471.faltet@pytables.org> <1228508923.7424.11.camel@tachyon-laptop> <200812061141.16983.faltet@pytables.org> Message-ID: <493B23AD.6000300@visualreservoir.com> OK so maybe I should.... (1) not add some sort of checksum type functionality to my read/write methods these read/write methods simply read/write numpy arrays to a binary file which contains one or more numpy arrays (and nothing else). (2) replace my binary files iwith either HDF5 or PyTables But.... my app is being used by clients on existing projects - in one case there are over 900 of these numpy binary files in just one project, albeit each file is pretty small (200KB or so) so.. questions..... How can I tranparently (or at least with minimum user-pain) replace my existing read/write methods with PyTables or HDF5? My initial thoughts are... (a) have an app version number and a data format version number which i can check against. (b) if data format version < 1.0 then read from old binary files (c) if app version number > 1.0 then write to new PyTables or HDF5 files (d) get clients to open existing project and then save existing project to semi-transparently convert from old to new formats. 
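To make the fletcher32 suggestion concrete, here is a minimal PyTables sketch (h5py exposes the same HDF5 filter through a fletcher32 flag to create_dataset, as Andrew notes above). The file and node names are invented for the example. The checksum is stored per chunk on write and re-checked on every read, so corruption shows up as an error rather than as silently wrong numbers; something like the data-format version number from points (a)-(d) above would still be needed to tell old raw-binary project files from new HDF5 ones.

import numpy as np
import tables

arr = np.random.random((1000, 50))

# write: fletcher32 stores a checksum with every chunk of the array
h5 = tables.openFile('project_arrays.h5', mode='w')
filters = tables.Filters(fletcher32=True)
ca = h5.createCArray(h5.root, 'pressure', tables.Float64Atom(),
                     shape=arr.shape, filters=filters)
ca[:] = arr
h5.close()

# read back: the checksums are verified, so a damaged chunk raises instead of returning garbage
h5 = tables.openFile('project_arrays.h5', mode='r')
arr2 = h5.root.pressure[:]
h5.close()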
Francesc Alted wrote: > A Friday 05 December 2008, Andrew Collette escrigu?: > >>> Another possibility would be to use HDF5 as a data container. It >>> supports the fletcher32 filter [1] which basically computes a >>> chuksum for evey data chunk written to disk and then always check >>> that the data read satifies the checksum kept on-disk. So, if the >>> HDF5 layer doesn't complain, you are basically safe. >>> >>> There are at least two usable HDF5 interfaces for Python and NumPy: >>> PyTables[2] and h5py [3]. PyTables does have support for that >>> right out-of-the-box. Not sure about h5py though (a quick search >>> in docs doesn't reveal nothing). >>> >>> [1] http://rfc.sunsite.dk/rfc/rfc1071.html >>> [2] http://www.pytables.org >>> [3] http://h5py.alfven.org >>> >>> Hope it helps, >>> >> Just to confirm that h5py does in fact have fletcher32; it's one of >> the options you can specify when creating a dataset, although it >> could use better documentation: >> >> http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create >> _dataset >> > > My bad. I've searched for 'fletcher' instead of 'fletcher32'. I > naively thought that the search tool in Sphinx allowed for partial name > finding. In fact, it is a pity it does not. > > Cheers, > > From pgmdevlist at gmail.com Sun Dec 7 15:02:53 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 7 Dec 2008 15:02:53 -0500 Subject: [Numpy-discussion] Python2.4 support Message-ID: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> All, * What versions of Python should be supported by what version of numpy ? Are we to expect users to rely on Python2.5 for the upcoming 1.3.x ? Could we have some kind of timeline on the trac site or elsewhere (and if such a timeline exists already, can I get the link?) ? * Talking about 1.3.x, what's the timeline? Are we still shooting for a release in 2008 or could we wait till mid Jan. 2009 ? Thx a lot in advance From millman at berkeley.edu Sun Dec 7 16:21:53 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 7 Dec 2008 13:21:53 -0800 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> Message-ID: On Sun, Dec 7, 2008 at 12:02 PM, Pierre GM wrote: > * What versions of Python should be supported by what version of > numpy ? Are we to expect users to rely on Python2.5 for the upcoming > 1.3.x ? Could we have some kind of timeline on the trac site or > elsewhere (and if such a timeline exists already, can I get the link?) ? NumPy 1.3.x should work with Python 2.4, 2.5, and 2.6. At some point we can drop 2.4, but I would like to wait a bit since we just dropped 2.3 support. The timeline is on the trac site: http://projects.scipy.org/scipy/numpy/milestone/1.3.0 > * Talking about 1.3.x, what's the timeline? Are we still shooting for > a release in 2008 or could we wait till mid Jan. 2009 ? I am fine with pushing the release back, if there is interest in doing that. I have been mainly focusing on getting SciPy 0.7.x out, so I haven't been following the NumPy development closely. But it is good that you are asking for more concrete details about the next NumPy release. We need to start making plans. Does anyone have any suggestions about whether we should push the release back? Is 1 month long enough? What is left to do? 
Please feel free to update the release notes, which are checked into the trunk: http://scipy.org/scipy/numpy/browser/trunk/doc/release/1.3.0-notes.rst Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From pgmdevlist at gmail.com Sun Dec 7 16:34:31 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 7 Dec 2008 16:34:31 -0500 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> Message-ID: <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> On Dec 7, 2008, at 4:21 PM, Jarrod Millman wrote: > NumPy 1.3.x should work with Python 2.4, 2.5, and 2.6. At some point > we can drop 2.4, but I would like to wait a bit since we just dropped > 2.3 support. The timeline is on the trac site: > http://projects.scipy.org/scipy/numpy/milestone/1.3.0 OK, great, thanks a lot. >> * Talking about 1.3.x, what's the timeline? Are we still shooting for >> a release in 2008 or could we wait till mid Jan. 2009 ? > > I am fine with pushing the release back, if there is interest in doing > that. I have been mainly focusing on getting SciPy 0.7.x out, so I > haven't been following the NumPy development closely. But it is good > that you are asking for more concrete details about the next NumPy > release. We need to start making plans. Does anyone have any > suggestions about whether we should push the release back? Is 1 month > long enough? What is left to do? Well, on my side, there's some doc to be updated, of course. Then, I'd like to put the rec_functions that were developed in matplotlib to manipulate recordarrays. I haven't started yet, might be able to do so before the end of the year (not much to do, just a clean up and some examples). And what should we do with the genloadtxt function ? > > Please feel free to update the release notes, which are checked into > the trunk: > http://scipy.org/scipy/numpy/browser/trunk/doc/release/1.3.0- > notes.rst > Will do in good time. Thx again From david at ar.media.kyoto-u.ac.jp Mon Dec 8 00:42:53 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 08 Dec 2008 14:42:53 +0900 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> Message-ID: <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Jarrod Millman wrote: > On Sun, Dec 7, 2008 at 12:02 PM, Pierre GM wrote: > >> * What versions of Python should be supported by what version of >> numpy ? Are we to expect users to rely on Python2.5 for the upcoming >> 1.3.x ? Could we have some kind of timeline on the trac site or >> elsewhere (and if such a timeline exists already, can I get the link?) ? >> > > NumPy 1.3.x should work with Python 2.4, 2.5, and 2.6. At some point > we can drop 2.4, but I would like to wait a bit since we just dropped > 2.3 support. The timeline is on the trac site: > http://projects.scipy.org/scipy/numpy/milestone/1.3.0 > I am strongly against dropping 2.4 support anytime soon. I haven't seen a strong rationale for using >= 2.5 features in numpy, supporting 2.4 is not so hard, and 2.4 is still the default python version on many OS (mac os X 10.4 I believe, RHEL for sure, open solaris). 
David From millman at berkeley.edu Mon Dec 8 01:49:54 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 7 Dec 2008 22:49:54 -0800 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, Dec 7, 2008 at 9:42 PM, David Cournapeau wrote: > I am strongly against dropping 2.4 support anytime soon. I haven't seen > a strong rationale for using >= 2.5 features in numpy, supporting 2.4 is > not so hard, and 2.4 is still the default python version on many OS (mac > os X 10.4 I believe, RHEL for sure, open solaris). While my feelings aren't as strong as David's, they are pretty much identical. As a point of reference, Red Hat Enterprise Linux 6 won't come out until at least the first quarter of 2010. Until then we should make a serious effort to support Python 2.4, which ships with RHEL 5. It looks like RHEL 6 will be based on the upcoming Fedora 11 release, which will ship with Python 2.6. That gives us a minimum of one year for 2.4 support. Once RHEL 6 is released, it will take several months before a sizable number of users upgrade. Moin has a detailed list of Python versions for various OSes and hosting services: http://moinmo.in/PollAboutRequiringPython24 -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From james at fmnmedia.co.uk Mon Dec 8 06:35:33 2008 From: james at fmnmedia.co.uk (James) Date: Mon, 08 Dec 2008 11:35:33 +0000 Subject: [Numpy-discussion] Line of best fit! Message-ID: <493D0685.5050903@fmnmedia.co.uk> Hi, I am trying to plot a line of best fit for some data i have, is there a simple way of doing it? Cheers From scott.sinclair.za at gmail.com Mon Dec 8 07:47:05 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 8 Dec 2008 14:47:05 +0200 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493D0685.5050903@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> Message-ID: <6a17e9ee0812080447n7dc9495jc51fccda1ed8828a@mail.gmail.com> > 2008/12/8 James : > I am trying to plot a line of best fit for some data i have, is there a > simple way of doing it? Hi James, Take a look at: http://www.scipy.org/Cookbook/FittingData http://www.scipy.org/Cookbook/LinearRegression and the section on least square fitting towards the end of this page in the Scipy docs: http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html Post again if if these references don't get you going. Cheers, Scott From ezindy at gmail.com Mon Dec 8 07:55:02 2008 From: ezindy at gmail.com (Egor Zindy) Date: Mon, 08 Dec 2008 12:55:02 +0000 Subject: [Numpy-discussion] ANN: numpy.i - added managed deallocation to ARGOUTVIEW_ARRAY1 (ARGOUTVIEWM_ARRAY1) In-Reply-To: <492538B9.10202@gmail.com> References: <491F8F4A.30009@gmail.com> <49231156.1060209@gmail.com> <49231EB3.8060802@noaa.gov> <492538B9.10202@gmail.com> Message-ID: <493D1926.4010506@gmail.com> Hello list, just a quick follow-up on the managed deallocation. This is what I've done this week-end: In numpy.i, I have redefined the import_array() function to also take care of the managed memory initialisation (the _MyDeallocType.tp_new = PyType_GenericNew; statement). This means that in %init(), the only call is to import_array(). Basically, the same as with the "normal" numpy.i. 
Only difference in a swig file (.i) between "unmanaged" and "managed" memory allocation is the use of either the ARGOUTVIEW_ARRAY or ARGOUTVIEWM_ARRAY fragments. Everything else is hidden. In numpy.i, this is what's now happening (my previous attempts were a bit clumsy): %#undef import_array %#define import_array() {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); return; }; _MyDeallocType.tp_new = PyType_GenericNew; if (PyType_Ready(&_MyDeallocType) < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "Custom memory management failed to initialize (numpy.i)"); return; } } %#undef import_array1 %#define import_array1(ret) {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); return ret; }; _MyDeallocType.tp_new = PyType_GenericNew; if (PyType_Ready(&_MyDeallocType) < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "Custom memory management failed to initialize (numpy.i)"); return ret; } } %#undef import_array2 %#define import_array2(msg, ret) {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, msg); return ret; }; _MyDeallocType.tp_new = PyType_GenericNew; if (PyType_Ready(&_MyDeallocType) < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, msg); return ret; } } My wiki (sorry, haven't moved it to the scipy cookbook yet) has all the details (the modified numpy.i, explanations, and some test code): http://code.google.com/p/ezwidgets/wiki/NumpyManagedMemory Regards, Egor From ramercer at gmail.com Mon Dec 8 09:33:21 2008 From: ramercer at gmail.com (Adam Mercer) Date: Mon, 8 Dec 2008 08:33:21 -0600 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: <799406d60812080633h55e16d4fl2a084721b96314bc@mail.gmail.com> On Sun, Dec 7, 2008 at 23:42, David Cournapeau wrote: > I am strongly against dropping 2.4 support anytime soon. I haven't seen > a strong rationale for using >= 2.5 features in numpy, supporting 2.4 is > not so hard, and 2.4 is still the default python version on many OS (mac > os X 10.4 I believe, RHEL for sure, open solaris). Mac OS X 10.4 uses python-2.3, 10.5 uses python-2.5. Cheers Adam From matthieu.brucher at gmail.com Mon Dec 8 09:40:03 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 8 Dec 2008 15:40:03 +0100 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: > While my feelings aren't as strong as David's, they are pretty much identical. > > As a point of reference, Red Hat Enterprise Linux 6 won't come out > until at least the first quarter of 2010. Until then we should make a > serious effort to support Python 2.4, which ships with RHEL 5. It > looks like RHEL 6 will be based on the upcoming Fedora 11 release, > which will ship with Python 2.6. That gives us a minimum of one year > for 2.4 support. Once RHEL 6 is released, it will take several months > before a sizable number of users upgrade. > > Moin has a detailed list of Python versions for various OSes and > hosting services: > http://moinmo.in/PollAboutRequiringPython24 At least several months, if not years. RedHat supports each version 7 years, for instance (I don't ask for that long). 
Currently, I'm still using a RHEL 4, although it is planned to migrate to RHEL 5 next year. So we should still support 2.4 for at least 18 months, in case some big firms use RHEL and Python+Numpy for their tools. -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From gwg at emss.co.za Mon Dec 8 10:50:14 2008 From: gwg at emss.co.za (George Goussard) Date: Mon, 8 Dec 2008 17:50:14 +0200 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) Message-ID: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> Hello. I have been battling with the following error for the past week. The output from the terminal is: Traceback (most recent call last): File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\backends\backend_qt4agg.py", line 86, in paintEvent FigureCanvasAgg.draw(self) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\backends\backend_agg.py", line 261, in draw self.figure.draw(self.renderer) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\figure.py", line 765, in draw legend.draw(renderer) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\legend.py", line 197, in draw self._update_positions(renderer) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\legend.py", line 513, in _update_positions l,b,w,h = get_tbounds(self.texts[-1]) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\legend.py", line 499, in get_tbounds bboxa = bbox.inverse_transformed(self.get_transform()) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\transforms.py", line 478, in inverse_transformed return Bbox(transform.inverted().transform(self.get_points())) File "C:\development\Python\2_5_2\Lib\site-packages\matplotlib\transforms.py", line 1338, in inverted self._inverted = Affine2D(inv(mtx)) File "C:\development\Python\2_5_2\Lib\site-packages\numpy\linalg\linalg.py", line 350, in inv return wrap(solve(a, identity(a.shape[0], dtype=a.dtype))) File "C:\development\Python\2_5_2\Lib\site-packages\numpy\linalg\linalg.py", line 249, in solve raise LinAlgError, 'Singular matrix' numpy.linalg.linalg.LinAlgError: Singular matrix Initially MPL plots a graph but when you try to interact with the widget(for example resize) then the output is displayed and the MPL figure is not updated. Everything works with Windows 32-bit. Linux 32-bit and 64-bit are working correctly. Any ideas would be helpful. Thanks. George. -------------- next part -------------- An HTML attachment was scrubbed... URL: From f.yw at hotmail.com Mon Dec 8 12:27:01 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 8 Dec 2008 10:27:01 -0700 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: Hi, I want to create a matrix based on a vector. It is difficult to describe the issue for me in english. Here is an example. Suppose I have an array([3, 6, 8, 12]), I want to create a range based on each element. In this exampe, let us say want to create 4 number with step 2, so I will have [3, 6, 8, 12 5, 8, 10,14 7, 10,12,16 9, 12,14,18] It is a 4 by 4 maxtric in this example. My original array is quite large. 
but the range I want to create around the number is not big, it is about 30. Does anyone know how to do this efficiently? Thanks Frank _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Dec 8 12:30:31 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 11:30:31 -0600 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> On Mon, Dec 8, 2008 at 11:27, frank wang wrote: > Hi, > > I want to create a matrix based on a vector. It is difficult to describe the > issue for me in english. Here is an example. > > Suppose I have an array([3, 6, 8, 12]), I want to create a range based on > each element. In this exampe, let us say want to create 4 number with step > 2, so I will have > > [3, 6, 8, 12 > 5, 8, 10,14 > 7, 10,12,16 > 9, 12,14,18] > > It is a 4 by 4 maxtric in this example. My original array is quite large. > but the range I want to create around the number is not big, it is about 30. > > Does anyone know how to do this efficiently? In [1]: from numpy import * In [2]: a = array([3, 6, 8, 12]) In [4]: b = arange(0, 4*2, 2)[:,newaxis] In [5]: a+b Out[5]: array([[ 3, 6, 8, 12], [ 5, 8, 10, 14], [ 7, 10, 12, 16], [ 9, 12, 14, 18]]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Mon Dec 8 12:43:25 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 9 Dec 2008 02:43:25 +0900 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) In-Reply-To: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> Message-ID: <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> On Tue, Dec 9, 2008 at 12:50 AM, George Goussard wrote: > Hello. > > > > I have been battling with the following error for the past week. The output > from the terminal is: > What does numpy.test() says ? Did you use an external blas/lapack when you built numpy for AMD64 David From faltet at pytables.org Mon Dec 8 13:01:36 2008 From: faltet at pytables.org (Francesc Alted) Date: Mon, 8 Dec 2008 19:01:36 +0100 Subject: [Numpy-discussion] checksum on numpy float array In-Reply-To: <493B23AD.6000300@visualreservoir.com> References: <49386522.70401@visualreservoir.com> <200812061141.16983.faltet@pytables.org> <493B23AD.6000300@visualreservoir.com> Message-ID: <200812081901.37137.faltet@pytables.org> A Sunday 07 December 2008, Brennan Williams escrigu?: > OK so maybe I should.... > > (1) not add some sort of checksum type functionality to my read/write > methods > > these read/write methods simply read/write numpy arrays to a > binary file which contains one or more numpy arrays (and nothing > else). > > (2) replace my binary files iwith either HDF5 or PyTables > > But.... 
> > my app is being used by clients on existing projects - in one case > there are over 900 of these numpy binary files in just one project, > albeit each file is pretty small (200KB or so) > > so.. questions..... > > How can I tranparently (or at least with minimum user-pain) replace > my existing read/write methods with PyTables or HDF5? > > My initial thoughts are... > > (a) have an app version number and a data format version number which > i can check against. > > (b) if data format version < 1.0 then read from old binary files > > (c) if app version number > 1.0 then write to new PyTables or HDF5 > files > > (d) get clients to open existing project and then save existing > project to semi-transparently convert from old to new formats. Yeah. That would work perfectly. Also, there is a function in PyTables named 'isHDF5File(filename)' that allow you to know whether a file is in HDF5 format or not. You might want to use it and avoid to bother with data format/app version issues. Cheers, Francesc > > Francesc Alted wrote: > > A Friday 05 December 2008, Andrew Collette escrigu?: > >>> Another possibility would be to use HDF5 as a data container. It > >>> supports the fletcher32 filter [1] which basically computes a > >>> chuksum for evey data chunk written to disk and then always check > >>> that the data read satifies the checksum kept on-disk. So, if > >>> the HDF5 layer doesn't complain, you are basically safe. > >>> > >>> There are at least two usable HDF5 interfaces for Python and > >>> NumPy: PyTables[2] and h5py [3]. PyTables does have support for > >>> that right out-of-the-box. Not sure about h5py though (a quick > >>> search in docs doesn't reveal nothing). > >>> > >>> [1] http://rfc.sunsite.dk/rfc/rfc1071.html > >>> [2] http://www.pytables.org > >>> [3] http://h5py.alfven.org > >>> > >>> Hope it helps, > >> > >> Just to confirm that h5py does in fact have fletcher32; it's one > >> of the options you can specify when creating a dataset, although > >> it could use better documentation: > >> > >> http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.cre > >>ate _dataset > > > > My bad. I've searched for 'fletcher' instead of 'fletcher32'. I > > naively thought that the search tool in Sphinx allowed for partial > > name finding. In fact, it is a pity it does not. > > > > Cheers, > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted From f.yw at hotmail.com Mon Dec 8 13:40:01 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 8 Dec 2008 11:40:01 -0700 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> Message-ID: I got a lof of help from the experts in this forum. I resitsted to send a thank you reply for fearing spaming the forum. This time I really want to let the people know that I am really appreciate the great help I got. Please let me know if a simple thank you message is not appropriate in this forum. Numpy makes Pyhton a great tools for processing signal. Thank you very much. 
Frank > Date: Mon, 8 Dec 2008 11:30:31 -0600> From: robert.kern at gmail.com> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] how to create a matrix based on a vector?> > On Mon, Dec 8, 2008 at 11:27, frank wang wrote:> > Hi,> >> > I want to create a matrix based on a vector. It is difficult to describe the> > issue for me in english. Here is an example.> >> > Suppose I have an array([3, 6, 8, 12]), I want to create a range based on> > each element. In this exampe, let us say want to create 4 number with step> > 2, so I will have> >> > [3, 6, 8, 12> > 5, 8, 10,14> > 7, 10,12,16> > 9, 12,14,18]> >> > It is a 4 by 4 maxtric in this example. My original array is quite large.> > but the range I want to create around the number is not big, it is about 30.> >> > Does anyone know how to do this efficiently?> > In [1]: from numpy import *> > In [2]: a = array([3, 6, 8, 12])> > In [4]: b = arange(0, 4*2, 2)[:,newaxis]> > In [5]: a+b> Out[5]:> array([[ 3, 6, 8, 12],> [ 5, 8, 10, 14],> [ 7, 10, 12, 16],> [ 9, 12, 14, 18]])> > -- > Robert Kern> > "I have come to believe that the whole world is an enigma, a harmless> enigma that is made terrible by our own mad attempt to interpret it as> though it had an underlying truth."> -- Umberto Eco> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Dec 8 14:37:24 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 13:37:24 -0600 Subject: [Numpy-discussion] how to create a matrix based on a vector? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812080930v2abb508fwbaaaea43dac143d7@mail.gmail.com> Message-ID: <3d375d730812081137k1a6b4ca2g1b37a05001101cf5@mail.gmail.com> On Mon, Dec 8, 2008 at 12:40, frank wang wrote: > I got a lof of help from the experts in this forum. I resitsted to send a > thank you reply for fearing spaming the forum. This time I really want to > let the people know that I am really appreciate the great help I got. > > Please let me know if a simple thank you message is not appropriate in this > forum. Thanks, public or otherwise, are always appreciated. You're quite welcome. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From james at fmnmedia.co.uk Mon Dec 8 15:32:48 2008 From: james at fmnmedia.co.uk (James) Date: Mon, 08 Dec 2008 20:32:48 +0000 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493D0685.5050903@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> Message-ID: <493D8470.8000303@fmnmedia.co.uk> I have a very simple plot, and the lines join point to point, however i would like to add a line of best fit now onto the chart, i am really new to python etc, and didnt really understand those links! Can anyone help me :) Cheers! James wrote: > Hi, > > I am trying to plot a line of best fit for some data i have, is there a > simple way of doing it? 
> > Cheers > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From lou_boog2000 at yahoo.com Mon Dec 8 15:54:18 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 8 Dec 2008 12:54:18 -0800 (PST) Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <493D8470.8000303@fmnmedia.co.uk> Message-ID: <966547.42601.qm@web34402.mail.mud.yahoo.com> In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. E.g. to write some data (mixed types) to a file I can do this (fp is an open file), thedata=[3.0,-4.9+2.0j,'another string'] repvars= repr(thedata)+"\n" fp.write(repvars) Then to read it back and restore the data each to its original type, strvars= fp.readline() sonofdata= eval(strvars) which gives back the original data list. BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. Thanks for any advice. (yes, I am googling, too) -- Lou Pecora, my views are my own. From matthieu.brucher at gmail.com Mon Dec 8 15:56:40 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 8 Dec 2008 21:56:40 +0100 Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <966547.42601.qm@web34402.mail.mud.yahoo.com> References: <493D8470.8000303@fmnmedia.co.uk> <966547.42601.qm@web34402.mail.mud.yahoo.com> Message-ID: Hi, The repr - eval pair does not work with numpy. You can simply do a tofile() from file(). Matthieu 2008/12/8 Lou Pecora : > In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. > > E.g. to write some data (mixed types) to a file I can do this (fp is an open file), > > thedata=[3.0,-4.9+2.0j,'another string'] > repvars= repr(thedata)+"\n" > fp.write(repvars) > > Then to read it back and restore the data each to its original type, > > strvars= fp.readline() > sonofdata= eval(strvars) > > which gives back the original data list. > > BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. > > Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. > > Thanks for any advice. (yes, I am googling, too) > > > -- Lou Pecora, my views are my own. > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From robert.kern at gmail.com Mon Dec 8 16:15:41 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 15:15:41 -0600 Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <966547.42601.qm@web34402.mail.mud.yahoo.com> References: <493D8470.8000303@fmnmedia.co.uk> <966547.42601.qm@web34402.mail.mud.yahoo.com> Message-ID: <3d375d730812081315g26e14706s8ca2faa94cf75f58@mail.gmail.com> On Mon, Dec 8, 2008 at 14:54, Lou Pecora wrote: > In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. > > E.g. to write some data (mixed types) to a file I can do this (fp is an open file), > > thedata=[3.0,-4.9+2.0j,'another string'] > repvars= repr(thedata)+"\n" > fp.write(repvars) > > Then to read it back and restore the data each to its original type, > > strvars= fp.readline() > sonofdata= eval(strvars) > > which gives back the original data list. > > BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. I don't see any extra end-of-lines. Are you sure you aren't talking about the "..." when you are saving large arrays? You will need to use set_printoptions() to disable that (threshold=sys.maxint). You should also adjust use precision=18, suppress=False. That should mostly work, but it's never a certain thing. > Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. > > Thanks for any advice. (yes, I am googling, too) The most bulletproof way would be to use numpy.save() and numpy.load(), but this is a binary format, not a text one. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lou_boog2000 at yahoo.com Mon Dec 8 16:24:27 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 8 Dec 2008 13:24:27 -0800 (PST) Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: Message-ID: <775651.80835.qm@web34401.mail.mud.yahoo.com> --- On Mon, 12/8/08, Matthieu Brucher wrote: > From: Matthieu Brucher > Subject: Re: [Numpy-discussion] What to use to read and write numpy arrays to a file? > To: "Discussion of Numerical Python" > Date: Monday, December 8, 2008, 3:56 PM > Hi, > > The repr - eval pair does not work with numpy. You can > simply do a > tofile() from file(). > > Matthieu Yes, I found the tofile/fromfile pair, but they don't preserve the shape. Sorry, I should have been clearer on that in my request. I will be saving arrays whose shape I may not know later when I read them in. I'd like that information to be preserved. Thanks. -- Lou Pecora, my views are my own. From lou_boog2000 at yahoo.com Mon Dec 8 16:26:20 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Mon, 8 Dec 2008 13:26:20 -0800 (PST) Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? 
In-Reply-To: <3d375d730812081315g26e14706s8ca2faa94cf75f58@mail.gmail.com> Message-ID: <37053.3395.qm@web34406.mail.mud.yahoo.com> --- On Mon, 12/8/08, Robert Kern wrote: > From: Robert Kern > Subject: Re: [Numpy-discussion] What to use to read and write numpy arrays to a file? > > The most bulletproof way would be to use numpy.save() and > numpy.load(), but this is a binary format, not a text one. > > -- > Robert Kern > Thanks, Robert. I may have to go that route, assuming that the save and load pair preserve shape, i.e. I don't have to know the shape when I read back in. -- Lou Pecora, my views are my own. From robert.kern at gmail.com Mon Dec 8 16:28:14 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 15:28:14 -0600 Subject: [Numpy-discussion] What to use to read and write numpy arrays to a file? In-Reply-To: <37053.3395.qm@web34406.mail.mud.yahoo.com> References: <3d375d730812081315g26e14706s8ca2faa94cf75f58@mail.gmail.com> <37053.3395.qm@web34406.mail.mud.yahoo.com> Message-ID: <3d375d730812081328i27e624f9gd181efbd5625b3c0@mail.gmail.com> On Mon, Dec 8, 2008 at 15:26, Lou Pecora wrote: > --- On Mon, 12/8/08, Robert Kern wrote: > >> From: Robert Kern >> Subject: Re: [Numpy-discussion] What to use to read and write numpy arrays to a file? >> >> The most bulletproof way would be to use numpy.save() and >> numpy.load(), but this is a binary format, not a text one. >> >> -- >> Robert Kern >> > > Thanks, Robert. I may have to go that route, assuming that the save and load pair preserve shape, i.e. I don't have to know the shape when I read back in. They do. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From amcmorl at gmail.com Mon Dec 8 19:00:16 2008 From: amcmorl at gmail.com (Angus McMorland) Date: Mon, 8 Dec 2008 19:00:16 -0500 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493D8470.8000303@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> Message-ID: Hi James, 2008/12/8 James : > > I have a very simple plot, and the lines join point to point, however i > would like to add a line of best fit now onto the chart, i am really new > to python etc, and didnt really understand those links! > > Can anyone help me :) It sounds like the second link, about linear regression, is a good place to start, and I've made a very simple example based on that: ----------------------------------------------- import numpy as np import matplotlib.pyplot as plt x = np.linspace(0, 10, 11) #1 data_y = np.random.normal(size=x.shape, loc=x, scale=2.5) #2 plt.plot(x, data_y, 'bo') #3 coefs = np.lib.polyfit(x, data_y, 1) #4 fit_y = np.lib.polyval(coefs, x) #5 plt.plot(x, fit_y, 'b--') #6 ------------------------------------------------ Line 1 creates an array with the x values I have. Line 2 creates some random "data" I want to fit, which, in this case happens to be normally distributed around the unity line y=x. The raw data is plotted (assuming you have matplotlib installed as well - I suggest you do) by line 3, with blue circles. Line 4 calculates the coefficients giving the least-squares best fit to a first degree polynomial (i.e. a straight line y = c0 * x + c1). So the values of coefs are c0 and c1 in the previous equation. 
Line 5 calculates the y values on the fitted polynomial, at given x values, from the coefficients calculated in line 4, and line 6 simply plots these fitted y values, using a dotted blue line. I hope that helps get you started. Keep posting questions on specific issues as they arise, and we'll see what we can do to help. Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From steve at shrogers.com Mon Dec 8 19:07:00 2008 From: steve at shrogers.com (Steven H. Rogers) Date: Mon, 08 Dec 2008 17:07:00 -0700 Subject: [Numpy-discussion] Python2.4 support In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <493CB3DD.8090602@ar.media.kyoto-u.ac.jp> Message-ID: <493DB6A4.1050501@shrogers.com> Matthieu Brucher wrote: > At least several months, if not years. RedHat supports each version 7 > years, for instance (I don't ask for that long). > Currently, I'm still using a RHEL 4, although it is planned to migrate > to RHEL 5 next year. So we should still support 2.4 for at least 18 > months, in case some big firms use RHEL and Python+Numpy for their > tools. > +1 From f.yw at hotmail.com Mon Dec 8 20:15:26 2008 From: f.yw at hotmail.com (frank wang) Date: Mon, 8 Dec 2008 18:15:26 -0700 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: Hi, I have a program with some variables consume a lot of memory. The first time I run it, it is fine. The second time I run it, I will get MemoryError. If I close the ipython and reopen it again, then I can run the program once. I am looking for a command to delete the intermediate variable once it is not used to save memory like in matlab clear command. Thanks Frank _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at enthought.com Mon Dec 8 22:00:57 2008 From: travis at enthought.com (Travis Vaught) Date: Mon, 8 Dec 2008 21:00:57 -0600 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: Try: del(myvariable) Travis On Dec 8, 2008, at 7:15 PM, frank wang wrote: > Hi, > > I have a program with some variables consume a lot of memory. The > first time I run it, it is fine. The second time I run it, I will > get MemoryError. If I close the ipython and reopen it again, then I > can run the program once. I am looking for a command to delete the > intermediate variable once it is not used to save memory like in > matlab clear command. > > Thanks > > Frank > > Send e-mail faster without improving your typing skills. Get your > Hotmail? account. > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Garry.Willgoose at newcastle.edu.au Mon Dec 8 19:02:46 2008 From: Garry.Willgoose at newcastle.edu.au (Garry Willgoose) Date: Tue, 9 Dec 2008 11:02:46 +1100 Subject: [Numpy-discussion] when will osx linker option -bundle be reflected in distutils Message-ID: I was just wondering what plans there were to reflect the different linker options (i.e. -bundle instead of -shared) that are required on OSX in the fcompiler files within distutils. While its a minor thing it always catches the users of my software when they either install fresh or update numpy ... and sometimes on a bad day it even catches me ;-) ==================================================================== Prof Garry Willgoose, Australian Professorial Fellow in Environmental Engineering, Director, Centre for Climate Impact Management (C2IM), School of Engineering, The University of Newcastle, Callaghan, 2308 Australia. Centre webpage: www.c2im.org.au Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574 (Fri PM-Mon) FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal and Telluric) Env. Engg. Secretary: (International) +61 2 4921 6042 email: garry.willgoose at newcastle.edu.au; g.willgoose at telluricresearch.com email-for-life: garry.willgoose at alum.mit.edu personal webpage: www.telluricresearch.com/garry ==================================================================== "Do not go where the path may lead, go instead where there is no path and leave a trail" Ralph Waldo Emerson ==================================================================== From robert.kern at gmail.com Mon Dec 8 22:19:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Dec 2008 21:19:20 -0600 Subject: [Numpy-discussion] when will osx linker option -bundle be reflected in distutils In-Reply-To: References: Message-ID: <3d375d730812081919x3fc8dcf4rbc38c7ebae654d7f@mail.gmail.com> On Mon, Dec 8, 2008 at 18:02, Garry Willgoose wrote: > I was just wondering what plans there were to reflect the different > linker options (i.e. -bundle instead of -shared) that are required on > OSX in the fcompiler files within distutils. While its a minor thing > it always catches the users of my software when they either install > fresh or update numpy ... and sometimes on a bad day it even catches > me ;-) I'm sorry; I don't follow. What problems are you having? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From scott.sinclair.za at gmail.com Tue Dec 9 00:13:26 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 9 Dec 2008 07:13:26 +0200 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> Message-ID: <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> > 2008/12/9 Angus McMorland : > Hi James, > > 2008/12/8 James : >> >> I have a very simple plot, and the lines join point to point, however i >> would like to add a line of best fit now onto the chart, i am really new >> to python etc, and didnt really understand those links! 
>> >> Can anyone help me :) > > It sounds like the second link, about linear regression, is a good > place to start, and I've made a very simple example based on that: > > ----------------------------------------------- > import numpy as np > import matplotlib.pyplot as plt > > x = np.linspace(0, 10, 11) #1 > data_y = np.random.normal(size=x.shape, loc=x, scale=2.5) #2 > plt.plot(x, data_y, 'bo') #3 > > coefs = np.lib.polyfit(x, data_y, 1) #4 > fit_y = np.lib.polyval(coefs, x) #5 > plt.plot(x, fit_y, 'b--') #6 > ------------------------------------------------ James, you'll want to add an extra line to the above code snippet so that Matplotlib displays the plot: plt.show() Cheers, Scott From rmay31 at gmail.com Tue Dec 9 00:39:19 2008 From: rmay31 at gmail.com (Ryan May) Date: Mon, 08 Dec 2008 23:39:19 -0600 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <493E0487.50909@gmail.com> Pierre GM wrote: > All, > Here's the latest version of genloadtxt, with some recent corrections. > With just a couple of tweaking, we end up with some decent speed: it's > still slower than np.loadtxt, but only 15% so according to the test at > the end of the package. > > And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ? > > Thx for any comment and suggestions. Current version works out of the box for me. Thanks for running point on this. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From millman at berkeley.edu Tue Dec 9 04:34:29 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 9 Dec 2008 01:34:29 -0800 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: On Fri, Dec 5, 2008 at 3:59 PM, Pierre GM wrote: > All, > Here's the latest version of genloadtxt, with some recent corrections. With > just a couple of tweaking, we end up with some decent speed: it's still > slower than np.loadtxt, but only 15% so according to the test at the end of > the package. > > And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ? Thanks for working on this. I think that having simple, easy-to-use, flexible, and fast IO code is extremely important; so I really appreciate this work. I have a few general comments about the IO code and where I would like to see it going: Where should IO code go? ------------------------------------ >From the user's perspective, I would like all the NumPy IO code to be in the same place in NumPy; and all the SciPy IO code to be in the same place in SciPy. So, for instance, the user shouldn't get `mloadtxt` from `numpy.ma.io`. Another way of saying this is that in IPython, I should be able to see all NumPy IO functions by tab-completing once. Slightly less important to me is that I would like to be able to do: from numpy import io as npio from scipy import io as spio What is the difference between NumPy and SciPy IO? ------------------------------------------------------------------------ It was decided last year that numpy io should provide simple, generic, core io functionality. While scipy io would provide more domain- or application-specific io code (e.g., Matlab IO, WAV IO, etc.) My vision for scipy io, which I know isn't shared, is to be more or less aiming to be all inclusive (e.g., all image, sound, and data formats). 
(That is a different discussion; just wanted it to be clear where I stand.) For numpy io, it should include: - generic helper routines for data io (i.e., datasource, etc.) - a standard, supported binary format (i.e., npy/npz) - generic ascii file support (i.e, loadtxt, etc.) What about AstroAsciiData? ------------------------------------- I sent an email asking about AstroAsciiData last week. The only response I got was from Manuel Metz saying that he was switching to AstroAsciiData since it did exactly what he needed. In my mind, I would prefer that numpy io had the best ascii data handling. So I wonder if it would make sense to incorporate AstroAsciiData? As far as I know, it is pure Python with a BSD license. Maybe the authors would be willing to help integrate the code and continue maintaining it in numpy. If others are supportive of this general approach, I would be happy to approach them. It is possible that we won't want all their functionality, but it would be good to avoid duplicating effort. I realize that this may not be persuasive to everyone, but I really feel that IO code is special and that it is an area where numpy/scipy should devote some effort at consolidating the community on some standard packages and approaches. 3. What about data source? On a related note, I wanted to point out datasource. Data source is a file interface for handling local and remote data files: http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/lib/_datasource.py It was originally developed by Jonathan Taylor and then modified by Brian Hawthorne and Chris Burns. It is fairly well-documented and tested, so it would be easier to take a look at it than or me to reexplain it here. The basic idea is to have a drop-in replacement for file handling, which would abstract away whether the file was remote or local, compressed or not, etc. The hope was that it would allow us to simplify support for remote file access and handling compressed files by merely using a datasource instead of a filename: def loadtxt(fname .... vs. def loadtxt(datasource .... I would appreciate hearing whether this seems doable or useful. Should we remove datasource? Start using it more? Does it need to be slightly or dramatically improved/overhauled? Renamed `datafile` or paired with a `datadestination`? Support versioning/checksumming/provenance tracking (a tad ambitious;))? Is anyone interested in picking up where we left off and improving it? Thoughts? Suggestions? Documentation --------------------- The main reason that I am so interested in the IO code is that it seems like it is one of the first areas that users will look. ("I have heard about this Python for scientific programming thing and I wonder what all the fuss is about? Let me try NumPy; this seems pretty good. Now let's see how to load in some of my data....") I just took a quick look through the documentation and I couldn't find any in the User Guide and this is the main IO page in the reference manual: http://docs.scipy.org/doc/numpy/reference/routines.io.html I would like to see a section on data IO in the user guide and have a more prominent mention of IO code in the reference manual (i.e., http://docs.scipy.org/doc/numpy/reference/io.html ?). Unfortunately, I don't have time to help out; but since it looks like there has been some recent activity in this area I thought I'd mention it. As always--thanks to everyone who is actually putting in hard work! 
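To make the datasource idea above a bit more concrete, here is a rough usage sketch based on the numpy/lib/_datasource.py module linked earlier; the file name is made up and the exact behaviour should be checked against that module rather than taken from this sketch:

-----------------------------------------------
import gzip
import numpy as np
from numpy.lib import _datasource

# Write a small compressed file so the example is self-contained.
f = gzip.open('example.txt.gz', 'wb')
f.write(b'1 2 3\n4 5 6\n')
f.close()

# DataSource abstracts away whether a path is local or an http/ftp URL and
# whether it is gzip/bz2 compressed; destpath=None means any downloaded
# files go to a temporary directory.
ds = _datasource.DataSource(None)
fh = ds.open('example.txt.gz')   # opened transparently through gzip
data = np.loadtxt(fh)            # loadtxt never needs to know the details
fh.close()
print(data)
-----------------------------------------------

The attraction is that loadtxt and friends could accept the object returned by ds.open() (or build a DataSource internally from a filename/URL) without caring where the bytes actually come from.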
Sorry I am not offering to actually help out here, but I hope that someone will be interested and able to pursue some of these issues. Thanks again, Jarrod On Thu, Dec 4, 2008 at 3:41 PM, Jarrod Millman wrote: > I am not familiar with this, but it looks quite useful: > http://www.stecf.org/software/PYTHONtools/astroasciidata/ > or (http://www.scipy.org/AstroAsciiData) > > "Within the AstroAsciiData project we envision a module which can be > used to work on all kinds of ASCII tables. The module provides a > convenient tool such that the user easily can: > > * read in ASCII tables; > * manipulate table elements; > * save the modified ASCII table; > * read and write meta data such as column names and units; > * combine several tables; > * delete/add rows and columns; > * manage metadata in the table headers." > > Is anyone familiar with this package? Would make sense to investigate > including this or adopting some of its interface/features? From millman at berkeley.edu Tue Dec 9 05:12:42 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 9 Dec 2008 02:12:42 -0800 Subject: [Numpy-discussion] Please help prepare the SciPy 0.7 release notes Message-ID: We are almost ready for SciPy 0.7.0rc1 (we just need to sort out the Numerical Recipes issues and I haven't had time to look though them yet). So I wanted to ask once more for help with preparing the release notes: http://projects.scipy.org/scipy/scipy/browser/trunk/doc/release/0.7.0-notes.rst There have been numerous improvements and changes. As always I would appreciate any feedback about mistakes or omissions. It would also be nice to know how many tests were in the last release and how many are there now. Highlighting major bug fixes or pointing out know issues would be very useful. I would also like to ask if anyone would be interested in stepping forward to work on something like Andrew Kuchling's "What's New in Python ....": http://docs.python.org/whatsnew/2.6.html This would be a great area to contribute. The release notes provide visibility for our developers' immense contributions of time and effort. They help provide an atmosphere of momentum, maturity, and excitement to a project. It is also a great service to users who haven't been following the trunk closely as well as other developer's who have missed what is happening in other areas of the code. It is also becomes a nice historical artifact for the future. It would be great if someone wanted to contribute in this way. Ideally, I would like to have someone who be interested in doing this for several releases of scipy and numpy. Such a person could develop a standard template for this and write some scripts to gather specific statistics (e.g., how many lines of code have changed, how many unit tests were added, what is the test coverage, what is the docstring coverage, who were the top contributors, who has increased their code contributions the most, how many new developers, etc.) Just a thought. Figure it won't happen, if I don't ask. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From james at fmnmedia.co.uk Tue Dec 9 06:13:24 2008 From: james at fmnmedia.co.uk (James) Date: Tue, 09 Dec 2008 11:13:24 +0000 Subject: [Numpy-discussion] Line of best fit! 
In-Reply-To: <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> Message-ID: <493E52D4.4020408@fmnmedia.co.uk> Hi, Thanks for all your help so far! Right i think it would be easier to just show you the chart i have so far; -------------------------- import numpy as np import matplotlib.pyplot as plt plt.plot([4,8,12,16,20,24], [0.008,0.016,0.021,0.038,0.062,0.116], 'bo') plt.xlabel("F (Number of washers)") plt.ylabel("v^2/r ms-2") plt.title("Circular Motion") plt.axis([2,26,0,0.120]) plt.show() ------------------------ Very basic i know, all i wish to do is add a line of best fit based on that data, in the examples there seems to be far more variables, do i need to split my data up etc? Thanks Scott Sinclair wrote: >> 2008/12/9 Angus McMorland : >> Hi James, >> >> 2008/12/8 James : >> >>> I have a very simple plot, and the lines join point to point, however i >>> would like to add a line of best fit now onto the chart, i am really new >>> to python etc, and didnt really understand those links! >>> >>> Can anyone help me :) >>> >> It sounds like the second link, about linear regression, is a good >> place to start, and I've made a very simple example based on that: >> >> ----------------------------------------------- >> import numpy as np >> import matplotlib.pyplot as plt >> >> x = np.linspace(0, 10, 11) #1 >> data_y = np.random.normal(size=x.shape, loc=x, scale=2.5) #2 >> plt.plot(x, data_y, 'bo') #3 >> >> coefs = np.lib.polyfit(x, data_y, 1) #4 >> fit_y = np.lib.polyval(coefs, x) #5 >> plt.plot(x, fit_y, 'b--') #6 >> ------------------------------------------------ >> > > James, you'll want to add an extra line to the above code snippet so > that Matplotlib displays the plot: > > plt.show() > > Cheers, > Scott > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > From lbrooks at MIT.EDU Tue Dec 9 06:35:11 2008 From: lbrooks at MIT.EDU (Lane Brooks) Date: Tue, 09 Dec 2008 04:35:11 -0700 Subject: [Numpy-discussion] Line of best fit! In-Reply-To: <493E52D4.4020408@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> <6a17e9ee0812082113o36d5195flac27b558aec87fa9@mail.gmail.com> <493E52D4.4020408@fmnmedia.co.uk> Message-ID: <493E57EF.2060601@mit.edu> James wrote: > Hi, > > Thanks for all your help so far! > > Right i think it would be easier to just show you the chart i have so far; > > -------------------------- > import numpy as np > import matplotlib.pyplot as plt > > plt.plot([4,8,12,16,20,24], [0.008,0.016,0.021,0.038,0.062,0.116], 'bo') > > plt.xlabel("F (Number of washers)") > plt.ylabel("v^2/r ms-2") > plt.title("Circular Motion") > plt.axis([2,26,0,0.120]) > > plt.show() > > ------------------------ > > Very basic i know, all i wish to do is add a line of best fit based on > that data, in the examples there seems to be far more variables, do i > need to split my data up etc? 
> Here is how I would do it: import numpy as np import matplotlib.pyplot as plt x = np.array([4,8,12,16,20,24]) y = np.array([0.008,0.016,0.021,0.038,0.062,0.116]) m = np.polyfit(x, y, 1) yfit = np.polyval(m, x) plt.plot(x, y, 'bo', x, yfit, 'k') plt.xlabel("F (Number of washers)") plt.ylabel("v2/r ms-2") plt.title("Circular Motion") plt.axis([2,26,0,0.120]) plt.text(5, 0.06, "Slope=%f" % m[0]) plt.text(5, 0.05, "Offset=%f" % m[1]) plt.show() From hanni.ali at gmail.com Tue Dec 9 09:07:34 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Tue, 9 Dec 2008 14:07:34 +0000 Subject: [Numpy-discussion] Importance of order when summing values in an array Message-ID: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> Hi All, I have encountered a puzzling issue and I am not certain if this is a mistake of my own doing or not. Would someone kindly just look over this issue to make sure I'm not doing something very silly. So, why would the sum of an array have a different value depending on the order I select the indices of the array? >>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() 8933281.8757099733 >>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() 8933281.8757099714 >>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) 8933281.8757099733 >>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) 8933281.8757099714 Any thoughts? Cheers, Hanni -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Tue Dec 9 09:14:52 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 9 Dec 2008 16:14:52 +0200 Subject: [Numpy-discussion] Importance of order when summing values in anarray References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> The highest accuracy is obtained when you sum an acceding ordered series, and the lowest accuracy with descending ordered. In between you might get a variety of rounding errors. Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Hanni Ali ????: ? 09-?????-08 16:07 ??: Discussion of Numerical Python ????: [Numpy-discussion] Importance of order when summing values in anarray Hi All, I have encountered a puzzling issue and I am not certain if this is a mistake of my own doing or not. Would someone kindly just look over this issue to make sure I'm not doing something very silly. So, why would the sum of an array have a different value depending on the order I select the indices of the array? >>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() 8933281.8757099733 >>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() 8933281.8757099714 >>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) 8933281.8757099733 >>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) 8933281.8757099714 Any thoughts? Cheers, Hanni -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3346 bytes Desc: not available URL: From aisaac at american.edu Tue Dec 9 09:30:05 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 09 Dec 2008 09:30:05 -0500 Subject: [Numpy-discussion] Line of best fit! 
In-Reply-To: <493D8470.8000303@fmnmedia.co.uk> References: <493D0685.5050903@fmnmedia.co.uk> <493D8470.8000303@fmnmedia.co.uk> Message-ID: <493E80ED.7070605@american.edu> On 12/8/2008 3:32 PM James apparently wrote: > I have a very simple plot, and the lines join point to point, however i > would like to add a line of best fit now onto the chart, i am really new > to python etc, and didnt really understand those links! See the `slope_intercept` method of the OLS class at http://code.google.com/p/econpy/source/browse/trunk/pytrix/ls.py Cheers, Alan Isaac From hanni.ali at gmail.com Tue Dec 9 09:34:25 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Tue, 9 Dec 2008 14:34:25 +0000 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> Message-ID: <789d27b10812090634h3f22c4d2rf2a801b75b29d06a@mail.gmail.com> Thank you Nadav. 2008/12/9 Nadav Horesh > The highest accuracy is obtained when you sum an acceding ordered series, > and the lowest accuracy with descending ordered. In between you might get a > variety of rounding errors. > > Nadav. > > -----????? ??????----- > ???: numpy-discussion-bounces at scipy.org ??? Hanni Ali > ????: ? 09-?????-08 16:07 > ??: Discussion of Numerical Python > ????: [Numpy-discussion] Importance of order when summing values in anarray > > Hi All, > > I have encountered a puzzling issue and I am not certain if this is a > mistake of my own doing or not. Would someone kindly just look over this > issue to make sure I'm not doing something very silly. > > So, why would the sum of an array have a different value depending on the > order I select the indices of the array? > > >>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() > 8933281.8757099733 > >>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() > 8933281.8757099714 > >>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) > 8933281.8757099733 > >>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) > 8933281.8757099714 > > Any thoughts? > > Cheers, > > Hanni > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Dec 9 09:51:43 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 09 Dec 2008 08:51:43 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> Message-ID: <493E85FF.2060106@gmail.com> Nadav Horesh wrote: > The highest accuracy is obtained when you sum an acceding ordered series, and the lowest accuracy with descending ordered. In between you might get a variety of rounding errors. > > Nadav. > > -----????? ??????----- > ???: numpy-discussion-bounces at scipy.org ??? Hanni Ali > ????: ? 09-?????-08 16:07 > ??: Discussion of Numerical Python > ????: [Numpy-discussion] Importance of order when summing values in anarray > > Hi All, > > I have encountered a puzzling issue and I am not certain if this is a > mistake of my own doing or not. 
Would someone kindly just look over this > issue to make sure I'm not doing something very silly. > > So, why would the sum of an array have a different value depending on the > order I select the indices of the array? > > >>>> vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]].sum() >>>> > 8933281.8757099733 > >>>> vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]].sum() >>>> > 8933281.8757099714 > >>>> sum(vector[[39, 46, 49, 50, 6, 9, 12, 14, 15, 17, 21]]) >>>> > 8933281.8757099733 > >>>> sum(vector[[6, 9, 12, 14, 15, 17, 21, 39, 46, 49, 50]]) >>>> > 8933281.8757099714 > > Any thoughts? > > Cheers, > > Hanni > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Also, increase the numerical precision as that may depend on your platform especially given the input values above are ints. Numpy has float128 and int64 that will minimize rounding error. Bruce From hanni.ali at gmail.com Tue Dec 9 10:00:07 2008 From: hanni.ali at gmail.com (Hanni Ali) Date: Tue, 9 Dec 2008 15:00:07 +0000 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <493E85FF.2060106@gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> Message-ID: <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> Hi Bruce, Ahh, but I would have thought the precision for the array operation would be the same no matter which values I wish to sum? The array is in float64 in all cases. I would not have thought altering the type of the integer values would make any difference as these indices are all below 5 milllion. Perhaps I have misunderstood your suggestion could you expand. Cheers, Hanni Also, increase the numerical precision as that may depend on your > platform especially given the input values above are ints. Numpy has > float128 and int64 that will minimize rounding error. > > Bruce > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Dec 9 10:46:03 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 09 Dec 2008 09:46:03 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> Message-ID: <493E92BB.9060403@gmail.com> Hanni Ali wrote: > Hi Bruce, > > Ahh, but I would have thought the precision for the array operation > would be the same no matter which values I wish to sum? The array is > in float64 in all cases. > > I would not have thought altering the type of the integer values would > make any difference as these indices are all below 5 milllion. > > Perhaps I have misunderstood your suggestion could you expand. > > Cheers, > > Hanni > > > Also, increase the numerical precision as that may depend on your > platform especially given the input values above are ints. 
Numpy has > float128 and int64 that will minimize rounding error. > > Bruce > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Hi, The main issue is the number of significant digits that you have which is not the number of decimals in your case. So while the numerical difference in the results is in the order about 1.86e-09, the actual difference starts at the 15th significant place. This is expected due to the number of significant digits of a 64-bit number (15-16). With higher precision like float128 you should get about 34 significant digits depending accuracy in all steps (i.e., the numbers must be stored as float128 and the summations done in float128 precision). Note there is a secondary issue of converting numbers between different types as well as the binary representation of decimal numbers. Also, rather than just simple summing, there are alternative algorithms like Kahan summation algorithm that can minimize errors. Bruce From nadavh at visionsense.com Tue Dec 9 10:51:49 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 9 Dec 2008 17:51:49 +0200 Subject: [Numpy-discussion] Importance of order when summing values in anarray References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> As much as I know float128 are in fact 80 bits (64 mantissa + 16 exponent) so the precision is 18-19 digits (not 34) Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Bruce Southey ????: ? 09-?????-08 17:46 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] Importance of order when summing values in anarray Hanni Ali wrote: > Hi Bruce, > > Ahh, but I would have thought the precision for the array operation > would be the same no matter which values I wish to sum? The array is > in float64 in all cases. > > I would not have thought altering the type of the integer values would > make any difference as these indices are all below 5 milllion. > > Perhaps I have misunderstood your suggestion could you expand. > > Cheers, > > Hanni > > > Also, increase the numerical precision as that may depend on your > platform especially given the input values above are ints. Numpy has > float128 and int64 that will minimize rounding error. > > Bruce > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Hi, The main issue is the number of significant digits that you have which is not the number of decimals in your case. 
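To put numbers on the significant-digits point, here is a small check that looks only at the two sums quoted earlier in this thread (np.finfo gives the machine epsilon; nothing about the original vector is assumed):

------------------------------------------------
import numpy as np

a = np.float64(8933281.8757099733)   # sum reported for the first index order
b = np.float64(8933281.8757099714)   # sum reported for the second index order

print("absolute difference : %.3e" % (a - b))             # ~1.9e-09, one unit in the last place at this magnitude
print("relative difference : %.3e" % ((a - b) / a))       # ~2.1e-16
print("float64 eps         : %.3e" % np.finfo(np.float64).eps)   # ~2.2e-16
------------------------------------------------

In other words, the two orderings agree to roughly 16 significant digits and differ only in the last representable bit, which is the size of effect one should expect from summation order alone.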
So while the numerical difference in the results is in the order about 1.86e-09, the actual difference starts at the 15th significant place. This is expected due to the number of significant digits of a 64-bit number (15-16). With higher precision like float128 you should get about 34 significant digits depending accuracy in all steps (i.e., the numbers must be stored as float128 and the summations done in float128 precision). Note there is a secondary issue of converting numbers between different types as well as the binary representation of decimal numbers. Also, rather than just simple summing, there are alternative algorithms like Kahan summation algorithm that can minimize errors. Bruce _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4202 bytes Desc: not available URL: From Shawn.Gong at drdc-rddc.gc.ca Tue Dec 9 11:00:30 2008 From: Shawn.Gong at drdc-rddc.gc.ca (Gong, Shawn (Contractor)) Date: Tue, 9 Dec 2008 11:00:30 -0500 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 Message-ID: hi list, I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 when I typed "python setup.py build", I got error from hashlib.py File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in md5 = __get_builtin_constructor('md5') File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in __get_builtin_constructor import _md5 ImportError: No module named _md5 I then tried python 2.6.1 instead of 2.5.2, but got the same error. I did not get the error while building on Linux. But I performed steps on Linux: 1) copy *.a Atlas libraries to my local_install/atlas/ 2) ranlib *.a 3) created a site.cfg Do I need to do the same on Solaris? Any help is appreciated. thanks, Shawn -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Tue Dec 9 11:44:53 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 9 Dec 2008 17:44:53 +0100 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: Hi, Does: >>> import md5 work? If it doesn't, it's a packaging problem. md5 must be available. Matthieu 2008/12/9 Gong, Shawn (Contractor) : > hi list, > > I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 > > when I typed "python setup.py build", I got error from hashlib.py > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in > > > md5 = __get_builtin_constructor('md5') > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in > __get_builtin_constructor > > import _md5 > > ImportError: No module named _md5 > > I then tried python 2.6.1 instead of 2.5.2, but got the same error. > > I did not get the error while building on Linux. But I performed steps on > Linux: > > 1) copy *.a Atlas libraries to my local_install/atlas/ > > 2) ranlib *.a > > 3) created a site.cfg > > Do I need to do the same on Solaris? > > Any help is appreciated. > > thanks, > > Shawn > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From cournape at gmail.com Tue Dec 9 11:49:38 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 10 Dec 2008 01:49:38 +0900 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: <5b8d13220812090849k79ef657fla73e2d0b1f1a603d@mail.gmail.com> On Wed, Dec 10, 2008 at 1:00 AM, Gong, Shawn (Contractor) wrote: > hi list, > > Do I need to do the same on Solaris? This has nothing to do with ATLAS. You did not build correctly python, or the python you are using is not built correctly. _md5 is a module from python, not from numpy. cheers, David From lou_boog2000 at yahoo.com Tue Dec 9 11:50:14 2008 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Tue, 9 Dec 2008 08:50:14 -0800 (PST) Subject: [Numpy-discussion] One Solution to: What to use to read and write numpy arrays to a file? In-Reply-To: <493E92BB.9060403@gmail.com> Message-ID: <264175.50384.qm@web34405.mail.mud.yahoo.com> I found one solution that's pretty simple for easy read and write to/from a file of a numpy array (see my original message below). Just use the method tolist(). e.g. a complex 2 x 2 array arr=array([[1.0,3.0-7j],[55.2+4.0j,-95.34]]) ls=arr.tolist() Then use the repr - eval pairings to write and later read the list from the file and then convert the list that is read in back to an array: [ls_str]=fp.readline() ls_in= eval(ls_str) arr_in=array(ls_in) # arr_in is same as arr Seems to work well. Any comments? -- Lou Pecora, my views are my own. --- On Tue, 12/9/08, Lou Pecora wrote: In looking for simple ways to read and write data (in a text readable format) to and from a file and later restoring the actual data when reading back in, I've found that numpy arrays don't seem to play well with repr and eval. E.g. to write some data (mixed types) to a file I can do this (fp is an open file), thedata=[3.0,-4.9+2.0j,'another string'] repvars= repr(thedata)+"\n" fp.write(repvars) Then to read it back and restore the data each to its original type, strvars= fp.readline() sonofdata= eval(strvars) which gives back the original data list. BUT when I try this with numpy arrays in the data list I find that repr of an array adds extra end-of-lines and that messes up the simple restoration of the data using eval. Am I missing something simple? I know I've seen people recommend ways to save arrays to files, but I'm wondering what is the most straight-forward? I really like the simple, pythonic approach of the repr - eval pairing. From Shawn.Gong at drdc-rddc.gc.ca Tue Dec 9 11:55:21 2008 From: Shawn.Gong at drdc-rddc.gc.ca (Gong, Shawn (Contractor)) Date: Tue, 9 Dec 2008 11:55:21 -0500 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: hi Matthieu, import md5 doesn't work. 
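Coming back to the repr/eval round trip discussed a little earlier: a self-contained sketch of the tolist() approach on the same kind of small complex array, with the file I/O left out so only the string conversion is exercised:

------------------------------------------------
import numpy as np

arr = np.array([[1.0, 3.0 - 7j], [55.2 + 4.0j, -95.34]])

text = repr(arr.tolist())           # one line of plain Python literals, ready for fp.write()
arr_back = np.array(eval(text))     # back to a nested list, then to an array

print(text)
print(np.allclose(arr, arr_back))   # True when the round trip preserves the values
------------------------------------------------

The usual caveat with this pattern is that eval() will execute whatever happens to be in the file, so it is only reasonable for files you produced yourself; for plain real-valued arrays, numpy.savetxt and numpy.loadtxt avoid that issue.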
I got: >>> import md5 Traceback (most recent call last): File "", line 1, in File "/home/sgong/dev181/dist.org/lib/python2.5/md5.py", line 6, in from hashlib import md5 File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 133, in md5 = __get_builtin_constructor('md5') File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 60, in __get_builtin_constructor import _md5 ImportError: No module named _md5 But I followed the same steps to build python 2.5.2 as on Linux: config make clean make make -i install (because there is an older python 2.5.1 on my /usr/local/bin/) thanks, Shawn -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Matthieu Brucher Sent: Tuesday, December 09, 2008 11:45 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] numpy build error on Solaris,No module named _md5 Hi, Does: >>> import md5 work? If it doesn't, it's a packaging problem. md5 must be available. Matthieu 2008/12/9 Gong, Shawn (Contractor) : > hi list, > > I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 > > when I typed "python setup.py build", I got error from hashlib.py > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in > > > md5 = __get_builtin_constructor('md5') > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in > __get_builtin_constructor > > import _md5 > > ImportError: No module named _md5 > > I then tried python 2.6.1 instead of 2.5.2, but got the same error. > > I did not get the error while building on Linux. But I performed steps on > Linux: > > 1) copy *.a Atlas libraries to my local_install/atlas/ > > 2) ranlib *.a > > 3) created a site.cfg > > Do I need to do the same on Solaris? > > Any help is appreciated. > > thanks, > > Shawn > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion From matthieu.brucher at gmail.com Tue Dec 9 11:56:42 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 9 Dec 2008 17:56:42 +0100 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: You should ask on a general Python list, as it's a Python problem, not a numpy one ;) Matthieu PS: look at the log when you built Python, there must be a mention of the not building of the md5 module. 2008/12/9 Gong, Shawn (Contractor) : > hi Matthieu, > > import md5 doesn't work. 
I got: > >>>> import md5 > Traceback (most recent call last): > File "", line 1, in > File "/home/sgong/dev181/dist.org/lib/python2.5/md5.py", line 6, in > > from hashlib import md5 > File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 133, > in > md5 = __get_builtin_constructor('md5') > File "/home/sgong/dev181/dist.org/lib/python2.5/hashlib.py", line 60, > in __get_builtin_constructor > import _md5 > ImportError: No module named _md5 > > > But I followed the same steps to build python 2.5.2 as on Linux: > config > make clean > make > make -i install (because there is an older python 2.5.1 on my > /usr/local/bin/) > > > thanks, > Shawn > > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org > [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Matthieu > Brucher > Sent: Tuesday, December 09, 2008 11:45 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] numpy build error on Solaris,No module > named _md5 > > Hi, > > Does: > >>>> import md5 > > work? If it doesn't, it's a packaging problem. md5 must be available. > > Matthieu > > 2008/12/9 Gong, Shawn (Contractor) : >> hi list, >> >> I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 >> >> when I typed "python setup.py build", I got error from hashlib.py >> >> File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, > in >> >> >> md5 = __get_builtin_constructor('md5') >> >> File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in >> __get_builtin_constructor >> >> import _md5 >> >> ImportError: No module named _md5 >> >> I then tried python 2.6.1 instead of 2.5.2, but got the same error. >> >> I did not get the error while building on Linux. But I performed steps > on >> Linux: >> >> 1) copy *.a Atlas libraries to my local_install/atlas/ >> >> 2) ranlib *.a >> >> 3) created a site.cfg >> >> Do I need to do the same on Solaris? >> >> Any help is appreciated. >> >> thanks, >> >> Shawn >> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > > -- > Information System Engineer, Ph.D. > Website: http://matthieu-brucher.developpez.com/ > Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn: http://www.linkedin.com/in/matthieubrucher > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From michael.abshoff at googlemail.com Tue Dec 9 11:54:17 2008 From: michael.abshoff at googlemail.com (Michael Abshoff) Date: Tue, 09 Dec 2008 08:54:17 -0800 Subject: [Numpy-discussion] numpy build error on Solaris, No module named _md5 In-Reply-To: References: Message-ID: <493EA2B9.702@gmail.com> Gong, Shawn (Contractor) wrote: > hi list, Hi Shawn, > I tried to build numpy 1.2.1 on Solaris 9 with gcc 3.4.6 > > when I typed ?python setup.py build?, I got error from hashlib.py > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 133, in > > > md5 = __get_builtin_constructor('md5') > > File "/home/sgong/dev181/dist/lib/python2.5/hashlib.py", line 60, in > __get_builtin_constructor > > import _md5 > > ImportError: No module named _md5 > > I then tried python 2.6.1 instead of 2.5.2, but got the same error. > > I did not get the error while building on Linux. But I performed steps > on Linux: > > 1) copy *.a Atlas libraries to my local_install/atlas/ > > 2) ranlib *.a > > 3) created a site.cfg > > Do I need to do the same on Solaris? > > Any help is appreciated. This is a pure Python issue and has nothing to do with numpy. When Python was build for that install it did either not have access to OpenSSL or the Sun crypto libs or you are missing some bits that need to be installed on Solaris. Did you build that Python on your own or where did it come from? > thanks, > > Shawn > Cheers, Michael > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From Chris.Barker at noaa.gov Tue Dec 9 12:59:30 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 09 Dec 2008 09:59:30 -0800 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <493EB202.4030308@noaa.gov> Jarrod Millman wrote: >>From the user's perspective, I would like all the NumPy IO code to be > in the same place in NumPy; and all the SciPy IO code to be in the > same place in SciPy. +1 > So I > wonder if it would make sense to incorporate AstroAsciiData? Doesn't it overlap a lot with genloadtxt? If so, that's a bit confusing to new users. > 3. What about data source? > Should we remove datasource? Start using it more? start using it more -- it sounds very handy. > Does it need to be > slightly or dramatically improved/overhauled? no comment here - I have no idea. > Documentation > --------------------- > Let me try NumPy; this seems > pretty good. Now let's see how to load in some of my data....") totally key -- I have a colleague that has used Matlab a fair bi tin past that is starting a new project -- he asked me what to use. I, of course, suggested python+numpy+scipy. His first question was -- can I load data in from excel? One more comment -- for fast reading of lots of ascii data, fromfile() needs some help -- I wish I had more time for it -- maybe some day. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Dec 9 15:13:17 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 9 Dec 2008 15:13:17 -0500 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <493EB202.4030308@noaa.gov> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <493EB202.4030308@noaa.gov> Message-ID: <60C1C121-2B2E-4264-8BB8-D65EBA91120B@gmail.com> On Dec 9, 2008, at 12:59 PM, Christopher Barker wrote: > Jarrod Millman wrote: > >>> From the user's perspective, I would like all the NumPy IO code to >>> be >> in the same place in NumPy; and all the SciPy IO code to be in the >> same place in SciPy. > > +1 So, no problem w/ importing numpy.ma and numpy.records in numpy.lib.io ? > > >> So I >> wonder if it would make sense to incorporate AstroAsciiData? > > Doesn't it overlap a lot with genloadtxt? If so, that's a bit > confusing > to new users. For the little I browsed, do we need it ? We could get the same thing with record arrays... >> 3. What about data source? > >> Should we remove datasource? Start using it more? > > start using it more -- it sounds very handy. Didn't know it was around. I'll adapt genloadtxt to use it. >> Documentation >> --------------------- >> Let me try NumPy; this seems >> pretty good. Now let's see how to load in some of my data....") > > totally key -- I have a colleague that has used Matlab a fair bi tin > past that is starting a new project -- he asked me what to use. I, of > course, suggested python+numpy+scipy. His first question was -- can I > load data in from excel? So that would go in scipy.io ? > > One more comment -- for fast reading of lots of ascii data, fromfile() > needs some help -- I wish I had more time for it -- maybe some day. I'm afraid you'd have to count me out on this one: I don't speak C (yet), and don't foresee learning it soon enough to be of any help... From babaktei at yahoo.com Tue Dec 9 15:25:27 2008 From: babaktei at yahoo.com (Bab Tei) Date: Tue, 9 Dec 2008 12:25:27 -0800 (PST) Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? Message-ID: <968008.61869.qm@web50411.mail.re2.yahoo.com> Hi I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? Regards, Teimourpour From babaktei at yahoo.com Tue Dec 9 15:28:15 2008 From: babaktei at yahoo.com (Bab Tei) Date: Tue, 9 Dec 2008 12:28:15 -0800 (PST) Subject: [Numpy-discussion] Support for sparse matrix in Distance function (and clustering)? Message-ID: <993136.52361.qm@web50410.mail.re2.yahoo.com> Hi Does the distance function in spatial package support sparse matrix? 
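On the index-exclusion question above, one more option is a boolean mask built directly from the list of indices to drop; the sample array and index list below are made up for illustration:

------------------------------------------------
import numpy as np

a = np.arange(10) * 10          # 0, 10, 20, ... 90
excludeindex = [1, 3, 7]

mask = np.ones(a.shape, dtype=bool)
mask[excludeindex] = False      # switch off the positions to drop

print(a[mask])                  # [ 0 20 40 50 60 80 90]
------------------------------------------------

numpy.delete(a, excludeindex) gives the same result as a copy, which is about as close as numpy gets to R's negative indexing.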
regards From robert.kern at gmail.com Tue Dec 9 15:40:18 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 14:40:18 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> Message-ID: <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: > As much as I know float128 are in fact 80 bits (64 mantissa + 16 exponent) so the precision is 18-19 digits (not 34) float128 should be 128 bits wide. If it's not on your platform, please let us know as that is a bug in your build. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From discerptor at gmail.com Tue Dec 9 15:46:51 2008 From: discerptor at gmail.com (Joshua Lippai) Date: Tue, 9 Dec 2008 12:46:51 -0800 Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? In-Reply-To: <968008.61869.qm@web50411.mail.re2.yahoo.com> References: <968008.61869.qm@web50411.mail.re2.yahoo.com> Message-ID: <9911419a0812091246m6ecbc112i793475a03cc613a5@mail.gmail.com> You can make a mask array in numpy to prune out items from an array that you don't want, denoting indices you want to keep with 1's and those you don't want to keep with 0's. For instance, a = np.array([1,3,45,67,123]) mask = np.array([0,1,1,0,1],dtype=np.bool) anew = a[mask] will set anew equal to array([3, 45, 123]) Josh On Tue, Dec 9, 2008 at 12:25 PM, Bab Tei wrote: > Hi > I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? > Regards, Teimourpour > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Tue Dec 9 16:07:08 2008 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 9 Dec 2008 13:07:08 -0800 Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? In-Reply-To: <968008.61869.qm@web50411.mail.re2.yahoo.com> References: <968008.61869.qm@web50411.mail.re2.yahoo.com> Message-ID: On Tue, Dec 9, 2008 at 12:25 PM, Bab Tei wrote: > I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? Here's a painful way to do it: >> x = np.array([0,1,2,3,4]) >> excludeindex = [1,3] >> idx = list(set(range(4)) - set(excludeindex)) >> x[idx] array([0, 2]) To make it more painful, you might want to sort idx. But if excludeindex is True/False, then just use ~excludeindex. From eads at soe.ucsc.edu Tue Dec 9 17:32:53 2008 From: eads at soe.ucsc.edu (Damian Eads) Date: Tue, 9 Dec 2008 15:32:53 -0700 Subject: [Numpy-discussion] Support for sparse matrix in Distance function (and clustering)? 
In-Reply-To: <993136.52361.qm@web50410.mail.re2.yahoo.com> References: <993136.52361.qm@web50410.mail.re2.yahoo.com> Message-ID: <91b4b1ab0812091432l4306c1bep6a20370e1e3615f6@mail.gmail.com> Hi, Can you be more specific? Do you need sparse matrices to represent observation vectors because they are sparse? Or do you need sparse matrices to represent distance matrices because most vectors you are clustering are similar while a few are dissimilar? The clustering code is written mostly in C and does not support sparse matrices. However, this should not matter because most of the clustering code does not look at the raw observation vectors themselves, just the distances passed as a distance matrix. Damian On Tue, Dec 9, 2008 at 1:28 PM, Bab Tei wrote: > Hi > Does the distance function in spatial package support sparse matrix? > regards > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- ----------------------------------------------------- Damian Eads Ph.D. Student Jack Baskin School of Engineering, UCSC E2-489 1156 High Street Machine Learning Lab Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads From spacey-numpy-discussion at lenin.net Tue Dec 9 18:50:04 2008 From: spacey-numpy-discussion at lenin.net (Peter Norton) Date: Tue, 9 Dec 2008 18:50:04 -0500 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up Message-ID: I've got a few issues that I hope won't be overwhelming on one message: (1) Because of some issues in the past in building numpy with numscons, the numpy.core.umath_tests don't get built with numpy+numscons (at least not as of svn version 6128). $ python -c 'import numpy; print numpy.__version__; import numpy.core.umath_tests' 1.3.0.dev6139 Traceback (most recent call last): File "", line 1, in ImportError: No module named umath_tests What needs to be done to get this module incorporated into the numscons build? (2) I've found that in numscons-0.9.4, the detection of the correct linker assumes that if gcc is in use, the linker is gnu ld. However, on solaris this isn't the recommended toolchain, so it's typical to build gcc with gnu as and the solaris /usr/ccs/bin/ld under the hood. What this means is that when setting a run_path in the binary (which we need to do) the linker flags are set to "-Wl,-rpath=". However, this isn't valid for the solaris ld. It needs -R, or -Wl,-R. I'm pretty sure that on Solaris trying to link a library with -Wl,-rpath= and looking for an error should be enough to determine the correct format for the linker. (3) Numscons tries to check for the need for a MAIN__ function when linking with gfortran. However, any libraries built with numscons come out with an unsatisfied dependency on MAIN__. The log looks like this in build/scons/numpy/linalg/config.log looks like this: scons: Configure: Checking if gfortran needs dummy main - scons: Configure: "build/scons/numpy/linalg/sconf/conftest_0.c" is up to date. scons: Configure: The original builder output was: |build/scons/numpy/linalg/sconf/conftest_0.c <- | | | |int dummy() { return 0; } | | | scons: Configure: "build/scons/numpy/linalg/sconf/conftest_0.o" is up to date. 
scons: Configure: The original builder output was: |gcc -o build/scons/numpy/linalg/sconf/conftest_0.o -c -O3 -m64 -g -fPIC -DPIC build/scons/numpy/linalg/sconf/conftest_0.c | scons: Configure: Building "build/scons/numpy/linalg/sconf/conftest_0" failed in a previous run and all its sources are up to date. scons: Configure: The original builder output was: |gfortran -o build/scons/numpy/linalg/sconf/conftest_0 -O3 -g -L/usr/local/lib/gcc-4.3.1/amd64 -Wl,-R/usr/local/lib/gcc-4.3.1/amd64 -L/usr/local/amd64/python/lib -Wl,-R/usr/local/amd64/python/lib -L. -lgcc_s build/scons/numpy/linalg/sconf/conftest_0.o | It then goes on to discover that it needs main: scons: Configure: "build/scons/numpy/linalg/sconf/conftest_1" is up to date. scons: Configure: The original builder output was: |gfortran -o build/scons/numpy/linalg/sconf/conftest_1 -O3 -g -L/usr/local/lib/gcc-4.3.1/amd64 -Wl,-R/usr/local/lib/gcc-4.3.1/amd64 -L/usr/local/amd64/python/lib -Wl,-R/usr/local/amd64/python/lib -L. -lgcc_s build/scons/numpy/linalg/sconf/conftest_1.o | scons: Configure: (cached) MAIN__. Doesn't this clearly indicate that a dummy main is needed? I'm working around this with a silly library that just has the MAIN__ symbol in it, but I'd love to do without that. Thanks, Peter From chaos.proton at gmail.com Tue Dec 9 21:24:23 2008 From: chaos.proton at gmail.com (Grissiom) Date: Wed, 10 Dec 2008 10:24:23 +0800 Subject: [Numpy-discussion] How to unitize a array in numpy? Message-ID: Hi all, Nice to neet you all. I am a newbie in numpy. Is there any function that could unitize a array? Thanks in advance. -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Dec 9 21:35:21 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 20:35:21 -0600 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: References: Message-ID: <3d375d730812091835v380c36bdvf64ad344de44326a@mail.gmail.com> On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: > Hi all, > > Nice to neet you all. I am a newbie in numpy. Is there any function that > could unitize a array? What do you mean by "unitize"? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue Dec 9 21:36:53 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 20:36:53 -0600 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: References: Message-ID: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: > Hi all, > > Nice to neet you all. I am a newbie in numpy. Is there any function that > could unitize a array? If you mean like the Mathematica function Unitize[] defined here: http://reference.wolfram.com/mathematica/ref/Unitize.html Then .astype(bool) is probably sufficient. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From vagabondaero at gmail.com Tue Dec 9 21:40:15 2008 From: vagabondaero at gmail.com (Vagabond_Aero) Date: Tue, 9 Dec 2008 21:40:15 -0500 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? 
In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: <8b5ec91a0812091840v7054c063pe2b3922300d2fb9a@mail.gmail.com> I have the same problem. I tried the del command below, but foundon that it removes the names of the ndarrays from memory, but does not free up the memory on my XP system (python 2.5.2, numpy 1.2.1). Regular python objects release their memory when I use the del command, but it looks like the ndarray objects do not. On Mon, Dec 8, 2008 at 22:00, Travis Vaught wrote: > Try: > > del(myvariable) > > Travis > > On Dec 8, 2008, at 7:15 PM, frank wang wrote: > > Hi, > > I have a program with some variables consume a lot of memory. The first > time I run it, it is fine. The second time I run it, I will get MemoryError. > If I close the ipython and reopen it again, then I can run the program once. > I am looking for a command to delete the intermediate variable once it is > not used to save memory like in matlab clear command. > > Thanks > > Frank > > ------------------------------ > Send e-mail faster without improving your typing skills. Get your Hotmail(R) > account. > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Dec 9 21:45:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 20:45:00 -0600 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: <8b5ec91a0812091840v7054c063pe2b3922300d2fb9a@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <8b5ec91a0812091840v7054c063pe2b3922300d2fb9a@mail.gmail.com> Message-ID: <3d375d730812091845y2efaeeb6y8e5713d0dbecf117@mail.gmail.com> On Tue, Dec 9, 2008 at 20:40, Vagabond_Aero wrote: > I have the same problem. I tried the del command below, but foundon that it > removes the names of the ndarrays from memory, but does not free up the > memory on my XP system (python 2.5.2, numpy 1.2.1). Regular python objects > release their memory when I use the del command, but it looks like the > ndarray objects do not. It's not guaranteed that the regular Python objects return memory to the OS, either. The memory should be reused when Python allocates new memory, though, so I suspect that this is not the problem that Frank is seeing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From charlesr.harris at gmail.com Tue Dec 9 21:50:02 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Dec 2008 19:50:02 -0700 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up In-Reply-To: References: Message-ID: On Tue, Dec 9, 2008 at 4:50 PM, Peter Norton < spacey-numpy-discussion at lenin.net> wrote: > I've got a few issues that I hope won't be overwhelming on one message: > > (1) Because of some issues in the past in building numpy with > numscons, the numpy.core.umath_tests don't get built with > numpy+numscons (at least not as of svn version 6128). > > $ python -c 'import numpy; print numpy.__version__; import > numpy.core.umath_tests' > 1.3.0.dev6139 > Traceback (most recent call last): > File "", line 1, in > ImportError: No module named umath_tests > > What needs to be done to get this module incorporated into the numscons > build? > It's also commented out of the usual setup.py file also because of blas/lapack linkage problems that need to be fixed; I was working on other things. It's probably time to fix it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaos.proton at gmail.com Tue Dec 9 21:56:01 2008 From: chaos.proton at gmail.com (Grissiom) Date: Wed, 10 Dec 2008 10:56:01 +0800 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> References: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 10:36, Robert Kern wrote: > On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: > > Hi all, > > > > Nice to neet you all. I am a newbie in numpy. Is there any function that > > could unitize a array? > > If you mean like the Mathematica function Unitize[] defined here: > > http://reference.wolfram.com/mathematica/ref/Unitize.html > > Then .astype(bool) is probably sufficient. > > -- > Robert Kern > I'm sorry for my poor English. I mean a function that could return a unit vector which have the same direction with the original one. Thanks. -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 9 22:01:49 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Dec 2008 20:01:49 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> Message-ID: On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern wrote: > On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: > > As much as I know float128 are in fact 80 bits (64 mantissa + 16 > exponent) so the precision is 18-19 digits (not 34) > > float128 should be 128 bits wide. If it's not on your platform, please > let us know as that is a bug in your build. > I think he means the actual precision is the ieee extended precision, the number just happens to be stored into larger chunks of memory for alignment purposes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Tue Dec 9 22:03:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 21:03:00 -0600 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> Message-ID: <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> On Mon, Dec 8, 2008 at 19:15, frank wang wrote: > Hi, > > I have a program with some variables consume a lot of memory. The first time > I run it, it is fine. The second time I run it, I will get MemoryError. If I > close the ipython and reopen it again, then I can run the program once. I am > looking for a command to delete the intermediate variable once it is not > used to save memory like in matlab clear command. How are you running this program? Be aware that IPython may be holding on to objects and preventing them from being deallocated. For example: In [7]: !cat memtest.py class A(object): def __del__(self): print 'Deleting %r' % self a = A() In [8]: %run memtest.py In [9]: %run memtest.py In [10]: %run memtest.py In [11]: del a In [12]: Do you really want to exit ([y]/n)? $ python memtest.py Deleting <__main__.A object at 0x915ab0> You can remove some of these references with %reset and maybe a gc.collect() for good measure. In [1]: %run memtest In [2]: %run memtest In [3]: %run memtest In [4]: %reset Once deleted, variables cannot be recovered. Proceed (y/[n])? y Deleting <__main__.A object at 0xf3e950> Deleting <__main__.A object at 0xf3e6d0> Deleting <__main__.A object at 0xf3e930> -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue Dec 9 22:04:24 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 21:04:24 -0600 Subject: [Numpy-discussion] How to unitize a array in numpy? In-Reply-To: References: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> Message-ID: <3d375d730812091904i5a79b310h9dd3f835b887a487@mail.gmail.com> On Tue, Dec 9, 2008 at 20:56, Grissiom wrote: > On Wed, Dec 10, 2008 at 10:36, Robert Kern wrote: >> >> On Tue, Dec 9, 2008 at 20:24, Grissiom wrote: >> > Hi all, >> > >> > Nice to neet you all. I am a newbie in numpy. Is there any function that >> > could unitize a array? >> >> If you mean like the Mathematica function Unitize[] defined here: >> >> http://reference.wolfram.com/mathematica/ref/Unitize.html >> >> Then .astype(bool) is probably sufficient. >> >> -- >> Robert Kern > > I'm sorry for my poor English. I mean a function that could return a unit > vector which have the same direction with the original one. Thanks. v / numpy.linalg.norm(v) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From chaos.proton at gmail.com Tue Dec 9 22:08:28 2008 From: chaos.proton at gmail.com (Grissiom) Date: Wed, 10 Dec 2008 11:08:28 +0800 Subject: [Numpy-discussion] How to unitize a array in numpy? 
In-Reply-To: <3d375d730812091904i5a79b310h9dd3f835b887a487@mail.gmail.com> References: <3d375d730812091836j77d2224h7ad7cf5fe80a8ca2@mail.gmail.com> <3d375d730812091904i5a79b310h9dd3f835b887a487@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:04, Robert Kern wrote: > v / numpy.linalg.norm(v) > Thanks a lot ~;) -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Dec 9 22:10:32 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Dec 2008 21:10:32 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> Message-ID: <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> On Tue, Dec 9, 2008 at 21:01, Charles R Harris wrote: > > > On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern wrote: >> >> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: >> > As much as I know float128 are in fact 80 bits (64 mantissa + 16 >> > exponent) so the precision is 18-19 digits (not 34) >> >> float128 should be 128 bits wide. If it's not on your platform, please >> let us know as that is a bug in your build. > > I think he means the actual precision is the ieee extended precision, the > number just happens to be stored into larger chunks of memory for alignment > purposes. Ah, that's good to know. Yes, float128 on my Intel Mac behaves this way. In [12]: f = finfo(float128) In [13]: f.nmant Out[13]: 63 In [14]: f.nexp Out[14]: 15 -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Tue Dec 9 23:01:56 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Dec 2008 21:01:56 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: On Tue, Dec 9, 2008 at 8:10 PM, Robert Kern wrote: > On Tue, Dec 9, 2008 at 21:01, Charles R Harris > wrote: > > > > > > On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern > wrote: > >> > >> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh > wrote: > >> > As much as I know float128 are in fact 80 bits (64 mantissa + 16 > >> > exponent) so the precision is 18-19 digits (not 34) > >> > >> float128 should be 128 bits wide. If it's not on your platform, please > >> let us know as that is a bug in your build. > > > > I think he means the actual precision is the ieee extended precision, the > > number just happens to be stored into larger chunks of memory for > alignment > > purposes. > > Ah, that's good to know. Yes, float128 on my Intel Mac behaves this way. 
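Returning to the unitize question: v / numpy.linalg.norm(v) covers a single vector, and the sketch below extends the same idea to normalizing every row of a 2-D array; the zero-row guard and the sample data are additions for illustration:

------------------------------------------------
import numpy as np

a = np.array([[3.0, 4.0], [0.0, 2.0], [1.0, 1.0]])

norms = np.sqrt((a * a).sum(axis=1))    # Euclidean length of each row
norms[norms == 0] = 1.0                 # leave all-zero rows untouched instead of dividing by 0
unit_rows = a / norms[:, np.newaxis]    # broadcast the per-row division

print(unit_rows)
print(np.sqrt((unit_rows * unit_rows).sum(axis=1)))   # each row now has length 1
------------------------------------------------

For a single 1-D vector this reduces to the v / numpy.linalg.norm(v) answer given above.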
> > In [12]: f = finfo(float128) > > In [13]: f.nmant > Out[13]: 63 > > In [14]: f.nexp > Out[14]: 15 > Yep. That's the reason I worry a bit about what will happen when ieee quad precision comes out; it really is 128 bits wide and the normal identifiers won't account for the difference. I expect c will just call them long doubles and they will get the 'g' letter code just like extended precision does now. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Wed Dec 10 00:42:07 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 10 Dec 2008 07:42:07 +0200 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> Message-ID: <6a17e9ee0812092142h70dac967hf1fdf53ab9c4cb13@mail.gmail.com> > 2008/12/10 Robert Kern : > On Mon, Dec 8, 2008 at 19:15, frank wang wrote: >> Hi, >> >> I have a program with some variables consume a lot of memory. The first time >> I run it, it is fine. The second time I run it, I will get MemoryError. If I >> close the ipython and reopen it again, then I can run the program once. I am >> looking for a command to delete the intermediate variable once it is not >> used to save memory like in matlab clear command. > > How are you running this program? Be aware that IPython may be holding > on to objects and preventing them from being deallocated. For example: > > In [7]: !cat memtest.py > class A(object): > def __del__(self): > print 'Deleting %r' % self > > > a = A() > > In [8]: %run memtest.py > > In [9]: %run memtest.py > > In [10]: %run memtest.py > > In [11]: del a > > In [12]: > Do you really want to exit ([y]/n)? > > $ python memtest.py > Deleting <__main__.A object at 0x915ab0> > > > You can remove some of these references with %reset and maybe a > gc.collect() for good measure. Of course, if you don't need to have access to the variables created in your program from the IPython session, you can run the program in a separate python process: In [1]: !python memtest.py Deleting <__main__.A object at 0xb7da5ccc> In [2]: !python memtest.py Deleting <__main__.A object at 0xb7e5fccc> Cheers, Scott From david at ar.media.kyoto-u.ac.jp Wed Dec 10 00:59:33 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 10 Dec 2008 14:59:33 +0900 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up In-Reply-To: References: Message-ID: <493F5AC5.5000902@ar.media.kyoto-u.ac.jp> Peter Norton wrote: > I've got a few issues that I hope won't be overwhelming on one message: > > (1) Because of some issues in the past in building numpy with > numscons, the numpy.core.umath_tests don't get built with > numpy+numscons (at least not as of svn version 6128). > > $ python -c 'import numpy; print numpy.__version__; import > numpy.core.umath_tests' > 1.3.0.dev6139 > Traceback (most recent call last): > File "", line 1, in > ImportError: No module named umath_tests > > What needs to be done to get this module incorporated into the numscons build? you should not need this module, it is not built using the normal build of numpy either. Did you do a clean build (rm -rf build and removing the install directory first) ? It was enabled before but is commented out ATM. 
> > (2) I've found that in numscons-0.9.4, the detection of the correct > linker assumes that if gcc is in use, the linker is gnu ld. However, > on solaris this isn't the recommended toolchain, so it's typical to > build gcc with gnu as and the solaris /usr/ccs/bin/ld under the hood. > What this means is that when setting a run_path in the binary (which > we need to do) the linker flags are set to "-Wl,-rpath=". > However, this isn't valid for the solaris ld. It needs -R, or > -Wl,-R. I'm pretty sure that on Solaris trying to link a > library with -Wl,-rpath= and looking for an error should be enough to > determine the correct format for the linker. Scons and hence numscons indeed assume that the linker is the same as the compiler by default. It would be possible to avoid this by detecting the linker at runtime, to bypass scons tools choice, like I do for C, C++ and Fortran compilers. The whole scons tools sub-system is unfortunately very limited ATM, so there is a lot of manual work to do (that's actually what most of the code in numscons/core is for). > > (3) Numscons tries to check for the need for a MAIN__ function when > linking with gfortran. However, any libraries built with numscons come > out with an unsatisfied dependency on MAIN__. The log looks like this > in build/scons/numpy/linalg/config.log looks like this: It may be linked to the sun linker problem above. Actually, the dummy main detection is not used at all for the building - it is necessary to detect name mangling used by the fortran compiler, but that's it. I assumed that a dummy main was never needed for shared libraries, but that assumption may well be ill founded. I never had problems related to this on open solaris, with both native and gcc toolchains, so I am willing to investiage first whether it is linked to the sun linker problem or not. Unfortunately, I won't have the time to work on this in the next few months because of my PhD thesis; the sun linker problem can be fixed by following a strategy similar to compilers, in numscons/core/initialization.py. You first need to add a detection scheme for the linker in compiler_detection.py. David From gael.varoquaux at normalesup.org Wed Dec 10 01:38:01 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 Dec 2008 07:38:01 +0100 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <20081210063801.GB24936@phare.normalesup.org> On Tue, Dec 09, 2008 at 01:34:29AM -0800, Jarrod Millman wrote: > It was decided last year that numpy io should provide simple, generic, > core io functionality. While scipy io would provide more domain- or > application-specific io code (e.g., Matlab IO, WAV IO, etc.) My > vision for scipy io, which I know isn't shared, is to be more or less > aiming to be all inclusive (e.g., all image, sound, and data formats). > (That is a different discussion; just wanted it to be clear where I > stand.) Can we get Matthew Brett's nifti reader in there? Please! Pretty please. That way I can do neuroimaging without compiled code outside of a standard scientific Python instal. 
Ga?l From charlesr.harris at gmail.com Wed Dec 10 02:49:07 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 00:49:07 -0700 Subject: [Numpy-discussion] Some numpy statistics Message-ID: Hi All, I bumped into this while searching for something else: http://www.ohloh.net/p/numpy/analyses/latest Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Dec 10 02:55:43 2008 From: robert.kern at gmail.com (robert.kern at gmail.com) Date: Wed, 10 Dec 2008 01:55:43 -0600 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: References: Message-ID: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> On Wed, Dec 10, 2008 at 01:49, Charles R Harris wrote: > Hi All, > > I bumped into this while searching for something else: > http://www.ohloh.net/p/numpy/analyses/latest -14 lines of Javascript? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From nadavh at visionsense.com Wed Dec 10 02:55:49 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 10 Dec 2008 09:55:49 +0200 Subject: [Numpy-discussion] Importance of order when summing values inanarray References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com><710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il><493E85FF.2060106@gmail.com><789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com><493E92BB.9060403@gmail.com><710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> Message-ID: <710F2847B0018641891D9A216027636029C371@ex3.envision.co.il> float128 are 16 bytes wide but have the structure of x87 80-bits + extra 6 bytes for alignment: >From "http://lwn.net/2001/features/OLS/pdf/pdf/x86-64.pdf": "... The x87 stack with 80-bit precision is only used for long double." And: >>> e47 = float128(1e-47) >>> e30 = float128(1e-30) >>> e50 = float128(1e-50) >>> (e30-e50) == e30 True >>> (e30-e47) == e30 False >>> This shows that float128 has no more then 19 digits precision Nadav. -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Robert Kern ????: ? 09-?????-08 22:40 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] Importance of order when summing values inanarray On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote: > As much as I know float128 are in fact 80 bits (64 mantissa + 16 exponent) so the precision is 18-19 digits (not 34) float128 should be 128 bits wide. If it's not on your platform, please let us know as that is a bug in your build. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 3933 bytes Desc: not available URL: From charlesr.harris at gmail.com Wed Dec 10 03:22:54 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 01:22:54 -0700 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> References: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 12:55 AM, wrote: > On Wed, Dec 10, 2008 at 01:49, Charles R Harris > wrote: > > Hi All, > > > > I bumped into this while searching for something else: > > http://www.ohloh.net/p/numpy/analyses/latest > > -14 lines of Javascript? > Well, they have scipy mostly written in C++ and davidc as a C developer with a 29000 line commit ;) The code analysis isn't quite perfect and I think there are some bugs in computing the statistics. But it's kind of interesting anyway. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Dec 10 03:32:31 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Dec 2008 02:32:31 -0600 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: References: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> Message-ID: <3d375d730812100032w7da20728u89552b71c9e71ea6@mail.gmail.com> On Wed, Dec 10, 2008 at 02:22, Charles R Harris wrote: > > > On Wed, Dec 10, 2008 at 12:55 AM, wrote: >> >> On Wed, Dec 10, 2008 at 01:49, Charles R Harris >> wrote: >> > Hi All, >> > >> > I bumped into this while searching for something else: >> > http://www.ohloh.net/p/numpy/analyses/latest >> >> -14 lines of Javascript? > > Well, they have scipy mostly written in C++ and davidc as a C developer with > a 29000 line commit ;) The code analysis isn't quite perfect and I think > there are some bugs in computing the statistics. But it's kind of > interesting anyway. There are bugs, and then there are bugs. It seems like an invariants "numlines >= 0" should pertain even with dodgy language identification. I simply don't know what operations they would do to get negative numbers. In any case, sloccount tells me that most of scipy *is* C++. The generated sparsetools sources are quite large in addition to all of the Blitz sources. 
SLOC Directory SLOC-by-Language (Sorted) 177304 sparse cpp=134410,ansic=22394,fortran=12780,python=7720 96740 weave cpp=82265,python=14244,ansic=231 39321 special fortran=19749,ansic=16888,python=2684 18074 integrate fortran=15871,python=1156,ansic=1047 14472 interpolate fortran=10564,python=2493,ansic=1210,cpp=205 12471 ndimage python=6242,ansic=6229 11431 optimize fortran=5931,python=2864,ansic=2636 11390 odr fortran=9380,ansic=1192,python=818 9951 stats python=8526,fortran=1425 6801 signal ansic=3934,python=2867 5878 fftpack fortran=3973,python=1462,ansic=443 5756 io python=4987,ansic=769 4672 spatial python=2731,ansic=1941 4608 cluster python=2659,ansic=1949 4227 linalg python=3605,fortran=604,ansic=18 1530 lib python=1182,fortran=324,ansic=24 1471 stsci ansic=976,python=495 1125 maxentropy python=1125 940 misc python=940 494 constants python=494 160 top_dir python=160 3 linsolve python=3 Totals grouped by language (dominant language first): cpp: 216880 (50.58%) fortran: 80601 (18.80%) python: 69457 (16.20%) ansic: 61881 (14.43%) Total Physical Source Lines of Code (SLOC) = 428,819 Development Effort Estimate, Person-Years (Person-Months) = 116.12 (1,393.47) (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05)) Schedule Estimate, Years (Months) = 3.26 (39.15) (Basic COCOMO model, Months = 2.5 * (person-months**0.38)) Estimated Average Number of Developers (Effort/Schedule) = 35.60 Total Estimated Cost to Develop = $ 15,686,619 (average salary = $56,286/year, overhead = 2.40). SLOCCount, Copyright (C) 2001-2004 David A. Wheeler SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL. SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to redistribute it under certain conditions as specified by the GNU GPL license; see the documentation for details. Please credit this data as "generated using David A. Wheeler's 'SLOCCount'." -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Wed Dec 10 04:03:25 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 10 Dec 2008 18:03:25 +0900 Subject: [Numpy-discussion] Some numpy statistics In-Reply-To: References: <3d375d730812092355v3c17dbe2i7123dfb1e1c39678@mail.gmail.com> Message-ID: <493F85DD.7040303@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Wed, Dec 10, 2008 at 12:55 AM, > wrote: > > On Wed, Dec 10, 2008 at 01:49, Charles R Harris > > wrote: > > Hi All, > > > > I bumped into this while searching for something else: > > http://www.ohloh.net/p/numpy/analyses/latest > > -14 lines of Javascript? > > > Well, they have scipy mostly written in C++ and davidc as a C > developer with a 29000 line commit ;) C++ in scipy mostly is generated code (sparsetools) + blitz. There is also the problem of code reformating: for example, ohloh seems to believe I am an advanced Fortran developer from scipy, whereas I barely know how to code an hello world; I guess this is because of my removal of arpack while the license issue was discussed and solved. IIRC, I did use the svn method to put back the code, so in theory, it should be possible to realize I did not code any of the above. Also, svn is pretty dumb about renaming (it is just an atomic copy + rm), so if you remove a file, I would not be surprised if you become the author of the whole file for svn in that case. 
I mean, I am far from being the main author of scipy for any meaningful measure of contribution. cheers, David
From andrea.gavana at gmail.com Wed Dec 10 07:19:51 2008 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Wed, 10 Dec 2008 12:19:51 +0000 Subject: [Numpy-discussion] Little vectorization help... Message-ID: Hi All, I am trying to "vectorize" 3 nested for loops but I am not having much success. Here is the code I use: import numpy import numpy.ma as masked grid = numpy.zeros((nx, ny), dtype=numpy.float32) xOut = numpy.zeros((nx, ny), dtype=numpy.float32) yOut = numpy.zeros((nx, ny), dtype=numpy.float32) z = GetCentroids() # Some vector z values prop = GetValue() # Some other vector values NaN = numpy.NaN for I in xrange(1, nx+1): for J in xrange(1, ny+1): theSum = [] for K in xrange(1, nz+1): cellPos = I-1 + nx*(J-1) + nx*ny*(K-1) centroid = z[cellPos] if low <= centroid <= high and actnum[cellPos] > 0: theSum.append(prop[cellPos]) if theSum: grid[I-1, J-1] = sum(theSum)/len(theSum) else: grid[I-1, J-1] = NaN xOut[I-1, J-1], yOut[I-1, J-1] = x[cellPos], y[cellPos] grid = masked.masked_where(numpy.isnan(grid), grid) Some explanation: 1) "z" is a vector of nx*ny*nz components, where nx = 100, ny = 73, nz = 23, which represents 3D hexahedron cell centroids; 2) "prop" is a vector like z, with the same shape, with some floating point values in it; 3) "actnum" is a vector of integers (0 or 1) with the same shape as z, and indicates if a cell should be considered in the loop or not; 4) low and high are 2 floating point values with low < high: if the cell centroid falls between low and high and the cell is active (as stated in "actnum"), then I take the value of "prop" in that cell and I append it to the "theSum" list; 5) At the end of the K loop, I just take an arithmetic mean of the values in "theSum" list. I think I may be able to figure out how to vectorize the part regarding the "grid" variable, but I have no idea on what to do for the xOut and yOut variables, and I need them because I use them in a later call to matplotlib.contourf. If you could drop some hint on how to proceed, it would be very appreciated. Thank you for your suggestions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/
From Robert.Conde at sungard.com Wed Dec 10 07:15:37 2008 From: Robert.Conde at sungard.com (Robert.Conde at sungard.com) Date: Wed, 10 Dec 2008 07:15:37 -0500 Subject: [Numpy-discussion] Superclassing numpy.matrix: got an unexpected keyword argument 'dtype' Message-ID: <897C0AFD34144B4F9DB484F132C698AD6F9D7A@VOO-EXCHANGE05.internal.sungard.corp> Hello, I'm using numpy-1.1.1 for Python 2.3. I'm trying to create a class that acts just like the numpy.matrix class with my own added methods and attributes. I want to pass my class a list of custom "instrument" objects and do some math based on these objects to set the matrix. To this end I've done the following: from numpy import matrix class rcMatrix(matrix): def __init__(self,instruments): """Do some calculations and set the values of the matrix.""" self[0,0] = 100 # Just an example self[0,1] = 100 # The real init method self[1,0] = 200 # Does some math based on the input objects self[1,1] = 300 # def __new__(cls,instruments): """When creating a new instance begin by creating an NxN matrix of zeroes.""" len_ = len(instruments) return matrix.__new__(cls,[[0.0]*len_]*len_) It works great and I can, for example, multiply two of my custom matrices seamlessly. I can also get the transpose. However, when I try to get the inverse I get an error: > rcm = rcMatrix(['instrument1','instrument2']) > print rcm [[ 100. 100.] [ 200. 300.]] > print rcm.T [[ 100. 200.] [ 100. 300.]] > print [5,10] * rcm [[ 2500. 3500.]] > print rcm.I Traceback (most recent call last): File "[Standard]/deleteme", line 29, in ?
File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line 492, in getI return asmatrix(func(self)) File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line 52, in asmatrix return matrix(data, dtype=dtype, copy=False) TypeError: __init__() got an unexpected keyword argument 'dtype' I've had to overwrite the getI function in order for things to work out: def getI(self): return matrix(self.tolist()).I I = property(getI, None, doc="inverse") Is this the correct way to achieve my goals? Please let me know if anything is unclear. Thanks, Robert Conde From lists_ravi at lavabit.com Wed Dec 10 11:28:29 2008 From: lists_ravi at lavabit.com (Ravi) Date: Wed, 10 Dec 2008 11:28:29 -0500 Subject: [Numpy-discussion] access ndarray in C++ In-Reply-To: <480F9287.7050405@noaa.gov> References: <200804231530.25141.lists@informa.tiker.net> <480F9287.7050405@noaa.gov> Message-ID: <200812101128.29986.lists_ravi@lavabit.com> On Wednesday 23 April 2008 15:48:23 Christopher Barker wrote: > > - Boost Python [1]. Especially if you want usable C++ integration. (ie. > > more than basic templates, etc.) > > What's the status of the Boost array object? maintained? updated for > recent numpy? The boost.python array object is still maintained. However, it has a few problems: 1. All array operations go through python which makes it too slow for my purposes. Phil Austin posted an alternate class on this list which works well since it uses the numpy C API: http://www.eos.ubc.ca/research/clouds/software/pythonlibs/num_util/num_util_release2/Readme.html 2. Only numeric & numarray are supported out of the box, but it is simple to support numpy; just add the following after calling import_array in your extension module: boost::python::numeric::array::set_module_and_type( "numpy", "ndarray" ); 3. If you want the C++-way of dealing with numpy matrices & vectors directly as objects look at either of the following: http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html http://mathema.tician.de/software/pyublas Of course, I am biased towards the first approach. Regards, Ravi From pgmdevlist at gmail.com Wed Dec 10 11:32:50 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 10 Dec 2008 11:32:50 -0500 Subject: [Numpy-discussion] Superclassing numpy.matrix: got an unexpected keyword argument 'dtype' In-Reply-To: <897C0AFD34144B4F9DB484F132C698AD6F9D7A@VOO-EXCHANGE05.internal.sungard.corp> References: <897C0AFD34144B4F9DB484F132C698AD6F9D7A@VOO-EXCHANGE05.internal.sungard.corp> Message-ID: <95651C03-351B-4192-9ED3-7B3C2D1C0FE3@gmail.com> Robert, Transforming your matrix to a list before computation isn't very efficient. If you do need some extra parameters in your __init__ to be compatible with other functions such as asmatrix, well, just add them, or use a coverall **kwargs def __init__(self, instruments, **kwargs) No guarantee it'll work all the time. Otherwise, please have a look at: http://docs.scipy.org/doc/numpy/user/basics.subclassing.html and the other link at the top of that page. In your case, I'd try to put the initialization in the __array_finalize__. On Dec 10, 2008, at 7:15 AM, wrote: > Hello, > > I'm using numpy-1.1.1 for Python 2.3. I'm trying to create a class > that acts just like the numpy.matrix class with my own added methods > and attributes. I want to pass my class a list of custom > "instrument" objects and do some math based on these objects to set > the matrix. 
To this end I've done the following: > > from numpy import matrix > > class rcMatrix(matrix): > def __init__(self,instruments): > """Do some calculations and set the values of the matrix.""" > self[0,0] = 100 # Just an example > self[0,1] = 100 # The real init method > self[1,0] = 200 # Does some math based on the input objects > self[1,1] = 300 # > def __new__(cls,instruments): > """When creating a new instance begin by creating an NxN > matrix of > zeroes.""" > len_ = len(instruments) > return matrix.__new__(cls,[[0.0]*len_]*len_) > > It works great and I can, for example, multiply two of my custom > matrices seamlessly. I can also get the transpose. However, when I > try to get the inverse I get an error: > >> rcm = rcMatrix(['instrument1','instrument2']) >> print rcm > [[ 100. 100.] > [ 200. 300.]] >> print rcm.T > [[ 100. 200.] > [ 100. 300.]] >> print [5,10] * rcm > [[ 2500. 3500.]] >> print rcm.I > Traceback (most recent call last): > File "[Standard]/deleteme", line 29, in ? > File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line > 492, in getI > return asmatrix(func(self)) > File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line > 52, in asmatrix > return matrix(data, dtype=dtype, copy=False) > TypeError: __init__() got an unexpected keyword argument 'dtype' > > > > I've had to overwrite the getI function in order for things to work > out: > > def getI(self): return matrix(self.tolist()).I > I = property(getI, None, doc="inverse") > > Is this the correct way to achieve my goals? > > Please let me know if anything is unclear. > > Thanks, > > Robert Conde > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From lists_ravi at lavabit.com Wed Dec 10 11:35:22 2008 From: lists_ravi at lavabit.com (Ravi) Date: Wed, 10 Dec 2008 11:35:22 -0500 Subject: [Numpy-discussion] access ndarray in C++ In-Reply-To: <200812101128.29986.lists_ravi@lavabit.com> References: <480F9287.7050405@noaa.gov> <200812101128.29986.lists_ravi@lavabit.com> Message-ID: <200812101135.22541.lists_ravi@lavabit.com> Oops, please ignore my previous message. I just started using a new mail client which marked some of my old messages (which I had tagged interesting) the same as new messages and I just blindly replied to them without checking the date. Sorry about the spam. Ravi From rw247 at astro.columbia.edu Wed Dec 10 11:38:11 2008 From: rw247 at astro.columbia.edu (Ross Williamson) Date: Wed, 10 Dec 2008 11:38:11 -0500 Subject: [Numpy-discussion] Find index of repeated numbers in array Message-ID: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Hi Everyone I think I'm missing something really obvious but what I would like to do is extract the indexes from an array where a number matches - For example data = [0,1,2,960,5,6,960,7] I would like to know, for example the indices which match 960 - i.e. it would return 3 and 6 I could do this with a loop but I was wondering if there was a built in numpy function to do this? 
BTW if anyone is interested I'm converting some idl code to numpy and trying to mmic the IDL function where Cheers Ross From cimrman3 at ntc.zcu.cz Wed Dec 10 11:43:44 2008 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 10 Dec 2008 17:43:44 +0100 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Message-ID: <493FF1C0.4080704@ntc.zcu.cz> Ross Williamson wrote: > Hi Everyone > > I think I'm missing something really obvious but what I would like to > do is extract the indexes from an array where a number matches - For > example > > data = [0,1,2,960,5,6,960,7] > > I would like to know, for example the indices which match 960 - i.e. > it would return 3 and 6 import numpy as np In[14]: np.where( np.array( data ) == 960 ) Out[14]: (array([3, 6]),) If you need to count all of the items, try something like np.histogram( data, np.max( data ) ) cheers, r. From nwagner at iam.uni-stuttgart.de Wed Dec 10 11:47:13 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 10 Dec 2008 17:47:13 +0100 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Message-ID: On Wed, 10 Dec 2008 11:38:11 -0500 Ross Williamson wrote: > Hi Everyone > > I think I'm missing something really obvious but what I >would like to > do is extract the indexes from an array where a number >matches - For > example > > data = [0,1,2,960,5,6,960,7] > > I would like to know, for example the indices which >match 960 - i.e. > it would return 3 and 6 > > I could do this with a loop but I was wondering if there >was a built > in numpy function to do this? > > BTW if anyone is interested I'm converting some idl code >to numpy and > trying to mmic the IDL function where > > Cheers > > Ross > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > >>> data array([ 0, 1, 2, 960, 5, 6, 960, 7]) >>> where(data==960) (array([3, 6]),) Nils From Chris.Barker at noaa.gov Wed Dec 10 12:07:11 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 10 Dec 2008 09:07:11 -0800 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <60C1C121-2B2E-4264-8BB8-D65EBA91120B@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <493EB202.4030308@noaa.gov> <60C1C121-2B2E-4264-8BB8-D65EBA91120B@gmail.com> Message-ID: <493FF73F.4040300@noaa.gov> Pierre GM wrote: >>> in the same place in NumPy; and all the SciPy IO code to be in the >>> same place in SciPy. >> +1 > > So, no problem w/ importing numpy.ma and numpy.records in numpy.lib.io ? As long as numpy.ma and numpy.records are, and will remain, part of the standard numpy distribution, this is fine. This is a key issue -- what is "core" numpy and what is not, but I know I'd like to see a lot of things built on ma and records, both, so I think they do belong in core. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sturla at molden.no Wed Dec 10 11:47:42 2008 From: sturla at molden.no (Sturla Molden) Date: Wed, 10 Dec 2008 17:47:42 +0100 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> Message-ID: <493FF2AE.9000107@molden.no> On 12/10/2008 5:38 PM, Ross Williamson wrote: > Hi Everyone > > I think I'm missing something really obvious but what I would like to > do is extract the indexes from an array where a number matches - For > example > > data = [0,1,2,960,5,6,960,7] > > I would like to know, for example the indices which match 960 - i.e. > it would return 3 and 6 >>> import numpy >>> a = numpy.array([0,1,2,960,5,6,960,7]) >>> a == 960 array([False, False, False, True, False, False, True, False], dtype=bool) >>> idx, = numpy.where(a == 960) >>> idx array([3, 6]) >>> idx.tolist() [3, 6] Sturla Molden From dpeterson at enthought.com Wed Dec 10 12:28:45 2008 From: dpeterson at enthought.com (Dave Peterson) Date: Wed, 10 Dec 2008 11:28:45 -0600 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! Message-ID: <493FFC4D.60500@enthought.com> I'm pleased to announce that the Enthought Tool Suite (ETS) 3.1.0 has been tagged, released, and uploaded to PyPi[1]! Both source distributions (.tar.gz) and binary (.egg) for Windows have been built and uploaded to PyPi. You can update an existing ETS install to v3.1.0 like so: easy_install -U ETS==3.1.0 What is ETS? ------------------ The Enthought Tool Suite (ETS) is a collection of projects developed by members of the OSS community, including Enthought employees, which we use every day to construct custom scientific applications. It includes a wide variety of components, including: * an extensible application framework * application building blocks * 2-D and 3-D graphics libraries * scientific and math libraries * developer tools The cornerstone on which these tools rest is the Traits project, which provides explicit type declarations in Python; its features include initialization, validation, delegation, notification, and visualization of typed attributes. More information is available for all these packages from the Enthought Tool Suite development home page: http://code.enthought.com/projects/index.php -- Dave From f.yw at hotmail.com Wed Dec 10 12:56:39 2008 From: f.yw at hotmail.com (frank wang) Date: Wed, 10 Dec 2008 10:56:39 -0700 Subject: [Numpy-discussion] how do I delete unused matrix to save the memory? In-Reply-To: <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> References: <73A4847E-7607-4178-8375-DA17596F05FF@gmail.com> <15FB9115-2D05-4273-A9C6-7573C48A65D3@gmail.com> <3d375d730812091903x499672c0k92c2db5134c79c10@mail.gmail.com> Message-ID: I am running in ipython. Now I do not have the problem anymore. %reset commands is a good solution. Thanks Frank> Date: Tue, 9 Dec 2008 21:03:00 -0600> From: robert.kern at gmail.com> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] how do I delete unused matrix to save the memory?> > On Mon, Dec 8, 2008 at 19:15, frank wang wrote:> > Hi,> >> > I have a program with some variables consume a lot of memory. The first time> > I run it, it is fine. The second time I run it, I will get MemoryError. 
If I> > close the ipython and reopen it again, then I can run the program once. I am> > looking for a command to delete the intermediate variable once it is not> > used to save memory like in matlab clear command.> > How are you running this program? Be aware that IPython may be holding> on to objects and preventing them from being deallocated. For example:> > In [7]: !cat memtest.py> class A(object):> def __del__(self):> print 'Deleting %r' % self> > > a = A()> > In [8]: %run memtest.py> > In [9]: %run memtest.py> > In [10]: %run memtest.py> > In [11]: del a> > In [12]:> Do you really want to exit ([y]/n)?> > $ python memtest.py> Deleting <__main__.A object at 0x915ab0>> > > You can remove some of these references with %reset and maybe a> gc.collect() for good measure.> > > In [1]: %run memtest> > In [2]: %run memtest> > In [3]: %run memtest> > In [4]: %reset> Once deleted, variables cannot be recovered. Proceed (y/[n])? y> Deleting <__main__.A object at 0xf3e950>> Deleting <__main__.A object at 0xf3e6d0>> Deleting <__main__.A object at 0xf3e930>> > -- > Robert Kern> > "I have come to believe that the whole world is an enigma, a harmless> enigma that is made terrible by our own mad attempt to interpret it as> though it had an underlying truth."> -- Umberto Eco> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ Send e-mail faster without improving your typing skills. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_speed_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From f.yw at hotmail.com Wed Dec 10 13:00:19 2008 From: f.yw at hotmail.com (frank wang) Date: Wed, 10 Dec 2008 11:00:19 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: On my two systems with Intel Core2 DUO, finfo(float128) gives me the nameerro, "NameError: name 'float128' is not defined". Why? Thanks Frank> Date: Tue, 9 Dec 2008 21:10:32 -0600> From: robert.kern at gmail.com> To: numpy-discussion at scipy.org> Subject: Re: [Numpy-discussion] Importance of order when summing values in anarray> > On Tue, Dec 9, 2008 at 21:01, Charles R Harris> wrote:> >> >> > On Tue, Dec 9, 2008 at 1:40 PM, Robert Kern wrote:> >>> >> On Tue, Dec 9, 2008 at 09:51, Nadav Horesh wrote:> >> > As much as I know float128 are in fact 80 bits (64 mantissa + 16> >> > exponent) so the precision is 18-19 digits (not 34)> >>> >> float128 should be 128 bits wide. If it's not on your platform, please> >> let us know as that is a bug in your build.> >> > I think he means the actual precision is the ieee extended precision, the> > number just happens to be stored into larger chunks of memory for alignment> > purposes.> > Ah, that's good to know. 
Yes, float128 on my Intel Mac behaves this way.> > In [12]: f = finfo(float128)> > In [13]: f.nmant> Out[13]: 63> > In [14]: f.nexp> Out[14]: 15> > -- > Robert Kern> > "I have come to believe that the whole world is an enigma, a harmless> enigma that is made terrible by our own mad attempt to interpret it as> though it had an underlying truth."> -- Umberto Eco> _______________________________________________> Numpy-discussion mailing list> Numpy-discussion at scipy.org> http://projects.scipy.org/mailman/listinfo/numpy-discussion _________________________________________________________________ You live life online. So we put Windows on the web. http://clk.atdmt.com/MRT/go/127032869/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Dec 10 13:07:58 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 11:07:58 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <710F2847B0018641891D9A216027636029C36D@ex3.envision.co.il> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: > On my two systems with Intel Core2 DUO, finfo(float128) gives me the > nameerro, "NameError: name 'float128' is not defined". Why? > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 bit systems it fits in three 32 bit words and shows up as float96. On 64 bit systems it fits in two 64 bit words and shows up as float128. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rw247 at astro.columbia.edu Wed Dec 10 13:16:29 2008 From: rw247 at astro.columbia.edu (Ross Williamson) Date: Wed, 10 Dec 2008 13:16:29 -0500 Subject: [Numpy-discussion] Find index of repeated numbers in array In-Reply-To: <493FF2AE.9000107@molden.no> References: <18406573-A84C-40F4-A192-BED57782A294@astro.columbia.edu> <493FF2AE.9000107@molden.no> Message-ID: Thanks all I was being dumb and forgot to initialize as array() Cheers Ross On Dec 10, 2008, at 11:47 AM, Sturla Molden wrote: > On 12/10/2008 5:38 PM, Ross Williamson wrote: >> Hi Everyone >> >> I think I'm missing something really obvious but what I would like to >> do is extract the indexes from an array where a number matches - For >> example >> >> data = [0,1,2,960,5,6,960,7] >> >> I would like to know, for example the indices which match 960 - i.e. 
>> it would return 3 and 6 > >>>> import numpy >>>> a = numpy.array([0,1,2,960,5,6,960,7]) >>>> a == 960 > array([False, False, False, True, False, False, True, False], > dtype=bool) >>>> idx, = numpy.where(a == 960) >>>> idx > array([3, 6]) >>>> idx.tolist() > [3, 6] > > > Sturla Molden > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Wed Dec 10 13:58:05 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Dec 2008 12:58:05 -0600 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <493E85FF.2060106@gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> Message-ID: <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> On Wed, Dec 10, 2008 at 12:07, Charles R Harris wrote: > > > On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: >> >> On my two systems with Intel Core2 DUO, finfo(float128) gives me the >> nameerro, "NameError: name 'float128' is not defined". Why? >> > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 bit > systems it fits in three 32 bit words and shows up as float96. On 64 bit > systems it fits in two 64 bit words and shows up as float128. I'm running a 32-bit OS (well, a 32-bit build of Python on OS X) on an Intel Core2 Duo, and I get a float128. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From michael.s.gilbert at gmail.com Wed Dec 10 14:03:39 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Wed, 10 Dec 2008 14:03:39 -0500 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution Message-ID: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Hello, I have been reading that there may be potential issues with the Box-Muller transform, which is used by the numpy.random.normal() function. Supposedly, since f*x1 and f*x2 are not independent variables, then the individual elements (corresponding to f*x1 and f*x2 ) of the distribution also won't be independent. For example, see "Stochastic Simulation" by Ripley, pages 54-59, where the random values end up distributed on a spiral. Note that they mention that they only looked at "congruential generators." Is the random number generator used by numpy congruential? I have tried to generate plots that demonstrate this problem, but have come up short. For example: import numpy , pylab nsamples = 10**6 n = numpy.random.normal( 0.0 , 1.0 , nsamples ) pylab.scatter( n[0:-1:2] , n[1:-1:2] , 0.1 ) pylab.show() I can zoom in and out, and the scatter still looks random (white noise -- almost like tv static). Does this prove that there is no problem? And if so, why does numpy do a better job than as demonstrated by Ripley? 
Regards, Mike Gilbert From matthieu.brucher at gmail.com Wed Dec 10 14:13:52 2008 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 10 Dec 2008 20:13:52 +0100 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution In-Reply-To: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: I think the use of a correct uniform generator will allow a good normal distribution. Congruental generators are very basic generators, everyone knows they should not be used. I think Numpy uses a Mersenne Twisted generator, for which you can generate "independant" vectors with several hundred values. Matthieu 2008/12/10 Michael Gilbert : > Hello, > > I have been reading that there may be potential issues with the > Box-Muller transform, which is used by the numpy.random.normal() > function. Supposedly, since f*x1 and f*x2 are not independent variables, then > the individual elements (corresponding to f*x1 and f*x2 ) of the > distribution also won't be independent. For example, see "Stochastic > Simulation" by Ripley, pages 54-59, where the random values end up > distributed on a spiral. Note that they mention that they only looked > at "congruential generators." Is the random number generator used > by numpy congruential? > > I have tried to generate plots that demonstrate this problem, but have > come up short. For example: > > import numpy , pylab > nsamples = 10**6 > n = numpy.random.normal( 0.0 , 1.0 , nsamples ) > pylab.scatter( n[0:-1:2] , n[1:-1:2] , 0.1 ) > pylab.show() > > I can zoom in and out, and the scatter still looks random (white > noise -- almost like tv static). Does this prove that there is no > problem? And if so, why does numpy do a better job than as > demonstrated by Ripley? > > Regards, > Mike Gilbert > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From pav at iki.fi Wed Dec 10 14:23:00 2008 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 10 Dec 2008 19:23:00 +0000 (UTC) Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: Wed, 10 Dec 2008 14:03:39 -0500, Michael Gilbert wrote: > I have been reading that there may be potential issues with the > Box-Muller transform, which is used by the numpy.random.normal() > function. Supposedly, since f*x1 and f*x2 are not independent > variables, then the individual elements (corresponding to f*x1 and f*x2 > ) of the distribution also won't be independent. For example, see > "Stochastic Simulation" by Ripley, pages 54-59, where the random values > end up distributed on a spiral. Note that they mention that they only > looked at "congruential generators." Is the random number generator > used by numpy congruential? I'm not an expert, but the generator used by Numpy is the Mersenne twister, which should be quite good for many uses. I'd guess what you mention is a way to illustrate that the output of linear congruental generators has serial correlations. At least according to wikipedia, these are negligible in Mersenne twister's output. 
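
A rough way to put a number on that, sketched here with an illustrative box_muller() helper driven by numpy's uniform stream (this is only a toy for comparison, not numpy's internal routine), is to look at the correlation between the two members of each generated pair and between consecutive normal() draws. A near-zero value is only a crude screen, since uncorrelated is weaker than independent, but it complements the scatter plot above with a number.

import numpy

def box_muller(npairs):
    # classic Box-Muller transform on top of numpy's uniform (Mersenne Twister) stream
    u1 = 1.0 - numpy.random.uniform(size=npairs)   # values in (0, 1], keeps log() finite
    u2 = numpy.random.uniform(size=npairs)
    r = numpy.sqrt(-2.0 * numpy.log(u1))
    return r * numpy.cos(2.0 * numpy.pi * u2), r * numpy.sin(2.0 * numpy.pi * u2)

z1, z2 = box_muller(10**6)
print numpy.corrcoef(z1, z2)[0, 1]            # correlation within each Box-Muller pair

n = numpy.random.normal(0.0, 1.0, 2 * 10**6)
print numpy.corrcoef(n[0::2], n[1::2])[0, 1]  # same check on numpy's own normal() output

Both numbers should sit at the 1/sqrt(nsamples) noise level, around 1e-3 for a million pairs.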
-- Pauli Virtanen From charlesr.harris at gmail.com Wed Dec 10 14:33:33 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 12:33:33 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:58 AM, Robert Kern wrote: > On Wed, Dec 10, 2008 at 12:07, Charles R Harris > wrote: > > > > > > On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: > >> > >> On my two systems with Intel Core2 DUO, finfo(float128) gives me the > >> nameerro, "NameError: name 'float128' is not defined". Why? > >> > > > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 > bit > > systems it fits in three 32 bit words and shows up as float96. On 64 bit > > systems it fits in two 64 bit words and shows up as float128. > > I'm running a 32-bit OS (well, a 32-bit build of Python on OS X) on an > Intel Core2 Duo, and I get a float128. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Dec 10 14:38:59 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 12:38:59 -0700 Subject: [Numpy-discussion] Importance of order when summing values in anarray In-Reply-To: <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> References: <789d27b10812090607k6b0962dfn4a5af294641a9dda@mail.gmail.com> <789d27b10812090700g4c3ec99esed5055c461b95609@mail.gmail.com> <493E92BB.9060403@gmail.com> <710F2847B0018641891D9A216027636029C36E@ex3.envision.co.il> <3d375d730812091240p552713e7r3109c58fa7a3d8@mail.gmail.com> <3d375d730812091910vfe2da77s6b8afd8ff693ee80@mail.gmail.com> <3d375d730812101058l205b461bp47df4d1044b3f931@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 11:58 AM, Robert Kern wrote: > On Wed, Dec 10, 2008 at 12:07, Charles R Harris > wrote: > > > > > > On Wed, Dec 10, 2008 at 11:00 AM, frank wang wrote: > >> > >> On my two systems with Intel Core2 DUO, finfo(float128) gives me the > >> nameerro, "NameError: name 'float128' is not defined". Why? > >> > > > > You probably run a 32 bit OS. IEEE extended precision is 80 bits. On 32 > bit > > systems it fits in three 32 bit words and shows up as float96. On 64 bit > > systems it fits in two 64 bit words and shows up as float128. > > I'm running a 32-bit OS (well, a 32-bit build of Python on OS X) on an > Intel Core2 Duo, and I get a float128. > Curious. It probably has something to do with the way the FPU is set up when running on a 64 bit system that is independent of how python is compiled. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spacey-numpy-discussion at lenin.net Wed Dec 10 14:48:48 2008 From: spacey-numpy-discussion at lenin.net (Peter Norton) Date: Wed, 10 Dec 2008 14:48:48 -0500 Subject: [Numpy-discussion] Numscons issues: numpy.core.umath_tests not built, built-in ld detection, MAIN__ not being set-up In-Reply-To: <493F5AC5.5000902@ar.media.kyoto-u.ac.jp> References: <493F5AC5.5000902@ar.media.kyoto-u.ac.jp> Message-ID: On Wed, Dec 10, 2008 at 12:59 AM, David Cournapeau wrote: > Peter Norton wrote: >> I've got a few issues that I hope won't be overwhelming on one message: >> >> (1) Because of some issues in the past in building numpy with >> numscons, the numpy.core.umath_tests don't get built with >> numpy+numscons (at least not as of svn version 6128). >> >> $ python -c 'import numpy; print numpy.__version__; import >> numpy.core.umath_tests' >> 1.3.0.dev6139 >> Traceback (most recent call last): >> File "", line 1, in >> ImportError: No module named umath_tests >> >> What needs to be done to get this module incorporated into the numscons build? > > you should not need this module, it is not built using the normal build > of numpy either. Did you do a clean build (rm -rf build and removing the > install directory first) ? It was enabled before but is commented out ATM. Our users would like to have this module for testing purposes I believe. It should be enabled. - Hide quoted text - >> (2) I've found that in numscons-0.9.4, the detection of the correct >> linker assumes that if gcc is in use, the linker is gnu ld. However, >> on solaris this isn't the recommended toolchain, so it's typical to >> build gcc with gnu as and the solaris /usr/ccs/bin/ld under the hood. >> What this means is that when setting a run_path in the binary (which >> we need to do) the linker flags are set to "-Wl,-rpath=". >> However, this isn't valid for the solaris ld. It needs -R, or >> -Wl,-R. I'm pretty sure that on Solaris trying to link a >> library with -Wl,-rpath= and looking for an error should be enough to >> determine the correct format for the linker. > > Scons and hence numscons indeed assume that the linker is the same as > the compiler by default. It would be possible to avoid this by detecting > the linker at runtime, to bypass scons tools choice, like I do for C, > C++ and Fortran compilers. The whole scons tools sub-system is > unfortunately very limited ATM, so there is a lot of manual work to do > (that's actually what most of the code in numscons/core is for). > >> (3) Numscons tries to check for the need for a MAIN__ function when >> linking with gfortran. However, any libraries built with numscons come >> out with an unsatisfied dependency on MAIN__. The log looks like this >> in build/scons/numpy/linalg/config.log looks like this: > > It may be linked to the sun linker problem above. Actually, the dummy > main detection is not used at all for the building - it is necessary to > detect name mangling used by the fortran compiler, but that's it. I > assumed that a dummy main was never needed for shared libraries, but > that assumption may well be ill founded. > > I never had problems related to this on open solaris, with both native > and gcc toolchains, so I am willing to investiage first whether it is > linked to the sun linker problem or not. > > Unfortunately, I won't have the time to work on this in the next few > months because of my PhD thesis; the sun linker problem can be fixed by > following a strategy similar to compilers, in > numscons/core/initialization.py. 
You first need to add a detection > scheme for the linker in compiler_detection.py. Thanks, I'll look into this. It is true that working with opensolaris is a lot easier. Sun should have done it years ago. Thanks again, -Peter From charlesr.harris at gmail.com Wed Dec 10 15:30:44 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Dec 2008 13:30:44 -0700 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution In-Reply-To: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 12:03 PM, Michael Gilbert < michael.s.gilbert at gmail.com> wrote: > Hello, > > I have been reading that there may be potential issues with the > Box-Muller transform, which is used by the numpy.random.normal() > function. Supposedly, since f*x1 and f*x2 are not independent variables, > then > the individual elements (corresponding to f*x1 and f*x2 ) of the > distribution also won't be independent. For example, see "Stochastic > Simulation" by Ripley, pages 54-59, where the random values end up > distributed on a spiral. Note that they mention that they only looked > at "congruential generators." Is the random number generator used > by numpy congruential? > > I have tried to generate plots that demonstrate this problem, but have > come up short. For example: > > import numpy , pylab > nsamples = 10**6 > n = numpy.random.normal( 0.0 , 1.0 , nsamples ) > pylab.scatter( n[0:-1:2] , n[1:-1:2] , 0.1 ) > pylab.show() > > I can zoom in and out, and the scatter still looks random (white > noise -- almost like tv static). Does this prove that there is no > problem? And if so, why does numpy do a better job than as > demonstrated by Ripley? > Bruce Carneal did some tests of robustness and speed for various normal generators. I don't know what his final tests showed for Box-Muller. IIRC, it had some failures but nothing spectacular. The tests were pretty stringent and based on using the erf to turn the normal distribution into a uniform distribution and using the crush tests on the latter.. You could send him a note and ask: bcarneal at gmail.com. 
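
That erf reduction is compact in numpy terms. A minimal sketch, assuming scipy.special.erf is available (any vectorized erf would do), maps each draw through the standard normal CDF and sanity-checks that the output looks uniform before it is handed to a test battery:

import numpy
from scipy.special import erf

n = numpy.random.normal(0.0, 1.0, 10**6)

# Phi(x) = 0.5*(1 + erf(x/sqrt(2))) is the N(0,1) CDF, so u should be uniform on (0, 1)
u = 0.5 * (1.0 + erf(n / numpy.sqrt(2.0)))

print u.min(), u.max()     # should stay strictly inside (0, 1)
print u.mean(), u.var()    # roughly 0.5 and 1/12 (about 0.0833)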
Here are the timings he got: In what follows the uniform variate generators are: lcg64 mwc8222 mt19937 mt19937_64 yarn5 And the normal distribution codes are: trng - default normal distribution code in TRNG boxm - Box-Muller, mtrand lookalike, remembers/uses 2nd value zig7 - a 'Harris' ziggurat indexed by 7 bits zig8 - a 'Harris' ziggurat indexed by 8 bits zig9 - a 'Harris' ziggurat indexed by 9 bits Here are the numbers in more detail: # Timings from icc -O2 running on 2.4GhZ Core-2 lcg64 trng: 6.52459e+06 ops per second lcg64 boxm: 2.18453e+07 ops per second lcg64 zig7: 1.80616e+08 ops per second lcg64 zig8: 2.01865e+08 ops per second lcg64 zig9: 2.05156e+08 ops per second mwc8222 trng: 6.52459e+06 ops per second mwc8222 boxm: 2.08787e+07 ops per second mwc8222 zig7: 9.44663e+07 ops per second mwc8222 zig8: 1.05326e+08 ops per second mwc8222 zig9: 1.03478e+08 ops per second mt19937 trng: 6.41112e+06 ops per second mt19937 boxm: 1.64986e+07 ops per second mt19937 zig7: 4.23762e+07 ops per second mt19937 zig8: 4.52623e+07 ops per second mt19937 zig9: 4.52623e+07 ops per second mt19937_64 trng: 6.42509e+06 ops per second mt19937_64 boxm: 1.93226e+07 ops per second mt19937_64 zig7: 5.8762e+07 ops per second mt19937_64 zig8: 6.17213e+07 ops per second mt19937_64 zig9: 6.29146e+07 ops per second yarn5 trng: 5.95781e+06 ops per second yarn5 boxm: 1.19156e+07 ops per second yarn5 zig7: 1.48945e+07 ops per second yarn5 zig8: 1.54809e+07 ops per second yarn5 zig9: 1.53201e+07 ops per second # Timings from g++ -O2 running on a 2.4GhZ Core-2 lcg64 trng: 6.72163e+06 ops per second lcg64 boxm: 1.50465e+07 ops per second lcg64 zig7: 1.31072e+08 ops per second lcg64 zig8: 1.48383e+08 ops per second lcg64 zig9: 1.6036e+08 ops per second mwc8222 trng: 6.64215e+06 ops per second mwc8222 boxm: 1.44299e+07 ops per second mwc8222 zig7: 8.903e+07 ops per second mwc8222 zig8: 1.00825e+08 ops per second mwc8222 zig9: 1.03478e+08 ops per second mt19937 trng: 6.52459e+06 ops per second mt19937 boxm: 1.28223e+07 ops per second mt19937 zig7: 5.00116e+07 ops per second mt19937 zig8: 5.41123e+07 ops per second mt19937 zig9: 5.47083e+07 ops per second mt19937_64 trng: 6.58285e+06 ops per second mt19937_64 boxm: 1.42988e+07 ops per second mt19937_64 zig7: 6.72164e+07 ops per second mt19937_64 zig8: 7.39591e+07 ops per second mt19937_64 zig9: 7.46022e+07 ops per second yarn5 trng: 6.25144e+06 ops per second yarn5 boxm: 8.93672e+06 ops per second yarn5 zig7: 1.50465e+07 ops per second yarn5 zig8: 1.57496e+07 ops per second yarn5 zig9: 1.56038e+07 ops per second Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From elfnor at gmail.com Wed Dec 10 15:54:13 2008 From: elfnor at gmail.com (Elfnor) Date: Wed, 10 Dec 2008 12:54:13 -0800 (PST) Subject: [Numpy-discussion] rollaxis and reshape Message-ID: <20943690.post@talk.nabble.com> Hi I'm trying to split an array into two pieces and have the two pieces in a new dimension. Here it is in code, because that's hard to explain in words. >>>data.shape (4, 50, 3) >>>new_data = numpy.zeros((2, 4, 25, 3)) >>>new_data[0,...] = data[:,:25,:] >>>new_data[1,...] = data[:,25:,:] >>>new_data.shape (2, 4, 25, 3) That works but when I try it with reshape the elements get in the wrong place. I've tried various combinations of rollaxis before the reshape, but can't get it right. Thanks Eleanor -- View this message in context: http://www.nabble.com/rollaxis-and-reshape-tp20943690p20943690.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
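
For the reshape/rollaxis question above, one combination that appears to reproduce the two-piece split, assuming C-ordered data with the same (4, 50, 3) shape, is to let reshape split the middle axis into (2, 25), block index first, and then roll that new block axis to the front:

import numpy

data = numpy.arange(4 * 50 * 3).reshape(4, 50, 3)    # stand-in for the real array

# axis 1 (length 50) becomes a (2, 25) pair of axes, then the new length-2
# block axis is moved to position 0
new_data = numpy.rollaxis(data.reshape(4, 2, 25, 3), 1)

print new_data.shape                                 # (2, 4, 25, 3)
print numpy.all(new_data[0] == data[:, :25, :])      # True
print numpy.all(new_data[1] == data[:, 25:, :])      # True

Reshaping straight to (2, 4, 25, 3) is what scrambles the elements; the length-2 axis has to be introduced next to the axis it came from, and only then can rollaxis move it to the front.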
From gael.varoquaux at normalesup.org Wed Dec 10 17:10:23 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 Dec 2008 23:10:23 +0100 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy Message-ID: <20081210221023.GD356@phare.normalesup.org> Hi all, Looks like I am following the long line of people failing to build numpy :). I must admit I am clueless with building problems. Numpy builds alright, but I get: ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: _gfortran_st_write_done On import. This used to work a while ago. I am not sure what I changed, but it sure does fail. I really don't understand where the gfortran comes in. I tried building numpy with or without gfortran. From what I gather it is the numpy is being built by a different compiler than the atlas libraries (hurray for ABI compatibility), but I don't really understand how this is possible. How can I debug this? Cheers, Ga?l From babaktei at yahoo.com Wed Dec 10 17:28:21 2008 From: babaktei at yahoo.com (Bab Tei) Date: Wed, 10 Dec 2008 14:28:21 -0800 (PST) Subject: [Numpy-discussion] Support for sparse matrix in Distance function (and clustering)? Message-ID: <898520.57855.qm@web50403.mail.re2.yahoo.com> Damian Eads soe.ucsc.edu> writes: > > Hi, > > Can you be more specific? Do you need sparse matrices to represent > observation vectors because they are sparse? Or do you need sparse > matrices to represent distance matrices because most vectors you are > clustering are similar while a few are dissimilar? > Damian > > On Tue, Dec 9, 2008 at 1:28 PM, Bab Tei yahoo.com> wrote: > > Hi > > Does the distance function in spatial package support sparse matrix? > > regards Hi I need sparse matrices to represent observation vectors because they are sparse. I have a large sparse matrix. I also use kmeans (Besides hierarchical clustering) which can directly work with very large data. Teimourpour From babaktei at yahoo.com Wed Dec 10 17:30:43 2008 From: babaktei at yahoo.com (Bab Tei) Date: Wed, 10 Dec 2008 14:30:43 -0800 (PST) Subject: [Numpy-discussion] Excluding index in numpy like negative index in R? Message-ID: <789260.71920.qm@web50412.mail.re2.yahoo.com> Keith Goodman gmail.com> writes: > > On Tue, Dec 9, 2008 at 12:25 PM, Bab Tei yahoo.com> wrote: > > > I can exclude a list of items by using negative index in R (R-project) ie myarray[-excludeindex]. As > negative indexing in numpy (And python) behave differently ,how can I exclude a list of item in numpy? > > Here's a painful way to do it: > > >> x = np.array([0,1,2,3,4]) > >> excludeindex = [1,3] > >> idx = list(set(range(4)) - set(excludeindex)) > >> x[idx] > array([0, 2]) > > To make it more painful, you might want to sort idx. > > But if excludeindex is True/False, then just use ~excludeindex. > Thank you. However it seems I have to create a full list at first and then exclude items. It is somehow painful as I have some very large sparse matrices and creating a full index eats a lot of memory. Maybe adding this functionality to numpy saves memory and makes the syntax more clear ie a syntax like x[~excludeindex] which smartly distinguish between excludeindex as a list of numerical indexes and a mask (list of true/false indexes). 
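Two ways of expressing R-style index exclusion without spelling out the full list of kept indices by hand are sketched below; whether the one-byte-per-element boolean mask is acceptable for very large arrays depends on the application, and the variable names are only illustrative:

----
import numpy as np

x = np.arange(5)
excludeindex = [1, 3]

# Boolean mask: flip off the excluded positions, then index with the mask.
mask = np.ones(x.shape[0], dtype=bool)
mask[excludeindex] = False
print(x[mask])                      # [0 2 4]

# Alternatively, compute the complementary (sorted) index list directly.
keep = np.setdiff1d(np.arange(x.shape[0]), excludeindex)
print(x[keep])                      # [0 2 4]
----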
Regards From gael.varoquaux at normalesup.org Wed Dec 10 17:32:11 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 Dec 2008 23:32:11 +0100 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <20081210221023.GD356@phare.normalesup.org> References: <20081210221023.GD356@phare.normalesup.org> Message-ID: <20081210223211.GF356@phare.normalesup.org> On Wed, Dec 10, 2008 at 11:10:23PM +0100, Gael Varoquaux wrote: > Numpy builds alright, but I get: > ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: > _gfortran_st_write_done Doh! I knew it must be a FAQ, and it was :). Better googling gave me the answer: the configuration was picking up the libraries for the libatlas3gf-sse2 package, which is built with gfortran. Numpy is built with g77, and I need to force it to link with the libraries given by the atlas3-sse2 package (providing libaries built with g77). The best way is simply to remove the gfortran altas libraries. This email from David got me on the track: http://projects.scipy.org/pipermail/numpy-discussion/2008-May/034164.html I must have at some point installed the gfortran libraries by mistake. I was taken by surprise because I didn't expect Ubuntu to have 2 versions of atlas, ABI incompatible. Sorry for the noise. Ga?l From david at ar.media.kyoto-u.ac.jp Wed Dec 10 23:07:51 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 11 Dec 2008 13:07:51 +0900 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <20081210223211.GF356@phare.normalesup.org> References: <20081210221023.GD356@phare.normalesup.org> <20081210223211.GF356@phare.normalesup.org> Message-ID: <49409217.6010905@ar.media.kyoto-u.ac.jp> Gael Varoquaux wrote: > I must have at some point installed the gfortran libraries by mistake. I > was taken by surprise because I didn't expect Ubuntu to have 2 versions > of atlas, ABI incompatible. > The point was to help for transition from g77 to gfortran ABI. Intrepid does not have this problem (they even went as far as removing g77 from the archives !). David From gael.varoquaux at normalesup.org Thu Dec 11 01:01:20 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 07:01:20 +0100 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <49409217.6010905@ar.media.kyoto-u.ac.jp> References: <20081210221023.GD356@phare.normalesup.org> <20081210223211.GF356@phare.normalesup.org> <49409217.6010905@ar.media.kyoto-u.ac.jp> Message-ID: <20081211060120.GB21281@phare.normalesup.org> On Thu, Dec 11, 2008 at 01:07:51PM +0900, David Cournapeau wrote: > Gael Varoquaux wrote: > > I must have at some point installed the gfortran libraries by mistake. I > > was taken by surprise because I didn't expect Ubuntu to have 2 versions > > of atlas, ABI incompatible. > The point was to help for transition from g77 to gfortran ABI. Intrepid > does not have this problem (they even went as far as removing g77 from > the archives !). Sure, I can understand that. I am on intrepid on half of my boxes so far. But not this one :). 
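For anyone hitting the same undefined _gfortran_st_write_done symbol, a quick first check is to ask numpy which BLAS/LAPACK/ATLAS libraries it was configured against; the exact sections printed vary between numpy versions, but a gfortran-built ATLAS showing up under a g77-built numpy points at the ABI mix-up described above:

----
import numpy as np

# Prints the library names and directories recorded at build time.
np.show_config()
----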
Ga?l From chaos.proton at gmail.com Thu Dec 11 02:18:05 2008 From: chaos.proton at gmail.com (Grissiom) Date: Thu, 11 Dec 2008 15:18:05 +0800 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <20081210221023.GD356@phare.normalesup.org> References: <20081210221023.GD356@phare.normalesup.org> Message-ID: On Thu, Dec 11, 2008 at 06:10, Gael Varoquaux wrote: > Hi all, > > Looks like I am following the long line of people failing to build numpy > :). I must admit I am clueless with building problems. > > Numpy builds alright, but I get: > > ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: > _gfortran_st_write_done > > On import. > > This used to work a while ago. I am not sure what I changed, but it sure > does fail. I really don't understand where the gfortran comes in. I tried > building numpy with or without gfortran. From what I gather it is the > numpy is being built by a different compiler than the atlas libraries > (hurray for ABI compatibility), but I don't really understand how this is > possible. > > How can I debug this? > > Cheers, > > Ga?l > I have encountered with such problem before. My solution is recompile the problem package(maybe atlas in your case) with -ff2c option passed to gfortran. -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Dec 11 02:13:15 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 11 Dec 2008 16:13:15 +0900 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: References: <20081210221023.GD356@phare.normalesup.org> Message-ID: <4940BD8B.2000001@ar.media.kyoto-u.ac.jp> Grissiom wrote: > On Thu, Dec 11, 2008 at 06:10, Gael Varoquaux > > > wrote: > > Hi all, > > Looks like I am following the long line of people failing to build > numpy > :). I must admit I am clueless with building problems. > > Numpy builds alright, but I get: > > ImportError: /usr/lib/sse2/atlas/libblas.so.3gf: undefined symbol: > _gfortran_st_write_done > > On import. > > This used to work a while ago. I am not sure what I changed, but > it sure > does fail. I really don't understand where the gfortran comes in. > I tried > building numpy with or without gfortran. From what I gather it is the > numpy is being built by a different compiler than the atlas libraries > (hurray for ABI compatibility), but I don't really understand how > this is > possible. > > How can I debug this? > > Cheers, > > Ga?l > > > I have encountered with such problem before. My solution is recompile > the problem package(maybe atlas in your case) with -ff2c option passed > to gfortran. This is a bad idea: it won't work with libraries which are not built with this option, and the error won't always be easy to detect (one key difference is that wo ff2c, complex variables are passed by value by gfortran, whereas they are passed by reference with the ff2c option - which means crash and/or corruption). 
http://wiki.debian.org/GfortranTransition The only viable solution is to avoid mixing g77-built and gfortran-built libraries (there is now a simple test which tries to detect those mix in both numpy and scipy), cheers, David From chaos.proton at gmail.com Thu Dec 11 02:56:23 2008 From: chaos.proton at gmail.com (Grissiom) Date: Thu, 11 Dec 2008 15:56:23 +0800 Subject: [Numpy-discussion] Failing to build numpy properly on Ubuntu Hardy In-Reply-To: <4940BD8B.2000001@ar.media.kyoto-u.ac.jp> References: <20081210221023.GD356@phare.normalesup.org> <4940BD8B.2000001@ar.media.kyoto-u.ac.jp> Message-ID: On Thu, Dec 11, 2008 at 15:13, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Grissiom wrote: > > I have encountered with such problem before. My solution is recompile > > the problem package(maybe atlas in your case) with -ff2c option passed > > to gfortran. > > This is a bad idea: it won't work with libraries which are not built > with this option, and the error won't always be easy to detect (one key > difference is that wo ff2c, complex variables are passed by value by > gfortran, whereas they are passed by reference with the ff2c option - > which means crash and/or corruption). > > http://wiki.debian.org/GfortranTransition > > The only viable solution is to avoid mixing g77-built and gfortran-built > libraries (there is now a simple test which tries to detect those mix in > both numpy and scipy), > > cheers, > > David > Thanks for pointing out my mistake ;) -- Cheers, Grissiom -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Dec 11 10:20:49 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 16:20:49 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing Message-ID: <20081211152049.GB1440@phare.normalesup.org> Hi there, I have been using the multiprocessing module a lot to do statistical tests such as Monte Carlo or resampling, and I have just discovered something that makes me wonder if I haven't been accumulating false results. Given two files: === test.py === from test_helper import task from multiprocessing import Pool p = Pool(4) jobs = list() for i in range(4): jobs.append(p.apply_async(task, (4, ))) print [j.get() for j in jobs] p.close() p.join() === test_helper.py === import numpy as np def task(x): return np.random.random(x) ======= If I run test.py, I get: [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, 0.02203999, 0.7591353 ])] In other words, the 4 processes give me the same exact results. Now I understand why this is the case: the different instances of the random number generator where created by forking from the same process, so they are exactly the very same object. This is howver a fairly bad trap. I guess other people will fall into it. The take home message is: **call 'numpy.random.seed()' when you are using multiprocessing** I wonder if we can find a way to make this more user friendly? Would be easy, in the C code, to check if the PID has changed, and if so reseed the random number generator? I can open up a ticket for this if people think this is desirable (I think so). On a side note, there are a score of functions in numpy.random with __module__ to None. It makes it inconvenient to use it with multiprocessing (for instance it forced the creation of the 'test_helper' file here). 
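A minimal sketch of the take-home workaround, assuming Unix fork-based multiprocessing as in the report above: reseed inside the task so that each forked worker draws fresh entropy instead of replaying the generator state inherited from the parent. Later messages in this thread discuss why explicit per-task seeds or per-task generator objects are preferable when reproducibility and stream independence matter:

----
import numpy as np
from multiprocessing import Pool

def task(x):
    np.random.seed()              # reseed in the child from fresh OS entropy / clock
    return np.random.random(x)

if __name__ == '__main__':
    p = Pool(4)
    jobs = [p.apply_async(task, (4,)) for i in range(4)]
    print([j.get() for j in jobs])   # the four draws now differ
    p.close()
    p.join()
----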
Ga?l From cournape at gmail.com Thu Dec 11 10:57:26 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 00:57:26 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > > === test.py === > from test_helper import task > from multiprocessing import Pool > > p = Pool(4) > > jobs = list() > for i in range(4): > jobs.append(p.apply_async(task, (4, ))) > > print [j.get() for j in jobs] > > p.close() > p.join() > > === test_helper.py === > import numpy as np > > def task(x): > return np.random.random(x) > > ======= > > If I run test.py, I get: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. Why do you say the results are the same ? They don't look the same to me - only the first three are the same. > Now I understand why this is the case: the different instances of the > random number generator where created by forking from the same process, > so they are exactly the very same object. This is howver a fairly bad > trap. I guess other people will fall into it. I am not sure I am following: the objects in python are not the same if you fork a process, or I don't understand what you mean by same. They may be initialized the same way, though. Isn't the problem simply due to seeding from the same value ? For such a tiny problem (4 tasks whose processing time is negligeable), the seed will be the same since the intervals between the sampling will be small. Taking a look at the mtrand code in numpy, if the seed is not given, it is taken from /dev/random if available, or the time clock if not; I don't know what the semantics are for concurrent access to /dev/random (is it gauranteed that two process will get different values from it ?). To confirm this, you could try to use your toy example with 500 jobs instead of 4: in that case, it is unlikely they use the same underlying value as a starting point, even if there is no gurantee on concurrent access of /dev/random. > I wonder if we can find a way to make this more user friendly? Would be > easy, in the C code, to check if the PID has changed, and if so reseed > the random number generator? I can open up a ticket for this if people > think this is desirable (I think so). This sounds like too much magic for a very particular use: there may be cases where you want the same seed in multiple processes (what if you processes are not created from multiprocess, and you want to make sure you have the same seed ?). 
David From cournape at gmail.com Thu Dec 11 11:04:29 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 01:04:29 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: <5b8d13220812110804v5c06b538l36dd6c626f45dec8@mail.gmail.com> On Fri, Dec 12, 2008 at 12:57 AM, David Cournapeau wrote: > Taking a look at the mtrand code in numpy, if the seed is not given, > it is taken from /dev/random if available, or the time clock if not; I > don't know what the semantics are for concurrent access to /dev/random > (is it gauranteed that two process will get different values from it > ?). > Sorry, the mtrand code use /dev/urandom, not /dev/random, if available. David From michael.s.gilbert at gmail.com Thu Dec 11 11:09:58 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Thu, 11 Dec 2008 11:09:58 -0500 Subject: [Numpy-discussion] On the quality of the numpy.random.normal() distribution In-Reply-To: References: <8e2a98be0812101103k77bb7988m428a5afc7951442b@mail.gmail.com> Message-ID: <8e2a98be0812110809i307f4333y43f7e7f9a560a36b@mail.gmail.com> > Bruce Carneal did some tests of robustness and speed for various normal > generators. I don't know what his final tests showed for Box-Muller. IIRC, > it had some failures but nothing spectacular. The tests were pretty > stringent and based on using the erf to turn the normal distribution into a > uniform distribution and using the crush tests on the latter.. You could > send him a note and ask: bcarneal at gmail.com. Here are the timings he got: Thanks for all the insightful replies. This gives me some better confidence in numpy's normal distribution. I will contact Bruce Carneal to get more details. Thanks again, Mike Gilbert From pav at iki.fi Thu Dec 11 11:16:04 2008 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 11 Dec 2008 16:16:04 +0000 (UTC) Subject: [Numpy-discussion] numpy.random and multiprocessing References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: Fri, 12 Dec 2008 00:57:26 +0900, David Cournapeau wrote: [clip] > On Fri, Dec 12, 2008 at 12:20 AM, Gael Varoquaux wrote: > [clip] >> Now I understand why this is the case: the different instances of the >> random number generator where created by forking from the same process, >> so they are exactly the very same object. This is howver a fairly bad >> trap. I guess other people will fall into it. > > I am not sure I am following: the objects in python are not the same if > you fork a process, or I don't understand what you mean by same. They > may be initialized the same way, though. The RandomState object handling numpy.random.random is created (and seeded) at import time. So, an identical generator should be shared by all processes after that. 
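A small check illustrating this point, assuming the Unix fork start method: each worker reports a digest of the Mersenne Twister state it inherited, and all digests come out identical because the module-level generator was seeded once, at import time, in the parent:

----
import hashlib
import numpy as np
from multiprocessing import Pool

def state_digest(_):
    keys = np.random.get_state()[1]              # the MT19937 key array
    return hashlib.md5(keys.tostring()).hexdigest()

if __name__ == '__main__':
    p = Pool(4)
    print(p.map(state_digest, range(4)))          # four identical digests
    p.close()
    p.join()
----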
-- Pauli Virtanen From bsouthey at gmail.com Thu Dec 11 11:20:48 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 11 Dec 2008 10:20:48 -0600 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <49413DE0.4030103@gmail.com> Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > > === test.py === > from test_helper import task > from multiprocessing import Pool > > p = Pool(4) > > jobs = list() > for i in range(4): > jobs.append(p.apply_async(task, (4, ))) > > print [j.get() for j in jobs] > > p.close() > p.join() > > === test_helper.py === > import numpy as np > > def task(x): > return np.random.random(x) > > ======= > > If I run test.py, I get: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. > > Now I understand why this is the case: the different instances of the > random number generator where created by forking from the same process, > so they are exactly the very same object. This is howver a fairly bad > trap. I guess other people will fall into it. > > The take home message is: > **call 'numpy.random.seed()' when you are using multiprocessing** > > I wonder if we can find a way to make this more user friendly? Would be > easy, in the C code, to check if the PID has changed, and if so reseed > the random number generator? I can open up a ticket for this if people > think this is desirable (I think so). > > On a side note, there are a score of functions in numpy.random with > __module__ to None. It makes it inconvenient to use it with > multiprocessing (for instance it forced the creation of the 'test_helper' > file here). > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > Part of this is one of the gotcha's of simulation that is not specific to multiprocessing and Python. Just highly likely to occur in your case with multiprocessing but does occur in single processing. As David indicated, many applications use a single source (often computer time) to initialize the pseudo-random generators if an actual seed is not supplied. Depending on the resolution as most require an integer so minor changes may not be sufficient to change the seed. So the same seed will get used if the source has not sufficiently 'advanced' before the next initialization. If you really care about reproducing the streams, you should specify the seed anyhow. 
Bruce From sturla at molden.no Thu Dec 11 11:23:12 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 17:23:12 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: <49413E70.2090809@molden.no> On 12/11/2008 4:57 PM, David Cournapeau wrote: > Why do you say the results are the same ? They don't look the same to > me - only the first three are the same. He used the multiprocessing.Pool object. There is a possible race condition here: one or more of the forked processes may be doing nothing. They are all competing for tasks on a queue. It could be avoided by using multiprocessing.Process instead. > I am not sure I am following: the objects in python are not the same > if you fork a process, or I don't understand what you mean by same. > They may be initialized the same way, though. When are they initialized? On import numpy or the first call to numpy.random.random? If they are initialized on the import numpy statement, they are initalized prior to forking and sharing state. This is because his statement 'from test_helper import task' actually triggers the import of numpy, and it occurs prior to any fork. This is also system dependent by the way. On Windows multiprocessing does not fork() and does not produce this problem. Sturla Molden From gael.varoquaux at normalesup.org Thu Dec 11 11:36:47 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:36:47 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> Message-ID: <20081211163647.GC1440@phare.normalesup.org> On Fri, Dec 12, 2008 at 12:57:26AM +0900, David Cournapeau wrote: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. > Why do you say the results are the same ? They don't look the same to > me - only the first three are the same. Correct. I wonder why. When I try on my box currently I almost always get the same four. But not all the time. More on that below. > > Now I understand why this is the case: the different instances of the > > random number generator where created by forking from the same process, > > so they are exactly the very same object. This is howver a fairly bad > > trap. I guess other people will fall into it. > I am not sure I am following: the objects in python are not the same > if you fork a process, or I don't understand what you mean by same. > They may be initialized the same way, though. Yes, they are initiate with the same seed value. I call them the same because right after the fork they are. The can evolve separately, though. However our PRNG is completely defined by its seed, AFAIK. > Isn't the problem simply due to seeding from the same value ? For such > a tiny problem (4 tasks whose processing time is negligeable), the > seed will be the same since the intervals between the sampling will be > small. Right, but I found the problem in real code, that was not tiny at all. 
> Taking a look at the mtrand code in numpy, if the seed is not given, > it is taken from /dev/random if available, or the time clock if not; I > don't know what the semantics are for concurrent access to /dev/random > (is it gauranteed that two process will get different values from it > ?). > To confirm this, you could try to use your toy example with 500 jobs > instead of 4: in that case, it is unlikely they use the same > underlying value as a starting point, even if there is no gurantee on > concurrent access of /dev/random. I found the problem on way bigger code. I have only 8 cpus, so I run 8 jobs, and each job loops on the tasks. I noticed that the variance was much smaller than expected. The jobs take 10 minutes, so you can't call them tiny or fast. The problem indeed appears in production code. The way I interpret this is that the seed is created only at module-import time (this is how I read the code in mtrand.pyx). For all my processes, the seed was created when numpy was imported in the mother process. After the fork, the seed is the same in each process. As a result the entropy of the whole system is clearly not the entropy of 4 independant systems. As you point out the fourth value in my toy example differs from the others, so somehow my picture is not exact. But it remains that the entropy is way too low in my production code. I don't understand why, once in a while, there is a value that is different. That could be because numpy is reimported in the child processes. If I insert a 'time.sleep' in my for loop that spawns the processes, I get significantly higher entropy only if the sleep is around 1 second. Looking at the seed code (rk_randomseed in randomkit.c), it seems that /dev/urandom is not used, contrary to what the random.seed docstring pretends, and what is really used is gettimeofday under windows, and _ftime under Unix. It does seem, though that the milliseconds are used. I must admit I don't fully understand why this happens. I thought that: a) Modules where not reimported with multiprocess, thanks to the fork. If this where true, reading mtrand.pyx, all subprocesses should have the same seed. b) /dev/urandom was used to seed. This seems wrong. Reading the code shows no dev/urandom in the seeding parts. c) milliseconds where used, so we should be rather safe from these race-condition. The code does seem to hint toward that, but if I add a sleep(0.01) to my loop, I don't get enough entropy. I did check that sleep(0.01) was sleeping at least 0.01 seconds. > > I wonder if we can find a way to make this more user friendly? Would be > > easy, in the C code, to check if the PID has changed, and if so reseed > > the random number generator? I can open up a ticket for this if people > > think this is desirable (I think so). > This sounds like too much magic for a very particular use: there may > be cases where you want the same seed in multiple processes (what if > you processes are not created from multiprocess, and you want to make > sure you have the same seed ?). Well, yes, for code that wants to explicitely control the seed, ressed automaticaly would be a problem, and we need to figure out a way to make this deterministic (eg for testing purposes). However, this is a small usecase, and when testing people need to be aware of seeding problems (although they might not understand fork semantics). More and more people are going to be using multiprocessing: it comes with the standard library, and standard boxes nowadays have many cores, and will soon have much more. 
Resampling and brute-force Monte Carlo techniques are embarrassingly parallel, so people will want to use parallel computing on them. I fear many others are going to fall in this trap. Ga?l From gael.varoquaux at normalesup.org Thu Dec 11 11:39:14 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:39:14 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49413E70.2090809@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> Message-ID: <20081211163914.GD1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 05:23:12PM +0100, Sturla Molden wrote: > On 12/11/2008 4:57 PM, David Cournapeau wrote: > > Why do you say the results are the same ? They don't look the same to > > me - only the first three are the same. > He used the multiprocessing.Pool object. There is a possible race > condition here: one or more of the forked processes may be doing > nothing. They are all competing for tasks on a queue. It could be > avoided by using multiprocessing.Process instead. No, Pool is what I want, because in my production code I am submitting jobs to that pool. > > I am not sure I am following: the objects in python are not the same > > if you fork a process, or I don't understand what you mean by same. > > They may be initialized the same way, though. > When are they initialized? On import numpy or the first call to > numpy.random.random? mtrand.pyx seems pretty clear about that: on import. > If they are initialized on the import numpy statement, they are > initalized prior to forking and sharing state. This is because his > statement 'from test_helper import task' actually triggers the import > of numpy, and it occurs prior to any fork. This is what I thought too. However, inserting a sleep statement long-enough in my spawning loop recovers entropy. I am confused. Ga?l From gael.varoquaux at normalesup.org Thu Dec 11 11:45:01 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:45:01 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49413DE0.4030103@gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <49413DE0.4030103@gmail.com> Message-ID: <20081211164501.GE1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 10:20:48AM -0600, Bruce Southey wrote: > Part of this is one of the gotcha's of simulation that is not specific > to multiprocessing and Python. Just highly likely to occur in your case > with multiprocessing but does occur in single processing. As David > indicated, many applications use a single source (often computer time) > to initialize the pseudo-random generators if an actual seed is not > supplied. Depending on the resolution as most require an integer so > minor changes may not be sufficient to change the seed. So the same seed > will get used if the source has not sufficiently 'advanced' before the > next initialization. > If you really care about reproducing the streams, you should specify the > seed anyhow. Well, its not about me. I have found this out, now, so I will know. Its about many other people who are going to stumble upon this. I don't think it is a good idea to count on the fact that people will understand-enough these problems not to be fooled by them. We should try to reduce that, as much as possible without adding magic that renders the behavior incomprehensible. 
Ga?l From gael.varoquaux at normalesup.org Thu Dec 11 11:46:20 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:46:20 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211163647.GC1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <20081211163647.GC1440@phare.normalesup.org> Message-ID: <20081211164620.GF1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 05:36:47PM +0100, Gael Varoquaux wrote: > b) /dev/urandom was used to seed. This seems wrong. Reading the code > shows no dev/urandom in the seeding parts. Actually, I am wrong here. dev/urandom is indeed used in 'rk_devfill', used in the seeding routine. It seems this is not enough. Ga?l From sturla at molden.no Thu Dec 11 11:55:58 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 17:55:58 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211163914.GD1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> Message-ID: <4941461E.5000907@molden.no> On 12/11/2008 5:39 PM, Gael Varoquaux wrote: >>> Why do you say the results are the same ? They don't look the same to >>> me - only the first three are the same. > >> He used the multiprocessing.Pool object. There is a possible race >> condition here: one or more of the forked processes may be doing >> nothing. They are all competing for tasks on a queue. It could be >> avoided by using multiprocessing.Process instead. > > No, Pool is what I want, because in my production code I am submitting > jobs to that pool. Sure, a pool is fine. I was just speculating that one of the four processes in your pool was idle all the time; i.e. that one of the other three got to do the task twice. Therefore you only got three identical results and not four. It depends on how the OS schedules the processes, the number of logical CPUs, etc. You have no control over that. But if you had used N instances of multiprocessing.Pool instead, all N results should have been identical (if the 'random' generator is completely deterministic) - because each process would do the task once. I.e. you only got three indentical results due to a race condition in the task queue. But you don't want similar results do you? So if you remember to seed the random number generators after forking, this race condition should be of no significance. > mtrand.pyx seems pretty clear about that: on import. In which case they are initialized prior to forking. Sturla Molden From gael.varoquaux at normalesup.org Thu Dec 11 11:59:14 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 17:59:14 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <4941461E.5000907@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> Message-ID: <20081211165914.GG1440@phare.normalesup.org> On Thu, Dec 11, 2008 at 05:55:58PM +0100, Sturla Molden wrote: > > No, Pool is what I want, because in my production code I am submitting > > jobs to that pool. > Sure, a pool is fine. 
I was just speculating that one of the four > processes in your pool was idle all the time; i.e. that one of the other > three got to do the task twice. Therefore you only got three identical > results and not four. It depends on how the OS schedules the processes, > the number of logical CPUs, etc. You have no control over that. But if > you had used N instances of multiprocessing.Pool instead, all N results > should have been identical (if the 'random' generator is completely > deterministic) - because each process would do the task once. > I.e. you only got three indentical results due to a race condition in > the task queue. Gotcha! Good explanation. Now I understand better my previous investigation. I think you are completely right. So indeed, as I initialy thought, using multiprocessing without reseeding is going to get you in big trouble (and this is what I experienced in my code). Thanks for the explanation, Ga?l From pav at iki.fi Thu Dec 11 12:03:41 2008 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 11 Dec 2008 17:03:41 +0000 (UTC) Subject: [Numpy-discussion] numpy.random and multiprocessing References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> Message-ID: Thu, 11 Dec 2008 17:55:58 +0100, Sturla Molden wrote: [clip] > Sure, a pool is fine. I was just speculating that one of the four > processes in your pool was idle all the time; i.e. that one of the other > three got to do the task twice. Therefore you only got three identical > results and not four. It depends on how the OS schedules the processes, > the number of logical CPUs, etc. You have no control over that. But if > you had used N instances of multiprocessing.Pool instead, all N results > should have been identical (if the 'random' generator is completely > deterministic) - because each process would do the task once. > > I.e. you only got three indentical results due to a race condition in > the task queue. 
Exactly, change task_helper.py to ---- import numpy as np def task(x): import os print "Hi, I'm", os.getpid() return np.random.random(x) ---- and note the output ---- Hi, I'm 16197 Hi, I'm 16198 Hi, I'm 16199 Hi, I'm 16199 [ 0.58175647 0.16293922 0.30488182 0.67367263] [ 0.58175647 0.16293922 0.30488182 0.67367263] [ 0.58175647 0.16293922 0.30488182 0.67367263] [ 0.59574921 0.61554857 0.06155764 0.75352295] ---- -- Pauli Virtanen From michael.s.gilbert at gmail.com Thu Dec 11 12:10:03 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Thu, 11 Dec 2008 12:10:03 -0500 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> Message-ID: <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> > Exactly, change task_helper.py to > > ---- > import numpy as np > > def task(x): > import os > print "Hi, I'm", os.getpid() > return np.random.random(x) > ---- > > and note the output > > ---- > Hi, I'm 16197 > Hi, I'm 16198 > Hi, I'm 16199 > Hi, I'm 16199 > [ 0.58175647 0.16293922 0.30488182 0.67367263] > [ 0.58175647 0.16293922 0.30488182 0.67367263] > [ 0.58175647 0.16293922 0.30488182 0.67367263] > [ 0.59574921 0.61554857 0.06155764 0.75352295] Shouldn't numpy (and/or multiprocessing) be smart enough to prevent this kind of error? A simple enough solution would be to also include the process id as part of the seed since it appears that the problem only occurs when you have different processes/threads accessing the random number generator at the same time. Regards, Mike From david at ar.media.kyoto-u.ac.jp Thu Dec 11 12:04:30 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 12 Dec 2008 02:04:30 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> Message-ID: <4941481E.4020800@ar.media.kyoto-u.ac.jp> Michael Gilbert wrote: >> Exactly, change task_helper.py to >> >> ---- >> import numpy as np >> >> def task(x): >> import os >> print "Hi, I'm", os.getpid() >> return np.random.random(x) >> ---- >> >> and note the output >> >> ---- >> Hi, I'm 16197 >> Hi, I'm 16198 >> Hi, I'm 16199 >> Hi, I'm 16199 >> [ 0.58175647 0.16293922 0.30488182 0.67367263] >> [ 0.58175647 0.16293922 0.30488182 0.67367263] >> [ 0.58175647 0.16293922 0.30488182 0.67367263] >> [ 0.59574921 0.61554857 0.06155764 0.75352295] >> > > Shouldn't numpy (and/or multiprocessing) be smart enough to prevent > this kind of error? A simple enough solution would be to also include > the process id as part of the seed since it appears that the problem > only occurs when you have different processes/threads accessing the > random number generator at the same time. > But the seed is set only once in the above code. So the problem has nothing to do with numpy. I don't think using the pid as a seed is a good idea either - for each task, it should be set to a true random source. 
David From sturla at molden.no Thu Dec 11 12:21:34 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 18:21:34 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> Message-ID: <49414C1E.4020701@molden.no> On 12/11/2008 6:10 PM, Michael Gilbert wrote: > Shouldn't numpy (and/or multiprocessing) be smart enough to prevent > this kind of error? A simple enough solution would be to also include > the process id as part of the seed It would not help, as the seeding is done prior to forking. I am mostly familiar with Windows programming. But what is needed is a fork handler (similar to a system hook in Windows jargon) that sets a new seed in the child process. Could pthread_atfork be used? Sturla Molden From sturla at molden.no Thu Dec 11 12:36:11 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 18:36:11 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414C1E.4020701@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> Message-ID: <49414F8B.6080104@molden.no> On 12/11/2008 6:21 PM, Sturla Molden wrote: > It would not help, as the seeding is done prior to forking. > > I am mostly familiar with Windows programming. But what is needed is a > fork handler (similar to a system hook in Windows jargon) that sets a > new seed in the child process. Actually I am not sure this should be done, as this issue technically speaking is not an error. A warning in the documentation would be better. Perhaps we should we should write a proper numpy + multiprocessing tutorial? Sturla Molden From david at ar.media.kyoto-u.ac.jp Thu Dec 11 12:29:55 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 12 Dec 2008 02:29:55 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414C1E.4020701@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> Message-ID: <49414E13.50103@ar.media.kyoto-u.ac.jp> Sturla Molden wrote: > On 12/11/2008 6:10 PM, Michael Gilbert wrote: > > >> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >> this kind of error? A simple enough solution would be to also include >> the process id as part of the seed >> > > It would not help, as the seeding is done prior to forking. > > I am mostly familiar with Windows programming. But what is needed is a > fork handler (similar to a system hook in Windows jargon) that sets a > new seed in the child process. > > Could pthread_atfork be used? > The seed could be explicitly set in each task, no ? def task(x): np.random.seed() return np.random.random(x) But does this really make sense ? 
Is the goal to parallelize a big sampler into N tasks of M trials, to produce the same result as a sequential set of M*N trials ? Then it does sound like a trivial task at all. I know there exists libraries explicitly designed for parallel random number generation - maybe this is where we should look, instead of using heuristics which are likely to be bogus, and generate wrong results. cheers, David From gael.varoquaux at normalesup.org Thu Dec 11 12:49:03 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 11 Dec 2008 18:49:03 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414E13.50103@ar.media.kyoto-u.ac.jp> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> Message-ID: <20081211174903.GH1440@phare.normalesup.org> On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote: > The seed could be explicitly set in each task, no ? > def task(x): > np.random.seed() > return np.random.random(x) Yes. The problem is trivial to solve, once you are aware of it. Just like the integer division problems we used to have back in the days where zeros, ones, ... returned ineger arrays. The point is that people will run into that problem and loose a lot of time. So we must make it so that they don't by mistake land in this situation, but purposely. One solution is to check the PID of the process when the PRNG is called, and reseed if it has changed. As pointed out, the danger of this is that this is magic, so there needs to be an option to turn this off. Ga?l From bsouthey at gmail.com Thu Dec 11 13:00:23 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 11 Dec 2008 12:00:23 -0600 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414E13.50103@ar.media.kyoto-u.ac.jp> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> Message-ID: <49415537.1020503@gmail.com> David Cournapeau wrote: > Sturla Molden wrote: > >> On 12/11/2008 6:10 PM, Michael Gilbert wrote: >> >> >> >>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >>> this kind of error? A simple enough solution would be to also include >>> the process id as part of the seed >>> >>> >> It would not help, as the seeding is done prior to forking. >> >> I am mostly familiar with Windows programming. But what is needed is a >> fork handler (similar to a system hook in Windows jargon) that sets a >> new seed in the child process. >> >> Could pthread_atfork be used? >> >> > > The seed could be explicitly set in each task, no ? > > def task(x): > np.random.seed() > return np.random.random(x) > > But does this really make sense ? > > Is the goal to parallelize a big sampler into N tasks of M trials, to > produce the same result as a sequential set of M*N trials ? Then it does > sound like a trivial task at all. 
I know there exists libraries > explicitly designed for parallel random number generation - maybe this > is where we should look, instead of using heuristics which are likely to > be bogus, and generate wrong results. > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > This is not sufficient because you can not ensure that the seed will be different every time task() is called. A major part of the problem here is treating a parallel computing problem as a serial computing problem. The streams must be independent across threads especially avoiding cross-correlation of streams (another gotcha) between threads. It is up to the user to implement a thread-safe solution such as using a single stream that is used by all threads or force the different threads to start at different states. The only thing that Numpy could do is provide a parallel pseudo-random number generator. Bruce From sturla at molden.no Thu Dec 11 13:04:21 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 19:04:21 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49414E13.50103@ar.media.kyoto-u.ac.jp> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> Message-ID: <49415625.3000201@molden.no> On 12/11/2008 6:29 PM, David Cournapeau wrote: > def task(x): > np.random.seed() > return np.random.random(x) > > But does this really make sense ? Hard to say... There is a chance of this producing indentical or overlapping sequences, albeit unlikely. I would not do this. I'd make one process responsible for making the random numbers and write those to a queue. It would scale if generating the deviates is the least costly part of the algorithm. Sturla Molden === test.py === from test_helper import task, generator from multiprocessing import Pool, Process, Queue q = Queue(maxsize=32) # or whatever g = Process(args=(4,q)) # preferably a number much larger than 4!!! g.start() p = Pool(4) jobs = list() for i in range(4): jobs.append(p.apply_async(task, (q,))) print [j.get() for j in jobs] p.close() p.join() g.terminate() === test_helper.py === import numpy as np def generator(x, q): while 1: item = np.random.random(x) q.put(item) def task(q): return q.get() > Is the goal to parallelize a big sampler into N tasks of M trials, to > produce the same result as a sequential set of M*N trials ? Then it does > sound like a trivial task at all. I know there exists libraries > explicitly designed for parallel random number generation - maybe this > is where we should look, instead of using heuristics which are likely to > be bogus, and generate wrong results. 
> > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Thu Dec 11 13:31:47 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 03:31:47 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49415537.1020503@gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> <49415537.1020503@gmail.com> Message-ID: <5b8d13220812111031n6de1241we7c8f1c90e1caf83@mail.gmail.com> On Fri, Dec 12, 2008 at 3:00 AM, Bruce Southey wrote: > David Cournapeau wrote: >> Sturla Molden wrote: >> >>> On 12/11/2008 6:10 PM, Michael Gilbert wrote: >>> >>> >>> >>>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent >>>> this kind of error? A simple enough solution would be to also include >>>> the process id as part of the seed >>>> >>>> >>> It would not help, as the seeding is done prior to forking. >>> >>> I am mostly familiar with Windows programming. But what is needed is a >>> fork handler (similar to a system hook in Windows jargon) that sets a >>> new seed in the child process. >>> >>> Could pthread_atfork be used? >>> >>> >> >> The seed could be explicitly set in each task, no ? >> >> def task(x): >> np.random.seed() >> return np.random.random(x) >> >> But does this really make sense ? >> >> Is the goal to parallelize a big sampler into N tasks of M trials, to >> produce the same result as a sequential set of M*N trials ? Then it does >> sound like a trivial task at all. I know there exists libraries >> explicitly designed for parallel random number generation - maybe this >> is where we should look, instead of using heuristics which are likely to >> be bogus, and generate wrong results. >> >> cheers, >> >> David >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> > This is not sufficient because you can not ensure that the seed will be > different every time task() is called. Yes, right. I was assuming that each seed call would result in a /dev/urandom read - but the problem is the same whether it is done in task or in a pthread_atfork method anyway. > The > only thing that Numpy could do is provide a parallel pseudo-random > number generator. Yes, exactly - hence my question whether this makes sense at all. Even having different, "truely" random seeds does not guarantee that the whole method makes sense - at least, I don't see why it should. In particular, if the process should give the same result independently of the number of parallels tasks, the problem becomes difficult. Intrigued by the problem, I briefly looked into the literature for parallel RNG; it certainly does not look like an easy task, and the chance of getting it right without knowing about the topic does not look high. 
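Returning to the producer/consumer sketch posted above: as written it appears to omit the target=generator argument to Process, and a plain multiprocessing.Queue cannot be handed to Pool workers as a call argument. The variant below is an assumption-laden sketch rather than a drop-in fix; it uses a Manager queue, which can be pickled and shipped to the workers:

----
import numpy as np
from multiprocessing import Manager, Pool, Process

def generator(x, q):
    # Single producer: keep the queue topped up with fresh random vectors.
    while True:
        q.put(np.random.random(x))

def task(q):
    # Consumers simply pull the next pre-drawn vector off the shared queue.
    return q.get()

if __name__ == '__main__':
    m = Manager()
    q = m.Queue(maxsize=32)
    g = Process(target=generator, args=(4, q))
    g.start()
    p = Pool(4)
    jobs = [p.apply_async(task, (q,)) for i in range(4)]
    print([j.get() for j in jobs])
    p.close()
    p.join()
    g.terminate()
----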
cheers, David From cournape at gmail.com Thu Dec 11 13:34:01 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Dec 2008 03:34:01 +0900 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211174903.GH1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> <20081211174903.GH1440@phare.normalesup.org> Message-ID: <5b8d13220812111034g1edfe325n15b264c91213eed9@mail.gmail.com> On Fri, Dec 12, 2008 at 2:49 AM, Gael Varoquaux wrote: > On Fri, Dec 12, 2008 at 02:29:55AM +0900, David Cournapeau wrote: >> The seed could be explicitly set in each task, no ? > >> def task(x): >> np.random.seed() >> return np.random.random(x) > > Yes. The problem is trivial to solve, once you are aware of it. Just like > the integer division problems we used to have back in the days where > zeros, ones, ... returned ineger arrays. The point is that people will > run into that problem and loose a lot of time. So we must make it so that > they don't by mistake land in this situation, but purposely. > > One solution is to check the PID of the process when the PRNG is called, > and reseed if it has changed. As pointed out, the danger of this is that > this is magic, so there needs to be an option to turn this off. The biggest danger is that the whole method may not make sense at all, and lose all the properties of a good random number generator. I don't understand your comparison with integer division: this is not an API or expected behavior problem, but an algorithmic one. David From josef.pktd at gmail.com Thu Dec 11 13:39:32 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 11 Dec 2008 13:39:32 -0500 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49415625.3000201@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <5b8d13220812110757l14156dfmfd35a07656b540a8@mail.gmail.com> <49413E70.2090809@molden.no> <20081211163914.GD1440@phare.normalesup.org> <4941461E.5000907@molden.no> <8e2a98be0812110910n7508ffb3p75a026c92ef2ad6e@mail.gmail.com> <49414C1E.4020701@molden.no> <49414E13.50103@ar.media.kyoto-u.ac.jp> <49415625.3000201@molden.no> Message-ID: <1cd32cbb0812111039k3446a859lb5e9dd43f0e5c33@mail.gmail.com> > >> Is the goal to parallelize a big sampler into N tasks of M trials, to >> produce the same result as a sequential set of M*N trials ? Then it does >> sound like a trivial task at all. I know there exists libraries >> explicitly designed for parallel random number generation - maybe this >> is where we should look, instead of using heuristics which are likely to >> be bogus, and generate wrong results. >> Another heuristic using pseudo random seed for each process Generate random integers (large) in the main process, and send it as seeds to each task. This makes it replicable if the initial seed is set, and should have independent "pseudo" random numbers in each stream. This works in probability theory, but I don't know about the quality of RNGs. 
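A sketch of the scheme Josef outlines: the parent draws one large integer per task and passes it along, and each task builds its own RandomState from it. Fixing the master seed then makes the whole run replicable; as he notes, the mutual independence of the resulting streams is a heuristic rather than a guarantee:

----
import numpy as np
from multiprocessing import Pool

def task(args):
    n, seed = args
    rng = np.random.RandomState(seed)     # per-task generator, decoupled from numpy.random
    return rng.random_sample(n)

if __name__ == '__main__':
    master = np.random.RandomState(42)             # fix this to replicate the whole run
    seeds = master.randint(0, 2**31 - 1, size=4)
    p = Pool(4)
    print(p.map(task, [(4, s) for s in seeds]))
    p.close()
    p.join()
----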
Josef From sturla at molden.no Thu Dec 11 14:16:40 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 20:16:40 +0100 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <49416718.3070206@molden.no> I'd just like to add that yet another option would be to use the manager/proxy object in multiprocessing. In this case numpy.random.random will be called in the parent process. I have not used this and I am not sure how efficient it is. But the possibility is there. Sturla Molden === test.py === from test_helper import task, RandomManager from multiprocessing import Pool rm = RandomManager() rm.start() random = rm.Random() p = Pool(4) jobs = list() for i in range(4): jobs.append(p.apply_async(task, (4,random))) print [j.get() for j in jobs] p.close() p.join() rm.shutdown() === test_helper.py === import numpy as np import multiprocessing as mp from mp.managers import BaseManager, CreatorMethod class RandomClass(object): def random(self, x): return np.random.random(x) class RandomManager(BaseManager): Random = CreatorMethod(RandomClass) def task(x, random): return random.random(x) On 12/11/2008 4:20 PM, Gael Varoquaux wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > > === test.py === > from test_helper import task > from multiprocessing import Pool > > p = Pool(4) > > jobs = list() > for i in range(4): > jobs.append(p.apply_async(task, (4, ))) > > print [j.get() for j in jobs] > > p.close() > p.join() > > === test_helper.py === > import numpy as np > > def task(x): > return np.random.random(x) > > ======= > > If I run test.py, I get: > > [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ > 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964, > 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382, > 0.02203999, 0.7591353 ])] > > In other words, the 4 processes give me the same exact results. > > Now I understand why this is the case: the different instances of the > random number generator where created by forking from the same process, > so they are exactly the very same object. This is howver a fairly bad > trap. I guess other people will fall into it. > > The take home message is: > **call 'numpy.random.seed()' when you are using multiprocessing** > > I wonder if we can find a way to make this more user friendly? Would be > easy, in the C code, to check if the PID has changed, and if so reseed > the random number generator? I can open up a ticket for this if people > think this is desirable (I think so). > > On a side note, there are a score of functions in numpy.random with > __module__ to None. It makes it inconvenient to use it with > multiprocessing (for instance it forced the creation of the 'test_helper' > file here). 
> > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Thu Dec 11 14:33:26 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 11 Dec 2008 14:33:26 -0500 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <49416718.3070206@molden.no> References: <20081211152049.GB1440@phare.normalesup.org> <49416718.3070206@molden.no> Message-ID: <1cd32cbb0812111133l4921d211t2f125a91c4a16328@mail.gmail.com> Here is the c program and the description how to implement independent Mersenne Twister PRNGs by the inventor(s) of Mersenne Twister: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html I didn't see a license statement. Josef From robert.kern at gmail.com Thu Dec 11 15:49:50 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Dec 2008 12:49:50 -0800 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> On Thu, Dec 11, 2008 at 07:20, Gael Varoquaux wrote: > The take home message is: > **call 'numpy.random.seed()' when you are using multiprocessing** Create RandomState objects and use those. This is a best practice whether you are using multiprocessing or not. The module-level functions really should only be used for noodling around in IPython. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla at molden.no Thu Dec 11 15:57:03 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 21:57:03 +0100 (CET) Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <1cd32cbb0812111133l4921d211t2f125a91c4a16328@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <49416718.3070206@molden.no> <1cd32cbb0812111133l4921d211t2f125a91c4a16328@mail.gmail.com> Message-ID: <2c57766b7f128ee6e59538eb00c28b95.squirrel@webmail.uio.no> In the docs I found this: "We used a hypothesis that a set of PRNGs based on linear recurrences is mutually 'independent' if the characteristic polynomials are relatively prime to each other. There is no rigorous proof of this hypothesis..." S.M. > Here is the c program and the description how to implement independent > Mersenne Twister PRNGs by the inventor(s) of Mersenne Twister: > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html > > I didn't see a license statement. > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Thu Dec 11 16:06:41 2008 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Dec 2008 22:06:41 +0100 (CET) Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> References: <20081211152049.GB1440@phare.normalesup.org> <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> Message-ID: > Create RandomState objects and use those. This is a best practice > whether you are using multiprocessing or not. The module-level > functions really should only be used for noodling around in IPython. 
Are we guaranteed that two RandomStates will produce two independent sequences? If not, RandomState cannot be used for this particular purpose. Cf. what the creators of MT wrote about dynamically creating MT generators at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html Sturla Molden From robert.kern at gmail.com Thu Dec 11 16:11:17 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Dec 2008 13:11:17 -0800 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: References: <20081211152049.GB1440@phare.normalesup.org> <3d375d730812111249xfcaf843q5647e3e3923cd0c9@mail.gmail.com> Message-ID: <3d375d730812111311k11c50966lb0c9a6fd4a592797@mail.gmail.com> On Thu, Dec 11, 2008 at 13:06, Sturla Molden wrote: > >> Create RandomState objects and use those. This is a best practice >> whether you are using multiprocessing or not. The module-level >> functions really should only be used for noodling around in IPython. > > Are we guaranteed that two RandomStates will produce two independent > sequences? No. > If not, RandomState cannot be used for this particular purpose. For small numbers of processes and not-huge runs, I think it's reasonable. You can also implement skipping fairly straightforwardly. If you're in Python, the wasted time is probably a small part of the inefficiencies. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Thu Dec 11 22:05:56 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 11 Dec 2008 22:05:56 -0500 Subject: [Numpy-discussion] building numpy trunk on WindowsXP Message-ID: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> I just tried to build numpy for the first time, on Windows XP SP2, sse2, single CPU with MingW 3.45, Python25 I used `setup.py bdist` and copied extracted archive into sitepackages, (so I can delete it again) Are the errors and failures below expected, or did my build not work correctly? Josef Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.test() Running unit tests for numpy NumPy version 1.3.0.dev6139 NumPy is installed in C:\Programs\Python25\lib\site-packages\numpy Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Int el)] nose version 0.10.4 ................................................................................ ................................................................................ ................................................................................ .......FF....................................................................... ................................................................................ ...........................................K.................................... ...............................................................Ignoring "Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py." (one s hould fix me in fcompiler/compaq.py) ................................................................................ 
................................................................................ ................................................................................ ................................................................................ ..................E..F.....E.................................................... ................................................................................ ................................................................................ ................................................................................ ................................................................................ ......S......................................................................... ................................................................................ ................................................................................ ................................................................................ ............................................ ====================================================================== ERROR: test_mmap (test_io.TestSaveLoad) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 64, in test_mmap self.roundtrip(a, file_on_disk=True, load_kwds={'mmap_mode': 'r'}) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 72, in roundtrip RoundtripTest.roundtrip(self, np.save, *args, **kwargs) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 40, in roundtrip arr_reloaded = np.load(load_file, **load_kwds) File "C:\Programs\Python25\lib\site-packages\numpy\lib\io.py", line 137, in lo ad fid = _file(file,"rb") IOError: [Errno 13] Permission denied: 'c:\\docume~1\\carrasco\\locals~1\\temp\\ tmp2zygyo' ====================================================================== ERROR: test_mmap (test_io.TestSavezLoad) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 64, in test_mmap self.roundtrip(a, file_on_disk=True, load_kwds={'mmap_mode': 'r'}) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 77, in roundtrip RoundtripTest.roundtrip(self, np.savez, *args, **kwargs) File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 40, in roundtrip arr_reloaded = np.load(load_file, **load_kwds) File "C:\Programs\Python25\lib\site-packages\numpy\lib\io.py", line 137, in lo ad fid = _file(file,"rb") IOError: [Errno 13] Permission denied: 'c:\\docume~1\\carrasco\\locals~1\\temp\\ tmpedpthb' ====================================================================== FAIL: Check formatting. ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\core\tests\test_print.py", line 28, in test_complex_types assert_equal(str(t(x)), str(complex(x))) File "C:\Programs\Python25\lib\site-packages\numpy\testing\utils.py", line 183 , in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: '(0+5.9287877500949585e-323j)' DESIRED: '(1+0j)' ====================================================================== FAIL: Check formatting. 
---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\core\tests\test_print.py", line 16, in test_float_types assert_equal(str(t(x)), str(float(x))) File "C:\Programs\Python25\lib\site-packages\numpy\testing\utils.py", line 183 , in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: '0.0' DESIRED: '1.0' ====================================================================== FAIL: test_array (test_io.TestSaveTxt) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Programs\Python25\lib\site-packages\numpy\lib\tests\test_io.py", line 105, in test_array '3.000000000000000000e+00 4.000000000000000000e+00\n']) AssertionError ---------------------------------------------------------------------- Ran 1627 tests in 10.640s FAILED (KNOWNFAIL=1, SKIP=1, errors=2, failures=3) From david at ar.media.kyoto-u.ac.jp Thu Dec 11 23:26:06 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 12 Dec 2008 13:26:06 +0900 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> Message-ID: <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: > I just tried to build numpy for the first time, on Windows XP SP2, > sse2, single CPU with MingW 3.45, Python25 > > I used `setup.py bdist` and copied extracted archive into > sitepackages, (so I can delete it again) > You can also use bdist_wininst, which will create a binary installer, with uninstall feature. > Are the errors and failures below expected, or did my build not work correctly? > I have not seen the io ones, but I have not tested numpy on windows recently, so they may be regressions or new tests which do not pass on windows. David From charlesr.harris at gmail.com Fri Dec 12 00:12:44 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Dec 2008 22:12:44 -0700 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> Message-ID: On Thu, Dec 11, 2008 at 9:26 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > josef.pktd at gmail.com wrote: > > I just tried to build numpy for the first time, on Windows XP SP2, > > sse2, single CPU with MingW 3.45, Python25 > > > > I used `setup.py bdist` and copied extracted archive into > > sitepackages, (so I can delete it again) > > > > You can also use bdist_wininst, which will create a binary installer, > with uninstall feature. > > > Are the errors and failures below expected, or did my build not work > correctly? > > > > I have not seen the io ones, but I have not tested numpy on windows > recently, so they may be regressions or new tests which do not pass on > windows. > I think the io errors used to show up on the windows buildbots, something to do with temp files and permissions on windows. Some of the formatting errors look like missing values, i.e., it's just stuff that happened to be in some memory location. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Dec 12 01:01:41 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 12 Dec 2008 01:01:41 -0500 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> Message-ID: <1cd32cbb0812112201w78698ac8v28521d1a984321d9@mail.gmail.com> On Fri, Dec 12, 2008 at 12:12 AM, Charles R Harris wrote: > > > On Thu, Dec 11, 2008 at 9:26 PM, David Cournapeau > wrote: >> >> josef.pktd at gmail.com wrote: >> > I just tried to build numpy for the first time, on Windows XP SP2, >> > sse2, single CPU with MingW 3.45, Python25 >> > >> > I used `setup.py bdist` and copied extracted archive into >> > sitepackages, (so I can delete it again) >> > >> >> You can also use bdist_wininst, which will create a binary installer, >> with uninstall feature. >> >> > Are the errors and failures below expected, or did my build not work >> > correctly? >> > >> >> I have not seen the io ones, but I have not tested numpy on windows >> recently, so they may be regressions or new tests which do not pass on >> windows. > > I think the io errors used to show up on the windows buildbots, something to > do with temp files and permissions on windows. Some of the formatting errors > look like missing values, i.e., it's just stuff that happened to be in some > memory location. > > Chuck > > Thanks, I didn't expect that building numpy would work this easily. Josef From charlesr.harris at gmail.com Fri Dec 12 02:11:24 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Dec 2008 00:11:24 -0700 Subject: [Numpy-discussion] building numpy trunk on WindowsXP In-Reply-To: <1cd32cbb0812112201w78698ac8v28521d1a984321d9@mail.gmail.com> References: <1cd32cbb0812111905o7089cdfbgde6bbaad5e8b60f1@mail.gmail.com> <4941E7DE.9070503@ar.media.kyoto-u.ac.jp> <1cd32cbb0812112201w78698ac8v28521d1a984321d9@mail.gmail.com> Message-ID: On Thu, Dec 11, 2008 at 11:01 PM, wrote: > On Fri, Dec 12, 2008 at 12:12 AM, Charles R Harris > wrote: > > > > > > On Thu, Dec 11, 2008 at 9:26 PM, David Cournapeau > > wrote: > >> > >> josef.pktd at gmail.com wrote: > >> > I just tried to build numpy for the first time, on Windows XP SP2, > >> > sse2, single CPU with MingW 3.45, Python25 > >> > > >> > I used `setup.py bdist` and copied extracted archive into > >> > sitepackages, (so I can delete it again) > >> > > >> > >> You can also use bdist_wininst, which will create a binary installer, > >> with uninstall feature. > >> > >> > Are the errors and failures below expected, or did my build not work > >> > correctly? > >> > > >> > >> I have not seen the io ones, but I have not tested numpy on windows > >> recently, so they may be regressions or new tests which do not pass on > >> windows. > > > > I think the io errors used to show up on the windows buildbots, something > to > > do with temp files and permissions on windows. Some of the formatting > errors > > look like missing values, i.e., it's just stuff that happened to be in > some > > memory location. > > > > Chuck > > > > > > > Thanks, I didn't expect that building numpy would work this easily. > Now that the windows buildbot is back after being offline for a while, I see the same io errors. I wonder if something changed? The formatting errors aren't there, however. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
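
For what it's worth, those io failures look consistent with a Windows-specific limitation around temporary files: a file created with tempfile.NamedTemporaryFile cannot be reopened by name while it is still open, which is what a save/load-by-filename roundtrip would do. A minimal illustration -- this is an assumption about what the failing tests do, not a confirmed diagnosis:

import tempfile

f = tempfile.NamedTemporaryFile()
f.write('some data')
f.flush()

# On Unix this second open succeeds; on Windows it fails with
# IOError: [Errno 13] Permission denied, because the temporary file is
# still held open with exclusive access.
g = open(f.name, 'rb')
print g.read()
g.close()
f.close()
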
URL: From gwg at emss.co.za Fri Dec 12 02:28:30 2008 From: gwg at emss.co.za (George Goussard) Date: Fri, 12 Dec 2008 09:28:30 +0200 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) In-Reply-To: References: Message-ID: <15B34CD0955E484689D667626E6456D5011CA54868@london.emss.co.za> Hello David. I am using the Intel MKL BLAS/LAPACK. I have replaced this with AMD's ACML library. Now there is no exception raised due to a "Singular matrix" while trying to move the legend(wiggling the graph). So, the graph is updated and the interaction is fine(you can wiggle the graph and it updates, minimize, maximeie etc.). But ... the legend is now only drawn sometimes and the graphs are drawn with an intermittent line, as if the - - - pattern was specified. Something is still not right. I just can't seem to put my finger on it since there are some many parties involved(numpy,matplotlib,python, ctypes etc.) I also ran the numpy.test() with NUmpy that I compiled with AMD's ACML. The results are included: Running unit tests for numpy NumPy version 1.2.1 Results of numpy.test() NumPy is installed in C:\Development\Python\2_5_2\lib\site-packages\numpy Python version 2.5.2 (r252:60911, Dec 12 2008, 08:38:07) [MSC v.1400 64 bit (AMD64)] nose version 0.10.4 Forcing DISTUTILS_USE_SDK=1 ............................................................................................................................................ .........................................................................................................K..K............................... ............................................................................................................................................ .......................K.................................................................................Ignoring "MSVCCompiler instance has no attribute '_MSVCCompiler__root'" (I think it is msvccompiler.py bug) ...........................S................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ............................................................................................................................................ ....................................................................................... ---------------------------------------------------------------------- Ran 1592 tests in 10.704s OK (KNOWNFAIL=3, SKIP=1) Thanks. George. 
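
One quick sanity check when swapping BLAS/LAPACK implementations, in case it helps: numpy records the libraries it was built against, so printing the build configuration shows whether ACML (or MKL) was actually picked up. A minimal check, assuming the information was recorded at build time:

import numpy as np

# Prints the blas/lapack sections recorded when numpy was built; the
# library names listed there show which implementation was linked in.
np.show_config()
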
Message: 3 Date: Tue, 9 Dec 2008 02:43:25 +0900 From: "David Cournapeau" Subject: Re: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) To: "Discussion of Numerical Python" Message-ID: <5b8d13220812080943g69d4c670jabd6aef66d336e29 at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Tue, Dec 9, 2008 at 12:50 AM, George Goussard wrote: > Hello. > > > > I have been battling with the following error for the past week. The output > from the terminal is: > What does numpy.test() says ? Did you use an external blas/lapack when you built numpy for AMD64 David ------------------------------ _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion End of Numpy-discussion Digest, Vol 27, Issue 22 ************************************************ From gael.varoquaux at normalesup.org Fri Dec 12 08:20:50 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 12 Dec 2008 14:20:50 +0100 Subject: [Numpy-discussion] Plot directive in numpy docs Message-ID: <20081212132050.GA7822@phare.normalesup.org> Hi, What is the guideline on using the plot directive in the numpy docs? It can make some examples much easier to understand, but on the other hand, it can clutter the docstrings. In addition, I am not sure how our documentation pipeline deals with it. Cheers, Ga?l From pav at iki.fi Fri Dec 12 09:09:14 2008 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 12 Dec 2008 14:09:14 +0000 (UTC) Subject: [Numpy-discussion] Plot directive in numpy docs References: <20081212132050.GA7822@phare.normalesup.org> Message-ID: Fri, 12 Dec 2008 14:20:50 +0100, Gael Varoquaux wrote: > What is the guideline on using the plot directive in the numpy docs? It > can make some examples much easier to understand, but on the other hand, > it can clutter the docstrings. In addition, I am not sure how our > documentation pipeline deals with it. No guideline yet, I'd suggest not to use it in docstrings yet, before we are sure it works as we want it to work. It does not (and probably will not, due to security reasons) work in the wiki. But it already works in the built documentation. ** How it works now .. plot:: import matplotlib.pyplot as plt x = np.linspace(0, 2*pi, 200) plt.plot(x, np.sin(x)) plt.show() assuming "import numpy as np" is pre-defined in a Sphinx conf.py directive. The code can either be in doctest format or not, this is automatically detected. Each matplotlib figure gets a separate image, for examples see the Scipy Tutorial. The code is executed and images are automatically captured into files when the documentation is built. (No need to use eg. savefig.) The code inside the plot:: directive cannot, due to technical reasons, access any variables etc. used in preceding doctests. ** What to think about - Should docstrings be assumed by default to lie inside plot:: directive, unless a plot:: directive is explicitly used? This way the plot could use stuff defined in earlier examples. - Or, maybe only the examples section should assumed to be in a plot:: by default, if it contains doctests. I'd think this would be a good idea. This way, we could continue using the current docstring standard, and be able to generate figures from the examples that use matplotlib. - I don't think the plot:: directive itself adds much line noise to the docstrings, but maybe some differ. - Where to allow plots in docstrings. I'd guess only in the examples section. 
- Whether to use doctest notation inside plot:: or not. - Our security model should continue to be as previously: the person who checks in docstrings from the wiki to SVN checks that the doctests do not contain malicious code. - We have technical leeway to do many things, because Sphinx allows us to preprocess docstrings in any way we want before processing them. Also: There was the unresolved question about should the example codes be run when numpy.test() is run, and what to do with matplotlib code in this case. The main problem was that if the plot codes are picked up as doctests, then the matplotlib objects returned by pyplot functions cause unnecessary line noise. Definitely, any doctest markup should be avoided in the examples. So the options were either to implement some magic to skip offending doctest lines, or to not use doctest markup for plots. -- Pauli Virtanen From rw247 at astro.columbia.edu Fri Dec 12 12:47:04 2008 From: rw247 at astro.columbia.edu (Ross Williamson) Date: Fri, 12 Dec 2008 12:47:04 -0500 Subject: [Numpy-discussion] Save a class Message-ID: <6D1EC63E-700D-427C-9A56-C714C819787C@astro.columbia.edu> Dear all I have a class that contains various data arrays and constants Is there a way of using numpy.save() to save the class so that when I reload it back in I have access to all the member arrays? Thanks Ross From josef.pktd at gmail.com Fri Dec 12 12:48:24 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 12 Dec 2008 12:48:24 -0500 Subject: [Numpy-discussion] bugfix for np.random logseries and hypergeometric are verified Message-ID: <1cd32cbb0812120948g6b177563p9ddb89bf04c2600@mail.gmail.com> Hi, Now that I managed to compile numpy, I tried out the bugfixes in ticket:921 and ticket:923. For both, checking the results gives now correct results. I attached the scripts and results to the tickets. Both distributions will be tested again in scipy.stats, once I remove the skip and add the arguments that previously resulted in wrong numbers. Can someone committ these, or can I commit them myself? (This will be the maximum that I touch C, changing one word and one inequality) Josef From sturla at molden.no Fri Dec 12 13:51:56 2008 From: sturla at molden.no (Sturla Molden) Date: Fri, 12 Dec 2008 19:51:56 +0100 (CET) Subject: [Numpy-discussion] Save a class In-Reply-To: <6D1EC63E-700D-427C-9A56-C714C819787C@astro.columbia.edu> References: <6D1EC63E-700D-427C-9A56-C714C819787C@astro.columbia.edu> Message-ID: See the module docs for pickle and cPickle. Sturla Molden > Dear all > > I have a class that contains various data arrays and constants > > Is there a way of using numpy.save() to save the class so that when I > reload it back in I have access to all the member arrays? 
> > Thanks > > Ross > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Fri Dec 12 14:47:23 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Dec 2008 12:47:23 -0700 Subject: [Numpy-discussion] numpy.random and multiprocessing In-Reply-To: <20081211152049.GB1440@phare.normalesup.org> References: <20081211152049.GB1440@phare.normalesup.org> Message-ID: On Thu, Dec 11, 2008 at 8:20 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > Hi there, > > I have been using the multiprocessing module a lot to do statistical tests > such as Monte Carlo or resampling, and I have just discovered something > that makes me wonder if I haven't been accumulating false results. Given > two files: > You might also want to contact Bruce Carneal bcarneal at gmail.com, as he did some work on this. He is interested in clustering/multiprocessing simulations and is currently working on a clustering package. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpyle at post.harvard.edu Fri Dec 12 12:27:28 2008 From: rpyle at post.harvard.edu (Robert Pyle) Date: Fri, 12 Dec 2008 12:27:28 -0500 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: <493FFC4D.60500@enthought.com> References: <493FFC4D.60500@enthought.com> Message-ID: <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> Hi, I'm on a Mac G5, with EPD as my python: ------ ~ $ python EPD Py25 (4.1.30001_beta1) -- http://www.enthought.com/epd Python 2.5.2 |EPD Py25 4.1.30001_beta1| (r252:60911, Nov 23 2008, 15:11:42) [GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> ------ I tried 'easy_install -U ETS==3.1.0' and ended up with: ------ The package setup script has attempted to modify files on your system that are not within the EasyInstall build area, and has been aborted. This package cannot be safely installed by EasyInstall, and may not support alternate installation locations even if you run its setup script by hand. Please inform the package's author and the EasyInstall maintainers to find out if a fix or workaround is available. ------ Is my easy_install broken or out-of-date, or is there some small thing wrong in ETS 3.1.0? Thanks for making ETS and EPD available! Bob On Dec 10, 2008, at 12:28 PM, Dave Peterson wrote: > I'm pleased to announce that the Enthought Tool Suite (ETS) 3.1.0 has > been tagged, released, and uploaded to PyPi[1]! > > Both source distributions (.tar.gz) and binary (.egg) for Windows have > been built and uploaded to PyPi. > > You can update an existing ETS install to v3.1.0 like so: > easy_install -U ETS==3.1.0 From charlesr.harris at gmail.com Fri Dec 12 21:39:27 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Dec 2008 19:39:27 -0700 Subject: [Numpy-discussion] bugfix for np.random logseries and hypergeometric are verified In-Reply-To: <1cd32cbb0812120948g6b177563p9ddb89bf04c2600@mail.gmail.com> References: <1cd32cbb0812120948g6b177563p9ddb89bf04c2600@mail.gmail.com> Message-ID: On Fri, Dec 12, 2008 at 10:48 AM, wrote: > Hi, > > Now that I managed to compile numpy, I tried out the bugfixes in > ticket:921 and ticket:923. > > For both, checking the results gives now correct results. I attached > the scripts and results to the tickets. 
> > Both distributions will be tested again in scipy.stats, once I remove > the skip and add the arguments that previously resulted in wrong > numbers. > > Can someone committ these, or can I commit them myself? (This will be > the maximum that I touch C, changing one word and one inequality) > I'll take a look at them tomorrow. I don't know if you can commit to numpy with your scipy permissions, add your name to TEST_COMMIT in the numpy top directory and give it a shot. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Sat Dec 13 07:14:23 2008 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Sat, 13 Dec 2008 14:14:23 +0200 Subject: [Numpy-discussion] Plot directive in numpy docs In-Reply-To: References: <20081212132050.GA7822@phare.normalesup.org> Message-ID: <6a17e9ee0812130414v4cb3c0dfw642d163976c1cbbe@mail.gmail.com> > 2008/12/12 Pauli Virtanen : > Fri, 12 Dec 2008 14:20:50 +0100, Gael Varoquaux wrote: >> What is the guideline on using the plot directive in the numpy docs? > > No guideline yet, I'd suggest not to use it in docstrings yet, before we > are sure it works as we want it to work. > > ** What to think about > > - Should docstrings be assumed by default to lie inside plot:: directive, > unless a plot:: directive is explicitly used? > > This way the plot could use stuff defined in earlier examples. > > - Or, maybe only the examples section should assumed to be in a plot:: > by default, if it contains doctests. > I'd prefer this approach, with the examples section assumed to be wrapped in a plot directive, rather than having the markup in the docstring itself. I'm not clear on what you mean by earlier examples. Do you mean earlier examples in the same docstring, or earlier examples in other docstrings? It makes most sense to me if each docstring has self contained examples and doesn't rely on anything defined elsewhere (except the obvious 'import numpy as np'). > Also: > > There was the unresolved question about should the example codes be run > when numpy.test() is run, and what to do with matplotlib code in this > case. The main problem was that if the plot codes are picked up as > doctests, then the matplotlib objects returned by pyplot functions cause > unnecessary line noise. Definitely, any doctest markup should be avoided > in the examples. So the options were either to implement some magic to > skip offending doctest lines, or to not use doctest markup for plots. However this is resolved, I wouldn't like to see any doctest markup in the finished documentation, whether this is viewed in the terminal, as html or as a pdf. Cheers, Scott From gael.varoquaux at normalesup.org Sat Dec 13 08:10:24 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 13 Dec 2008 14:10:24 +0100 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> Message-ID: <20081213131024.GA19210@phare.normalesup.org> On Fri, Dec 12, 2008 at 12:27:28PM -0500, Robert Pyle wrote: > ------ > I tried 'easy_install -U ETS==3.1.0' > and ended up with: > ------ > The package setup script has attempted to modify files on your system > that are not within the EasyInstall build area, and has been aborted. 
> This package cannot be safely installed by EasyInstall, and may not > support alternate installation locations even if you run its setup > script by hand. Please inform the package's author and the EasyInstall > maintainers to find out if a fix or workaround is available. > ------ AFAIK this is a bug/conflict in numpy.distutils or setuptools (hard to say which one is wrong, but they do raise this problem). I think the issue goes away if you rerun the easy_install command several times. Each times it seems to go further. Ga?l From rpyle at post.harvard.edu Sat Dec 13 16:01:41 2008 From: rpyle at post.harvard.edu (Robert Pyle) Date: Sat, 13 Dec 2008 16:01:41 -0500 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: <20081213131024.GA19210@phare.normalesup.org> References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> <20081213131024.GA19210@phare.normalesup.org> Message-ID: Ga?l, Rerunning easy_install a couple of times did the trick. Thanks for the help. Bob On Dec 13, 2008, at 8:10 AM, Gael Varoquaux wrote: > On Fri, Dec 12, 2008 at 12:27:28PM -0500, Robert Pyle wrote: >> ------ > >> I tried 'easy_install -U ETS==3.1.0' > >> and ended up with: >> ------ >> The package setup script has attempted to modify files on your system >> that are not within the EasyInstall build area, and has been aborted. > >> This package cannot be safely installed by EasyInstall, and may not >> support alternate installation locations even if you run its setup >> script by hand. Please inform the package's author and the >> EasyInstall >> maintainers to find out if a fix or workaround is available. >> ------ > > AFAIK this is a bug/conflict in numpy.distutils or setuptools (hard to > say which one is wrong, but they do raise this problem). > > I think the issue goes away if you rerun the easy_install command > several > times. Each times it seems to go further. > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From gael.varoquaux at normalesup.org Sat Dec 13 17:04:02 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 13 Dec 2008 23:04:02 +0100 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! In-Reply-To: References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> <20081213131024.GA19210@phare.normalesup.org> Message-ID: <20081213220402.GL19210@phare.normalesup.org> On Sat, Dec 13, 2008 at 04:01:41PM -0500, Robert Pyle wrote: > Rerunning easy_install a couple of times did the trick. I just whish such workaround weren't necessary. Packaging is hard. Ga?l From Klaus.Noekel at gmx.de Sun Dec 14 13:45:40 2008 From: Klaus.Noekel at gmx.de (Klaus Noekel) Date: Sun, 14 Dec 2008 19:45:40 +0100 Subject: [Numpy-discussion] Win64 build? Message-ID: <49455454.1070303@gmx.de> Dear all, I would like to use numpy under Windows Vista 64-bit, but I am scared a bit by compiling a 64bit build myself. Is there an installer for Win64 somewhere, or are there any plans for one? Thanks for any advice, including directions for building myself (sigh), if that should be the only way. Klaus Noekel From arokem at berkeley.edu Sun Dec 14 15:26:42 2008 From: arokem at berkeley.edu (Ariel Rokem) Date: Sun, 14 Dec 2008 12:26:42 -0800 Subject: [Numpy-discussion] ANNOUNCE: ETS 3.1.0 released! 
In-Reply-To: <20081213220402.GL19210@phare.normalesup.org> References: <493FFC4D.60500@enthought.com> <37679F51-F390-49C7-AAA6-DB14AC12DEF4@post.harvard.edu> <20081213131024.GA19210@phare.normalesup.org> <20081213220402.GL19210@phare.normalesup.org> Message-ID: <43958ee60812141226x7d9db88bt886fc417bdccc933@mail.gmail.com> Hello - I tried doing this and keep getting the error attached below. It seems like it is a completely different issue, having to do with the architecture of my mac, but maybe someone here knows how to resolve it. I am doing all this on an Intel mac with OS 10.5.5 Thanks a bunch -- Ariel -------------------------------------------------------------------------------------------------------------------------------- ASR:~ arokem$ easy_install -U ETS==3.1.0 Searching for ETS==3.1.0 Reading http://pypi.python.org/simple/ETS/ Reading http://code.enthought.com/projects/tool-suite.php Best match: ETS 3.1.0 Downloading http://pypi.python.org/packages/2.5/E/ETS/ETS-3.1.0-py2.5.egg#md5=e0cde23026f5f0538dda271a6e08a175 Processing ETS-3.1.0-py2.5.egg Removing /Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages/ETS-3.1.0-py2.5.egg Moving ETS-3.1.0-py2.5.egg to /Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages ETS 3.1.0 is already the active version in easy-install.pth Installed /Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages/ETS-3.1.0-py2.5.egg Reading http://code.enthought.com/enstaller/eggs/source Processing dependencies for ETS==3.1.0 Searching for Enable>=3.0.2.dev,<=3.0.2 Reading http://pypi.python.org/simple/Enable/ Reading http://code.enthought.com/projects/enable Best match: Enable 3.0.2 Downloading http://pypi.python.org/packages/source/E/Enable/Enable-3.0.2.tar.gz#md5=6c9ef42edd442ba8ef56f2371031fca7 Processing Enable-3.0.2.tar.gz Running Enable-3.0.2/setup.py -q bdist_egg --dist-dir /var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/egg-dist-tmp-NK8Tgm Found executable /Library/Frameworks/Python.framework/Versions/Current/bin/wx-config non-existing path in '/private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/mac': '/Library/Frameworks/Python.framework/Versions/4.0.30001/lib/python2.5/site-packages/wxPython-2.8.7.1.0001_s-py2.5-macosx-10.3-fat.egg/wx/wx/include/mac-unicode-release-2.8' zip_safe flag not set; analyzing archive contents... setupdocs.setupdocs: module references __file__ Installed /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/setupdocs-1.0.1-py2.5.egg Sphinx v0.4.2, building html trying to load pickled env... not found building [html]: targets for 1 source files that are out of date updating environment: 1 added, 0 changed, 0 removed reading... enable_concepts WARNING: GLOBAL:: master file /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/docs/source/index.rst not found pickling the env... done checking consistency... WARNING: /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/docs/source/enable_concepts.rst:: document isn't included in any toctree writing output... 
enable_concepts index [Errno 2] No such file or directory: '/private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/build/docs/html/.doctrees/index.doctree' /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/src/gl/plat_support.i:95: Warning(121): %name is deprecated. Use %rename instead. /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c: In function 'FT_GetFile_From_Mac_Name': /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:770: warning: initialization makes integer from pointer without a cast /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:771: warning: initialization makes integer from pointer without a cast /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c: In function 'FT_GetFile_From_Mac_Name': /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:770: warning: initialization makes integer from pointer without a cast /private/var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-/easy_install-1_oaOx/Enable-3.0.2/enthought/kiva/agg/freetype2/src/base/ftmac.c:771: warning: initialization makes integer from pointer without a cast ld: in /Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libPng.dylib, file is not of required architecture for architecture ppc collect2: ld returned 1 exit status lipo: can't open input file: /var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-//ccdSAL3V.out (No such file or directory) ld: in /Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libPng.dylib, file is not of required architecture for architecture ppc collect2: ld returned 1 exit status lipo: can't open input file: /var/folders/Qb/QbU9SmFNHoWnC7v-nTJYrE+++TI/-Tmp-//ccdSAL3V.out (No such file or directory) error: Setup script exited with error: Command "g++ -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/build/src.macosx-10.3-i386-2.5/enthought/kiva/agg/agg_wrap.o -Lbuild/temp.macosx-10.3-i386-2.5 -lkiva_src -lagg24_src -lfreetype2_src -o build/lib.macosx-10.3-i386-2.5/enthought/kiva/agg/_agg.so -framework Carbon -framework ApplicationServices -framework OpenGL -framework Carbon" failed with exit status 1 On Sat, Dec 13, 2008 at 2:04 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sat, Dec 13, 2008 at 04:01:41PM -0500, Robert Pyle wrote: > > Rerunning easy_install a couple of times did the trick. > > I just whish such workaround weren't necessary. Packaging is hard. > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Sun Dec 14 21:42:10 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 15 Dec 2008 11:42:10 +0900 Subject: [Numpy-discussion] Win64 build? 
In-Reply-To: <49455454.1070303@gmx.de> References: <49455454.1070303@gmx.de> Message-ID: <4945C402.4080209@ar.media.kyoto-u.ac.jp> Klaus Noekel wrote: > Dear all, > > I would like to use numpy under Windows Vista 64-bit, but I am scared a > bit by compiling a 64bit build myself. Is there an installer for Win64 > somewhere, or are there any plans for one? > No. > Thanks for any advice, including directions for building myself (sigh), > if that should be the only way. > Do you only need numpy or also scipy ? If you only need numpy, it is relatively straightforward because you don't need BLAS/LAPACK nor any fortran compiler. You should use the Visual Studio compiler, though: VS 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well yet for 64 bits. Of course, you can also install numpy 32 bits, which should work perfectly on windows 64 bits, cheers, David From lists at cheimes.de Sun Dec 14 23:13:56 2008 From: lists at cheimes.de (Christian Heimes) Date: Mon, 15 Dec 2008 05:13:56 +0100 Subject: [Numpy-discussion] Win64 build? In-Reply-To: <4945C402.4080209@ar.media.kyoto-u.ac.jp> References: <49455454.1070303@gmx.de> <4945C402.4080209@ar.media.kyoto-u.ac.jp> Message-ID: David Cournapeau schrieb: > Do you only need numpy or also scipy ? If you only need numpy, it is > relatively straightforward because you don't need BLAS/LAPACK nor any > fortran compiler. You should use the Visual Studio compiler, though: VS > 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well > yet for 64 bits. The offical Windows builds of Python 2.5 are created with Visual C 7.1 (also known as VS2003). You can compile an extension with VS 2005 but that will cause trouble. Christian From david at ar.media.kyoto-u.ac.jp Sun Dec 14 23:26:03 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 15 Dec 2008 13:26:03 +0900 Subject: [Numpy-discussion] Win64 build? In-Reply-To: References: <49455454.1070303@gmx.de> <4945C402.4080209@ar.media.kyoto-u.ac.jp> Message-ID: <4945DC5B.8040904@ar.media.kyoto-u.ac.jp> Christian Heimes wrote: > David Cournapeau schrieb: > >> Do you only need numpy or also scipy ? If you only need numpy, it is >> relatively straightforward because you don't need BLAS/LAPACK nor any >> fortran compiler. You should use the Visual Studio compiler, though: VS >> 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well >> yet for 64 bits. >> > > The offical Windows builds of Python 2.5 are created with Visual C 7.1 > (also known as VS2003). You can compile an extension with VS 2005 but > that will cause trouble. > Hm, I may have got confused between the IDE and the compiler version. VS2003 cannot build 64 bits binaries, right ? So you need the Platform/Windows SDK - which corresponds to the compiler version 14 (VS 2005) and not 13 (VS 2003), right ? cheers, David From klemm at phys.ethz.ch Mon Dec 15 13:27:51 2008 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Mon, 15 Dec 2008 19:27:51 +0100 Subject: [Numpy-discussion] Efficient removal of duplicates Message-ID: Hi, I the following problem: I have a relatively long array of points [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which prevents the Delaunay triangulation algorithm from completing its task. Question, is there an efficent way, of getting rid of the duplicate entries? All I can think of involves loops. 
Thanks and regards, Hanno -- Hanno Klemm klemm at phys.ethz.ch From bhaynor at hotmail.com Mon Dec 15 14:39:41 2008 From: bhaynor at hotmail.com (Benjamin Haynor) Date: Mon, 15 Dec 2008 11:39:41 -0800 Subject: [Numpy-discussion] Concatenating Arrays to make Views Message-ID: Hi, I was wondering if I can concatenate 3 arrays, where the result will be a view of the original three arrays, instead of a copy of the data. For example, suppose I write the following import numpy as n a = n.array([[1,2],[3,4]]) b = n.array([[5,6],[7,8]]) c = n.array([[9,10],[11,12]]) c = n.r_[a,b] Now c = : [[1,2], [3,4], [5,6], [7,8], [9,10], [11,12]] I was hoping to get an array, such that, when I change d, a, b, and c will also change appropriately. Any ideas? - Ben bhaynor at hotmail.com _________________________________________________________________ Send e-mail anywhere. No map, no compass. http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_anywhere_122008 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Mon Dec 15 14:47:06 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 14:47:06 -0500 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: Message-ID: <4946B43A.9000709@american.edu> Hanno Klemm wrote: > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which > prevents the Delaunay triangulation algorithm from completing its task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? `set`? Alan Isaac PS I think a couple such inquiries have perhaps suggested that it would be nice if `unique` took an axis argument. From strawman at astraw.com Mon Dec 15 14:57:15 2008 From: strawman at astraw.com (Andrew Straw) Date: Mon, 15 Dec 2008 11:57:15 -0800 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: Message-ID: <4946B69B.7000709@astraw.com> Hanno Klemm wrote: > Hi, > > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which > prevents the Delaunay triangulation algorithm from completing its task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? > All I can think of involves loops. > > Thanks and regards, > Hanno > > One idea is to create a view of the original array with a shape of (N,) and elements with a dtype that encompases both xn, yn. Then use numpy.unique() to find the unique entries, and create a view of that array with your original dtype. -Andrew From robert.kern at gmail.com Mon Dec 15 14:58:20 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Dec 2008 11:58:20 -0800 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: Message-ID: <3d375d730812151158t58b7949fr5fa970c0e6e479a5@mail.gmail.com> On Mon, Dec 15, 2008 at 10:27, Hanno Klemm wrote: > > Hi, > > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which > prevents the Delaunay triangulation algorithm from completing its task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? > All I can think of involves loops. Besides transforming to tuples and using sets as Alan suggests, you can also cast your [N,2] array to a [N] structured array and use unique1d(). 
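
For concreteness, a minimal sketch of that structured-array route, assuming a C-contiguous (N, 2) float array of points (the array names are only illustrative):

import numpy as np

xy = np.array([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0], [3.0, 4.0]])

# View each (x, y) row as a single structured element so that whole
# rows are compared at once.
pair_dtype = np.dtype([('x', xy.dtype), ('y', xy.dtype)])
pairs = xy.view(pair_dtype).ravel()

# unique1d sorts the structured elements and drops the duplicate rows.
unique_pairs = np.unique1d(pairs)

# Back to a plain (M, 2) float array for the triangulation.
xy_unique = unique_pairs.view(xy.dtype).reshape(-1, 2)
print xy_unique
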
If you are doing interpolation and need to deal with the associated Z values, use the return_indices=True argument to get the indices of the unique values, too. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Mon Dec 15 14:59:52 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Dec 2008 11:59:52 -0800 Subject: [Numpy-discussion] Concatenating Arrays to make Views In-Reply-To: References: Message-ID: <3d375d730812151159v32d22d86nb5fc91d9c2d302bf@mail.gmail.com> On Mon, Dec 15, 2008 at 11:39, Benjamin Haynor wrote: > Hi, > > I was wondering if I can concatenate 3 arrays, where the result will be a > view of the original three arrays, instead of a copy of the data. No, this is not possible in general with numpy's memory model. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From aarchiba at physics.mcgill.ca Mon Dec 15 15:24:24 2008 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Mon, 15 Dec 2008 15:24:24 -0500 Subject: [Numpy-discussion] Concatenating Arrays to make Views In-Reply-To: References: Message-ID: 2008/12/15 Benjamin Haynor : > I was wondering if I can concatenate 3 arrays, where the result will be a > view of the original three arrays, instead of a copy of the data. For > example, suppose I write the following > import numpy as n > a = n.array([[1,2],[3,4]]) > b = n.array([[5,6],[7,8]]) > c = n.array([[9,10],[11,12]]) > c = n.r_[a,b] > Now c = : > [[1,2], > [3,4], > [5,6], > [7,8], > [9,10], > [11,12]] > I was hoping to get an array, such that, when I change d, a, b, and c will > also change appropriately. > Any ideas? An array must be a contiguous piece of memory, so this is impossible unless you allocate d first and make a b and c views of it. Anne From michael.s.gilbert at gmail.com Mon Dec 15 18:01:41 2008 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Mon, 15 Dec 2008 18:01:41 -0500 Subject: [Numpy-discussion] Mersenne twister seeds Message-ID: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> According to wikipedia [1], some common Mersenne twister algorithms use a linear congruential gradient (LCG) to generate seeds. LCGs have been known to produce poor random numbers. Does numpy's Mersenne twister do this? And if so, is this potentially a problem? http://en.wikipedia.org/wiki/Linear_congruential_generator From bioinformed at gmail.com Mon Dec 15 18:10:52 2008 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Mon, 15 Dec 2008 18:10:52 -0500 Subject: [Numpy-discussion] Mersenne twister seeds In-Reply-To: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> References: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> Message-ID: <2e1434c10812151510o2f621055t2320ee93357f6b43@mail.gmail.com> On Mon, Dec 15, 2008 at 6:01 PM, Michael Gilbert < michael.s.gilbert at gmail.com> wrote: > According to wikipedia [1], some common Mersenne twister algorithms > use a linear congruential gradient (LCG) to generate seeds. LCGs have > been known to produce poor random numbers. Does numpy's Mersenne > twister do this? And if so, is this potentially a problem? > It is certainly no worse than using sequential pids or timestamps and very likely considerably better. 
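
Coming back to the array-concatenation question earlier in this digest: a minimal sketch of the approach Anne Archibald describes, i.e. allocate the combined array first and make a, b and c views into it (the numbers follow Benjamin's original example):

import numpy as np

# Allocate the combined array up front ...
d = np.empty((6, 2), dtype=int)

# ... and make a, b, c views of consecutive row blocks.  No data is
# copied, so writing through d is seen by a, b and c, and vice versa.
a, b, c = d[0:2], d[2:4], d[4:6]

a[:] = [[1, 2], [3, 4]]
b[:] = [[5, 6], [7, 8]]
c[:] = [[9, 10], [11, 12]]

d[0, 0] = 99
print a[0, 0]    # -> 99, because a is a view into d
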
-Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists_ravi at lavabit.com Mon Dec 15 18:57:54 2008 From: lists_ravi at lavabit.com (Ravi) Date: Mon, 15 Dec 2008 18:57:54 -0500 Subject: [Numpy-discussion] Mersenne twister seeds In-Reply-To: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> References: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> Message-ID: <200812151857.56744.lists_ravi@lavabit.com> On Monday 15 December 2008 18:01:41 Michael Gilbert wrote: > According to wikipedia [1], some common Mersenne twister algorithms > use a linear congruential gradient (LCG) to generate seeds. ?LCGs have > been known to produce poor random numbers. ?Does numpy's Mersenne > twister do this? ?And if so, is this potentially a problem? No. Once the seeding is done, the Mersenne twister generates the random numbers. So long as you are using those, you are fine (except for cryptographic applications). If you don't trust the seed, you could always seed it yourself as well. Regards, Ravi From aisaac at american.edu Mon Dec 15 19:04:02 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 19:04:02 -0500 Subject: [Numpy-discussion] Mersenne twister seeds In-Reply-To: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> References: <8e2a98be0812151501k377bdf66sa96612bf8c2bb247@mail.gmail.com> Message-ID: <4946F072.6050104@american.edu> On 12/15/2008 6:01 PM Michael Gilbert apparently wrote: > According to wikipedia [1], some common Mersenne twister algorithms > use a linear congruential gradient (LCG) to generate seeds. LCGs have > been known to produce poor random numbers. Does numpy's Mersenne > twister do this? And if so, is this potentially a problem? > > http://en.wikipedia.org/wiki/Linear_congruential_generator See the discussion of `seed` at http://www.python.org/doc/2.5/lib/module-random.html If you are just looking for a truly random seed: http://www.random.org/ Alan Isaac From drife at ucar.edu Mon Dec 15 19:24:05 2008 From: drife at ucar.edu (Daran Rife) Date: Mon, 15 Dec 2008 17:24:05 -0700 Subject: [Numpy-discussion] Efficient removal of duplicates Message-ID: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> How about a solution inspired by recipe 18.1 in the Python Cookbook, 2nd Ed: import numpy as np a = [(x0,y0), (x1,y1), ...] l = a.tolist() l.sort() unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] a_unique = np.asarray(unique) Performance of this approach should be highly scalable. Daran -- Hi, I the following problem: I have a relatively long array of points [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, which prevents the Delaunay triangulation algorithm from completing its task. Question, is there an efficent way, of getting rid of the duplicate entries? All I can think of involves loops. Thanks and regards, Hanno From drife at ucar.edu Mon Dec 15 19:27:16 2008 From: drife at ucar.edu (Daran Rife) Date: Mon, 15 Dec 2008 17:27:16 -0700 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> Message-ID: Whoops! A hasty cut-and-paste from my IDLE session. This should read: import numpy as np a = [(x0,y0), (x1,y1), ...] 
# A numpy array, but could be a list l = a.tolist() l.sort() unique = [x for i, x in enumerate(l) if not i or x != l[i-1]] # <---- a_unique = np.asarray(unique) Daran -- On Dec 15, 2008, at 5:24 PM, Daran Rife wrote: > How about a solution inspired by recipe 18.1 in the Python Cookbook, > 2nd Ed: > > import numpy as np > > a = [(x0,y0), (x1,y1), ...] > l = a.tolist() > l.sort() > unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] > a_unique = np.asarray(unique) > > Performance of this approach should be highly scalable. > > Daran > > -- > > > Hi, > > I the following problem: I have a relatively long array of points > [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, > which > prevents the Delaunay triangulation algorithm from completing its > task. > > Question, is there an efficent way, of getting rid of the duplicate > entries? > All I can think of involves loops. > > Thanks and regards, > Hanno From robert.kern at gmail.com Mon Dec 15 19:53:02 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Dec 2008 18:53:02 -0600 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> Message-ID: <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> On Mon, Dec 15, 2008 at 18:24, Daran Rife wrote: > How about a solution inspired by recipe 18.1 in the Python Cookbook, > 2nd Ed: > > import numpy as np > > a = [(x0,y0), (x1,y1), ...] > l = a.tolist() > l.sort() > unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] > a_unique = np.asarray(unique) > > Performance of this approach should be highly scalable. That basic idea is what unique1d() does; however, it uses numpy primitives to keep the heavy lifting in C instead of Python. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From aisaac at american.edu Mon Dec 15 21:21:56 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 21:21:56 -0500 Subject: [Numpy-discussion] unique1d docs (was: Efficient removal of duplicates) In-Reply-To: <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> Message-ID: <494710C4.9080808@american.edu> On 12/15/2008 7:53 PM Robert Kern apparently wrote: > That basic idea is what unique1d() does; however, it uses numpy > primitives to keep the heavy lifting in C instead of Python. I noticed that unique1d is not documented on the Numpy Example List http://www.scipy.org/Numpy_Example_List but is documented on the Numpy Example List with Doc http://www.scipy.org/Numpy_Example_List_With_Doc I thought the latter was auot-generated from the former, as stated at the top of the latter? 
Alan Isaac From josef.pktd at gmail.com Mon Dec 15 21:38:30 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Dec 2008 21:38:30 -0500 Subject: [Numpy-discussion] unique1d docs (was: Efficient removal of duplicates) In-Reply-To: <494710C4.9080808@american.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> Message-ID: <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> On Mon, Dec 15, 2008 at 9:21 PM, Alan G Isaac wrote: > On 12/15/2008 7:53 PM Robert Kern apparently wrote: >> That basic idea is what unique1d() does; however, it uses numpy >> primitives to keep the heavy lifting in C instead of Python. > > > > I noticed that unique1d is not documented on the > Numpy Example List http://www.scipy.org/Numpy_Example_List > but is documented on the Numpy Example List with Doc > http://www.scipy.org/Numpy_Example_List_With_Doc > > I thought the latter was auot-generated from the former, > as stated at the top of the latter? > > Alan Isaac > > > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > I checked the changelog of example list with docs, and it seems that there were several edits directly on the example list with docs page. I guess the warning on top is not enough to prevent edits. Josef From aisaac at american.edu Mon Dec 15 22:18:01 2008 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 15 Dec 2008 22:18:01 -0500 Subject: [Numpy-discussion] unique1d docs In-Reply-To: <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> Message-ID: <49471DE9.9000809@american.edu> > On Mon, Dec 15, 2008 at 9:21 PM, Alan G Isaac wrote: >> I noticed that unique1d is not documented on the >> Numpy Example List http://www.scipy.org/Numpy_Example_List >> but is documented on the Numpy Example List with Doc >> http://www.scipy.org/Numpy_Example_List_With_Doc >> I thought the latter was auto-generated from the former, >> as stated at the top of the latter? On 12/15/2008 9:38 PM josef.pktd at gmail.com apparently wrote: > I checked the changelog of example list with docs, and it seems that > there were several edits directly on the example list with docs page. > I guess the warning on top is not enough to prevent edits. Well I added the unique1d example to the Numpy Example List http://www.scipy.org/Numpy_Example_List. I hope that was the correct response. 
Alan From josef.pktd at gmail.com Mon Dec 15 23:37:33 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Dec 2008 23:37:33 -0500 Subject: [Numpy-discussion] unique1d docs In-Reply-To: <49471DE9.9000809@american.edu> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> <49471DE9.9000809@american.edu> Message-ID: <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> On Mon, Dec 15, 2008 at 10:18 PM, Alan G Isaac wrote: >> On Mon, Dec 15, 2008 at 9:21 PM, Alan G Isaac wrote: >>> I noticed that unique1d is not documented on the >>> Numpy Example List http://www.scipy.org/Numpy_Example_List >>> but is documented on the Numpy Example List with Doc >>> http://www.scipy.org/Numpy_Example_List_With_Doc >>> I thought the latter was auto-generated from the former, >>> as stated at the top of the latter? > > > On 12/15/2008 9:38 PM josef.pktd at gmail.com apparently wrote: >> I checked the changelog of example list with docs, and it seems that >> there were several edits directly on the example list with docs page. >> I guess the warning on top is not enough to prevent edits. > > > Well I added the unique1d example to the Numpy Example List > http://www.scipy.org/Numpy_Example_List. I hope that was > the correct response. > > Alan > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > What's the future of the example list, on the example list with docs it says Numpy 1.0.4. It hasn't been updated in a while. When I started out with numpy, I used it as a main reference, but now, some examples, that I wanted to look at, had outdated function signature. For me, the new docs are now more usable than the example list. I was thinking of starting an example list for scipy.stats, but I guess the effort is better placed in improving the new docs. Josef From millman at berkeley.edu Tue Dec 16 01:29:40 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Mon, 15 Dec 2008 22:29:40 -0800 Subject: [Numpy-discussion] unique1d docs In-Reply-To: <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> <49471DE9.9000809@american.edu> <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> Message-ID: On Mon, Dec 15, 2008 at 8:37 PM, wrote: > What's the future of the example list, on the example list with docs > it says Numpy 1.0.4. It hasn't been updated in a while. When I started > out with numpy, I used it as a main reference, but now, some examples, > that I wanted to look at, had outdated function signature. At some point, we should make sure everything is in the new docs. Maybe we should lock down the pages for editing, point everyone to the new docs.scipy.org webpage, and then eventually make sure everything is the new docs and remove the old pages. > For me, the new docs are now more usable than the example list. I was > thinking of starting an example list for scipy.stats, but I guess the > effort is better placed in improving the new docs. Yes. Please don't start new moin wiki documentation. 
We have a good solution for documentation that didn't exist when the moin documentation was started. Either put new docs in the docstrings or in the scipy tutorial. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/ From klemm at phys.ethz.ch Tue Dec 16 04:09:33 2008 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Tue, 16 Dec 2008 10:09:33 +0100 Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu>, <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> Message-ID: Thanks Daran, that works like a charm! Hanno On Tue, Dec 16, 2008, Daran Rife said: > Whoops! A hasty cut-and-paste from my IDLE session. > This should read: > > import numpy as np > > a = [(x0,y0), (x1,y1), ...] # A numpy array, but could be a list > l = a.tolist() > l.sort() > unique = [x for i, x in enumerate(l) if not i or x != l[i-1]] # <---- > a_unique = np.asarray(unique) > > > Daran > > -- > > On Dec 15, 2008, at 5:24 PM, Daran Rife wrote: > >> How about a solution inspired by recipe 18.1 in the Python Cookbook, >> 2nd Ed: >> >> import numpy as np >> >> a = [(x0,y0), (x1,y1), ...] >> l = a.tolist() >> l.sort() >> unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] >> a_unique = np.asarray(unique) >> >> Performance of this approach should be highly scalable. >> >> Daran >> >> -- >> >> >> Hi, >> >> I the following problem: I have a relatively long array of points >> [(x0,y0), (x1,y1), ...]. Apparently, I have some duplicate entries, >> which >> prevents the Delaunay triangulation algorithm from completing its >> task. >> >> Question, is there an efficent way, of getting rid of the duplicate >> entries? >> All I can think of involves loops. >> >> Thanks and regards, >> Hanno > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- Hanno Klemm klemm at phys.ethz.ch From sturla at molden.no Tue Dec 16 07:24:52 2008 From: sturla at molden.no (Sturla Molden) Date: Tue, 16 Dec 2008 13:24:52 +0100 (CET) Subject: [Numpy-discussion] Efficient removal of duplicates In-Reply-To: <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> Message-ID: There was an discussion about this on the c.l.p a while ago. Using a sort will scale like O(n log n) or worse, whereas using a set (hash table) will scale like amortized O(n). How to use a Python set to get a unique collection of objects I'll leave to your imagination. Sturla Molden > On Mon, Dec 15, 2008 at 18:24, Daran Rife wrote: >> How about a solution inspired by recipe 18.1 in the Python Cookbook, >> 2nd Ed: >> >> import numpy as np >> >> a = [(x0,y0), (x1,y1), ...] >> l = a.tolist() >> l.sort() >> unique = [x for i, x in enumerate(l) if not i or x != b[l-1]] >> a_unique = np.asarray(unique) >> >> Performance of this approach should be highly scalable. > > That basic idea is what unique1d() does; however, it uses numpy > primitives to keep the heavy lifting in C instead of Python. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
> -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From aisaac at american.edu Tue Dec 16 10:01:12 2008 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 16 Dec 2008 10:01:12 -0500 Subject: [Numpy-discussion] finding docs (was: unique1d docs) In-Reply-To: References: <9A3406A3-1B29-464A-B4FD-AB8DE8D96138@ucar.edu> <3d375d730812151653r5c82bfc7g6cedc7ccd309a926@mail.gmail.com> <494710C4.9080808@american.edu> <1cd32cbb0812151838x3d369aa2r8d3548e280f1fb69@mail.gmail.com> <49471DE9.9000809@american.edu> <1cd32cbb0812152037x2ae25e9fw777a88197c8830e1@mail.gmail.com> Message-ID: <4947C2B8.3040903@american.edu> On 12/16/2008 1:29 AM Jarrod Millman apparently wrote: > Yes. Please don't start new moin wiki documentation. We have a good > solution for documentation that didn't exist when the moin > documentation was started. Either put new docs in the docstrings or > in the scipy tutorial. OK, in this case I think the main NumPy needs a change: an explicit link to the new docs, and a section titled "Documentation" (linked in the contents), and an explict link to the new Numpy Reference Guide. As far as I can tell, I have no way to edit this page: http://numpy.scipy.org/ Imagine a new user looking for docs. This is what I think they would do. 1. Use `numpy` as a browser search term, and get directed to http://numpy.scipy.org/ 2. Notice no "Docmentation" link in contents. *Maybe* notice that "Download the Guide" means get some documentation, but probably that is more detailed and encyclopedic than many are first seeking. 3. Perhaps they will read the text and get pointed to the Numeric docs. Nothing will point them to the new docs. They may notice "Other Documentation is available at the scipy website" but if they follow that, will they guess that they should try a "snapshot" of a "work in progress"? Alan Isaac From rmay31 at gmail.com Tue Dec 16 13:57:40 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 16 Dec 2008 12:57:40 -0600 Subject: [Numpy-discussion] Unexpected MaskedArray behavior Message-ID: <4947FA24.3030904@gmail.com> Hi, I just noticed the following and I was kind of surprised: >>>a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>b = a*5 >>>b masked_array(data = [5 -- -- 20 25], mask = [False True True False False], fill_value=999999) >>>b.data array([ 5, 10, 15, 20, 25]) I was expecting that the underlying data wouldn't get modified while masked. Is this actual behavior expected? Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From shao at msg.ucsf.edu Tue Dec 16 15:09:45 2008 From: shao at msg.ucsf.edu (Lin Shao) Date: Tue, 16 Dec 2008 12:09:45 -0800 Subject: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp In-Reply-To: References: Message-ID: Hi, I found this earlier dialog about refactoring umathmodule.c (see bottom) where David mentioned it wasn't tested on 64-bit Windows. I tried compiling numpy 1.3.0.dev6118 on both a 32-bit and 64-bit Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked on 32-bit but not on 64-bit: the compiler returned a non-specific "Internal Compiler Error" when working on umathmodule.c: ...... 
building 'numpy.core.umath' extension compiling C sources creating build\temp.win-amd64-2.6\Release\build creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6 creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core creating build\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src D:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ibuild\src.win-amd64-2.6\numpy\core\src -Inumpy\core\include -Ibuild\src.win-amd64-2.6\numpy\core\include/numpy -Inumpy\core\src -Inumpy\core\include -ID:\Python26\include -ID:\Python26\PC /Tcbuild\src.win-amd64-2.6\numpy\core\src\umathmodule.c /Fobuild\temp.win-amd64-2.6\Release\build\src.win-amd64-2.6\numpy\core\src\umathmodule.obj umathmodule.c numpy\core\src\umath_funcs_c99.inc.src(140) : warning C4273: '_hypot' : inconsistent dll linkage D:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(139) : see previous definition of '_hypot' numpy\core\src\umath_funcs_c99.inc.src(341) : warning C4273: 'sinf' : inconsistent dll linkage Internal Compiler Error in D:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\amd64\cl.exe. You will be prompted to send an error report to Microsoft later. Any idea what's going on? I'd like to volunteer to test compiling numpy on 64-bit Windows system since I have a VS 2008 professional edition installed. Thanks! --lin 2008/10/5 David Cournapeau : >> ...... >> >> > >> > #ifndef HAVE_FREXPF >> > static float frexpf(float x, int * i) >> > { >> > return (float)frexp((double)(x), i); >> > } >> > #endif >> > #ifndef HAVE_LDEXPF >> > static float ldexpf(float x, int i) >> > { >> > return (float)ldexp((double)(x), i); >> > } >> > #endif >> >> At the time I had tried to send further output following a checkout, >> but couldn't get it to post to the list, I think the message was too >> big or something. I will probably be having a go with 1.2.0, when I >> get some time. I'll let you know how it goes. > > I did some heavy refactoring for the above problems, and it should be > now easier to handle (in the trunk). I could build 1.2.0 with VS 2008 > express on 32 bits (wo blas/lapack), and there are some test errors - > albeit relatively minor at first sight. I have not tried on 64 bits. > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From rmay31 at gmail.com Tue Dec 16 18:07:14 2008 From: rmay31 at gmail.com (Ryan May) Date: Tue, 16 Dec 2008 17:07:14 -0600 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> Message-ID: <494834A2.5030303@gmail.com> Pierre GM wrote: > All, > Here's the latest version of genloadtxt, with some recent corrections. > With just a couple of tweaking, we end up with some decent speed: it's > still slower than np.loadtxt, but only 15% so according to the test at > the end of the package. I have one more use issue that you may or may not want to fix. My problem is that missing "values" are specified by their string representation, so that a string representing a missing value, while having the same actual numeric value, may not compare equal when represented as a string. 
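A quick illustration of the mismatch, in plain Python rather than anything genloadtxt-specific:

>>> '-999.00' == '-999.0'
False
>>> float('-999.00') == float('-999.0')
True
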
For instance, if you specify that -999.0 represents a missing value, but the value written to the file is -999.00, you won't end up masking the -999.00 data point. I'm sure a test case will help here: def test_withmissing_float(self): data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00') test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0', names=True) control = ma.array([(0, 1.5), (2, -1.)], mask=[(False, False), (False, True)], dtype=[('A', np.int), ('B', np.float)]) print control print test assert_equal(test, control) assert_equal(test.mask, control.mask) Right now this fails with the latest version of genloadtxt. I've worked around this by specifying a whole bunch of string representations of the values, but I wasn't sure if you knew of a better way that this could be handled within genloadtxt. I can only think of two ways, though I'm not thrilled with either: 1) Call the converter on the string form of the missing value and compare against the converted value from the file to determine if missing. (Probably very slow) 2) Add a list of objects (ints, floats, etc.) to compare against after conversion to determine if they're missing. This might needlessly complicate the function, which I know you've already taken pains to optimize. If there's no good way to do it, I'm content to live with a workaround. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From pgmdevlist at gmail.com Tue Dec 16 18:34:13 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 16 Dec 2008 18:34:13 -0500 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <494834A2.5030303@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <494834A2.5030303@gmail.com> Message-ID: <85E86994-558E-443E-B3C0-E3FDE6BFB940@gmail.com> Ryan, OK, I'll look into that. I won't have time to address it before this next week, however. Option #2 looks like the best. In other news, I was considering renaming genloadtxt to genfromtxt, and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the function names. That way, loadtxt is untouched. On Dec 16, 2008, at 6:07 PM, Ryan May wrote: > Pierre GM wrote: >> All, >> Here's the latest version of genloadtxt, with some recent >> corrections. >> With just a couple of tweaking, we end up with some decent speed: >> it's >> still slower than np.loadtxt, but only 15% so according to the test >> at >> the end of the package. > > I have one more use issue that you may or may not want to fix. My > problem is that > missing "values" are specified by their string representation, so > that a string > representing a missing value, while having the same actual numeric > value, may not > compare equal when represented as a string. For instance, if you > specify that > -999.0 represents a missing value, but the value written to the file > is -999.00, > you won't end up masking the -999.00 data point. I'm sure a test > case will help > here: > > def test_withmissing_float(self): > data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00') > test = mloadtxt(data, dtype=None, delimiter=',', > missing='-999.0', > names=True) > control = ma.array([(0, 1.5), (2, -1.)], > mask=[(False, False), (False, True)], > dtype=[('A', np.int), ('B', np.float)]) > print control > print test > assert_equal(test, control) > assert_equal(test.mask, control.mask) > > Right now this fails with the latest version of genloadtxt. 
I've > worked around > this by specifying a whole bunch of string representations of the > values, but I > wasn't sure if you knew of a better way that this could be handled > within > genloadtxt. I can only think of two ways, though I'm not thrilled > with either: > > 1) Call the converter on the string form of the missing value and > compare against > the converted value from the file to determine if missing. (Probably > very slow) > > 2) Add a list of objects (ints, floats, etc.) to compare against > after conversion > to determine if they're missing. This might needlessly complicate > the function, > which I know you've already taken pains to optimize. > > If there's no good way to do it, I'm content to live with a > workaround. > > Ryan > > -- > Ryan May > Graduate Research Assistant > School of Meteorology > University of Oklahoma > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Tue Dec 16 18:48:52 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 16 Dec 2008 18:48:52 -0500 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <4947FA24.3030904@gmail.com> References: <4947FA24.3030904@gmail.com> Message-ID: <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> On Dec 16, 2008, at 1:57 PM, Ryan May wrote: > I just noticed the following and I was kind of surprised: > >>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>> b = a*5 >>>> b > masked_array(data = [5 -- -- 20 25], > mask = [False True True False False], > fill_value=999999) >>>> b.data > array([ 5, 10, 15, 20, 25]) > > I was expecting that the underlying data wouldn't get modified while > masked. Is > this actual behavior expected? Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't really matter one way or the other. But I tend to agree, it'd make more sense leave masked data untouched (or at least, reset them to their original value after the operation), which would mimic the behavior of gimp/photoshop. Looks like there's a relatively easy fix. I need time to check whether it doesn't break anything elsewhere, nor that it slows things down too much. I won't have time to test all that before next week, though. In any case, that would be for 1.3.x, not for 1.2.x. In the meantime, if you need the functionality, use something like ma.where(a.mask,a,a*5) From cournape at gmail.com Tue Dec 16 21:17:26 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 17 Dec 2008 11:17:26 +0900 Subject: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp In-Reply-To: References: Message-ID: <5b8d13220812161817j2aae55cfleff57bd874589b95@mail.gmail.com> On Wed, Dec 17, 2008 at 5:09 AM, Lin Shao wrote: > Hi, > > I found this earlier dialog about refactoring umathmodule.c (see > bottom) where David mentioned it wasn't tested on 64-bit Windows. > > I tried compiling numpy 1.3.0.dev6118 on both a 32-bit and 64-bit > Windows for Python 2.6.1 with VS 9.0, and not surprisingly, it worked > on 32-bit but not on 64-bit: the compiler returned a non-specific > "Internal Compiler Error" when working on umathmodule.c: It is a bug in VS, but the problem is caused by buggy code in numpy, so this can be avoided. 
Incidentally, I was working on it yesterday, but went to bed before having fixed everything :) David From david at ar.media.kyoto-u.ac.jp Tue Dec 16 22:59:18 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 17 Dec 2008 12:59:18 +0900 Subject: [Numpy-discussion] Recent umath changes Message-ID: <49487916.3010804@ar.media.kyoto-u.ac.jp> Hi, There have been some changes recently in the umath code, which breaks windows 64 compilation - and I don't understand their rationale either. I have myself spent quite a good deal of time to make sure this works on many platforms/toolchains, by fixing the config distutils command and that platform specificities are contained in a very localized part of the code. It may not be very well documented (see below), but may I ask that next time someone wants to change file file, people ask for review before putting it directly in the trunk ? thanks, David How to deal with platform oddities: ----------------------------------- Basically, the code to replace missing C99 math funcs is, for an hypothetical double foo(double) function: #ifndef HAVE_FOO #udnef foo static double npy_foo(double a) { // define a npy_foo function with the same requirements as C99 foo } #define npy_foo foo #else double foo(double); #endif I think this code is wrong on several accounts: - we should not undef foo if foo is available: if foo is available at that point, it is a bug in the configuration, and should not be dealt in the code. Some cases may be complicated (IEEE754-related macro which are sometimes macro, something functions, etc...), but that should be dealt in very narrow cases. - we should not declare our own function: function declaration is not portable, and varies among OS/toolchains. Some toolchains use intrinsic, some non standard inline mechanism, etc... which can crash the resulting binary because there is a discrepency between our code calling conventions and the library convention. The reported problem with VS compiler on amd64 is caused by this exact problem. Unless there is a strong rationale otherwise, I would like that we follow how "autoconfed" projects do. They have long experience on dealing with platforms idiosyncrasies, and the above method is not the one they follow. They follow the simple: #ifnfdef HAVE_FOO //define foo #endif And deal with platform oddities in the *configuration* code instead of directly in the code. That really makes my life easier when I deal with windows compilers, which are already painful enough to deal with as it is. From charlesr.harris at gmail.com Wed Dec 17 00:43:05 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Dec 2008 22:43:05 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <49487916.3010804@ar.media.kyoto-u.ac.jp> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Hi, > > There have been some changes recently in the umath code, which > breaks windows 64 compilation - and I don't understand their rationale > either. I have myself spent quite a good deal of time to make sure this > works on many platforms/toolchains, by fixing the config distutils > command and that platform specificities are contained in a very > localized part of the code. It may not be very well documented (see > below), but may I ask that next time someone wants to change file file, > people ask for review before putting it directly in the trunk ? 
> > thanks, > > David > > > How to deal with platform oddities: > ----------------------------------- > > Basically, the code to replace missing C99 math funcs is, for an > hypothetical double foo(double) function: > > #ifndef HAVE_FOO > #udnef foo > static double npy_foo(double a) > { > // define a npy_foo function with the same requirements as C99 foo > } > > #define npy_foo foo > #else > double foo(double); > #endif > > I think this code is wrong on several accounts: > - we should not undef foo if foo is available: if foo is available at > that point, it is a bug in the configuration, and should not be dealt in > the code. Some cases may be complicated (IEEE754-related macro which are > sometimes macro, something functions, etc...), but that should be dealt > in very narrow cases. > - we should not declare our own function: function declaration is not > portable, and varies among OS/toolchains. Some toolchains use intrinsic, > some non standard inline mechanism, etc... which can crash the resulting > binary because there is a discrepency between our code calling > conventions and the library convention. The reported problem with VS > compiler on amd64 is caused by this exact problem. > > Unless there is a strong rationale otherwise, I would like that we > follow how "autoconfed" projects do. They have long experience on > dealing with platforms idiosyncrasies, and the above method is not the > one they follow. They follow the simple: > Yes, the rational was to fix compilation on windows 64 with msvc and etch on SPARC, both of which were working after the changes. You are, of course, free to break these builds again. However, I designated space at the top of the file for compiler/distro specific defines, I think you should use them, there is a reason other folks do. The macro undef could be moved but I preferred to generate an error if there was a conflict with the the standard c function prototypes. We can't use inline code for these functions as they are passed to the generic loops as function pointers. I assume compilers have some way of recognizing this case and perhaps generating function code on the fly. If so, we need to figure out how to detect that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Wed Dec 17 00:56:43 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 17 Dec 2008 14:56:43 +0900 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: References: <49487916.3010804@ar.media.kyoto-u.ac.jp> Message-ID: <4948949B.9000504@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau > > > wrote: > > Hi, > > There have been some changes recently in the umath code, which > breaks windows 64 compilation - and I don't understand their rationale > either. I have myself spent quite a good deal of time to make sure > this > works on many platforms/toolchains, by fixing the config distutils > command and that platform specificities are contained in a very > localized part of the code. It may not be very well documented (see > below), but may I ask that next time someone wants to change file > file, > people ask for review before putting it directly in the trunk ? 
> > thanks, > > David > > > How to deal with platform oddities: > ----------------------------------- > > Basically, the code to replace missing C99 math funcs is, for an > hypothetical double foo(double) function: > > #ifndef HAVE_FOO > #udnef foo > static double npy_foo(double a) > { > // define a npy_foo function with the same requirements as C99 foo > } > > #define npy_foo foo > #else > double foo(double); > #endif > > I think this code is wrong on several accounts: > - we should not undef foo if foo is available: if foo is available at > that point, it is a bug in the configuration, and should not be > dealt in > the code. Some cases may be complicated (IEEE754-related macro > which are > sometimes macro, something functions, etc...), but that should be > dealt > in very narrow cases. > - we should not declare our own function: function declaration is not > portable, and varies among OS/toolchains. Some toolchains use > intrinsic, > some non standard inline mechanism, etc... which can crash the > resulting > binary because there is a discrepency between our code calling > conventions and the library convention. The reported problem with VS > compiler on amd64 is caused by this exact problem. > > Unless there is a strong rationale otherwise, I would like that we > follow how "autoconfed" projects do. They have long experience on > dealing with platforms idiosyncrasies, and the above method is not the > one they follow. They follow the simple: > > > Yes, the rational was to fix compilation on windows 64 with msvc and > etch on SPARC, both of which were working after the changes. It does not work at the moment on windows at least :) But more essentially, I don't see why you declared those functions: can you explain me what was your intention, because I don't understand the rationale. > You are, of course, free to break these builds again. However, I > designated space at the top of the file for compiler/distro specific > defines, I think you should use them, there is a reason other folks do. The problem is two folds: - by declaring functions everywhere in the code, you are effectively spreading toolchain specific oddities in the whole source file. This is not good, IMHO: those should be detected at configuration stage, and dealt in the source code using those informations. That's how every autoconf project does it. If a function is actually a macro, this should be detected at configuration. - declarations are toolchain specific; it is actually worse, it even depends on the compiler flags. It is at least the case with MS compilers. So there is no way to guarantee that your declaration matches the math runtime one (the compiler crash reported is exactly caused by this). > The macro undef could be moved but I preferred to generate an error if > there was a conflict with the the standard c function prototypes. > > We can't use inline code for these functions as they are passed to the > generic loops as function pointers. 
Yes, I believe this is another problem when declaring function: if we use say cosl, and cosl is an inline function in the runtime, by re-declaring it, you are telling the compiler that it is not inline anymore, so the compiler does not know anymore you can't take the address of cosl, unless it detects the mismatch between the runtime declaration and ours, and considers it as an error (I am not sure whether this is always an error with MS compilers; it may only be a warning on some versions - it is certainly not dealt in a gracious manner every time, since the linker crashes in some cases). David > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Wed Dec 17 03:40:44 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 17 Dec 2008 17:40:44 +0900 Subject: [Numpy-discussion] Win64 build? In-Reply-To: <4945DC5B.8040904@ar.media.kyoto-u.ac.jp> References: <49455454.1070303@gmx.de> <4945C402.4080209@ar.media.kyoto-u.ac.jp> <4945DC5B.8040904@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812170040r66d08ee6q847b48148328a422@mail.gmail.com> On Mon, Dec 15, 2008 at 1:26 PM, David Cournapeau wrote: > Christian Heimes wrote: >> David Cournapeau schrieb: >> >>> Do you only need numpy or also scipy ? If you only need numpy, it is >>> relatively straightforward because you don't need BLAS/LAPACK nor any >>> fortran compiler. You should use the Visual Studio compiler, though: VS >>> 2005 for python 2.5 or VS 2008 for python 2.6 - mingw does not work well >>> yet for 64 bits. >>> >> >> The offical Windows builds of Python 2.5 are created with Visual C 7.1 >> (also known as VS2003). You can compile an extension with VS 2005 but >> that will cause trouble. >> > > Hm, I may have got confused between the IDE and the compiler version. > VS2003 cannot build 64 bits binaries, right ? So you need the > Platform/Windows SDK - which corresponds to the compiler version 14 (VS > 2005) and not 13 (VS 2003), right ? For the record, if anyone (including me) needs this info: I checked, and python 2.5.2 on amd64 is indeed build by a compiler reporting MSC 1400 (VS 2005 serie). I don't think VS 2003 compiler is used at all, actually - maybe the VS 2003 IDE can be set to use the SDK compilers, though. cheers, David From gwg at emss.co.za Wed Dec 17 04:52:36 2008 From: gwg at emss.co.za (George) Date: Wed, 17 Dec 2008 09:52:36 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Singular_Matrix_problem_with_Matplit?= =?utf-8?q?lib_in=09Numpy_=28Windows_-_AMD64=29?= References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> Message-ID: David Cournapeau gmail.com> writes: > > On Tue, Dec 9, 2008 at 12:50 AM, George Goussard emss.co.za> wrote: > > Hello. > > > > > > > > I have been battling with the following error for the past week. The output > > from the terminal is: > > > > What does numpy.test() says ? Did you use an external blas/lapack when > you built numpy for AMD64 > > David > Hi David. I accidentally created a new posting previously. I have spent the last month trying to track down this bug. I am trying to compile Numpy and Matplotlib on Windows XP 64-bit. I am using the Visual Studio 2005 compiler. Everything compiles without a problem. However running matplotlib etc. gave me a lot of problems: 1. 
The interaction was terrible. It didn't draw anything and the console had a lot trace output with regard to singular matrices etc. Like you said, I am using an external library called Intel MKL and I decided to swap this with AMD ACML. Then the interaction was a lot better and no trace on the console of singular matrices etc. 2. Using both libraries there are problems with the plotting. In both cases the graphs are broken. It starts plotting the curve and then it stops with a section of white space and then some more of the curve etc. The same with the grid lines etc. In other words there is just something broken. I have decided to pursue this bug. I would really like to get Numpy working on AMD64. I ran the test you advised and the tests passed. However I have traced the problem to the file lines.py of matplotlib. There in a function set_xdata and set_ydata(also set_data) there is a line like x = np.asarray or y = np.asarray. My data before that line is fine, but straight after the line is executed the data is broken and garbage. I have debugged some more but I am in deep (murky) waters, but I have also ran out of ideas. If anybody has some more suggestions, please post them. From michael.abshoff at googlemail.com Wed Dec 17 05:15:21 2008 From: michael.abshoff at googlemail.com (Michael Abshoff) Date: Wed, 17 Dec 2008 02:15:21 -0800 Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) In-Reply-To: References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> Message-ID: <4948D139.4020703@gmail.com> George wrote: > David Cournapeau gmail.com> writes: Hi George, > I have debugged some more but I am in deep (murky) waters, but I have also ran > out of ideas. If anybody has some more suggestions, please post them. Could you post a full example with additional version info that you are using? Ever since Sage upgraded to Matplotlib 0.98.3 I have been seeing issues with uninitilized values being used in certain code paths. This could be the source of potential trouble even though it doesn't seem to cause any observable trouble with gcc for example. Cheers, Michael > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From rmay31 at gmail.com Wed Dec 17 11:51:57 2008 From: rmay31 at gmail.com (Ryan May) Date: Wed, 17 Dec 2008 10:51:57 -0600 Subject: [Numpy-discussion] genloadtxt : last call In-Reply-To: <85E86994-558E-443E-B3C0-E3FDE6BFB940@gmail.com> References: <2611118B-4B2F-4E86-A862-4D96250C5297@gmail.com> <494834A2.5030303@gmail.com> <85E86994-558E-443E-B3C0-E3FDE6BFB940@gmail.com> Message-ID: <49492E2D.4080609@gmail.com> Pierre GM wrote: > Ryan, > OK, I'll look into that. I won't have time to address it before this > next week, however. Option #2 looks like the best. No hurries, I just want to make sure I raise any issues I see while the design is still up for change. > In other news, I was considering renaming genloadtxt to genfromtxt, > and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the > function names. That way, loadtxt is untouched. +1 I know I've changed my tune here, but at this point it seems like there's so much more functionality here that calling it loadtxt would be a disservice to how much the new function can do (and how much work you've done). 
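Purely as a hypothetical sketch of how the split might read in user code (the function and keyword names below are assumptions based on the current genloadtxt prototype, not a settled API):

import numpy as np
from StringIO import StringIO

data = StringIO("A,B\n0,1.5\n2,-999.0")
# Hypothetical masked-array flavoured wrapper around the new parser.
tbl = np.mafromtxt(data, delimiter=',', names=True, missing='-999.0')
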
Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From rmay31 at gmail.com Wed Dec 17 11:57:03 2008 From: rmay31 at gmail.com (Ryan May) Date: Wed, 17 Dec 2008 10:57:03 -0600 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> References: <4947FA24.3030904@gmail.com> <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> Message-ID: <49492F5F.5050001@gmail.com> Pierre GM wrote: > On Dec 16, 2008, at 1:57 PM, Ryan May wrote: >> I just noticed the following and I was kind of surprised: >> >>>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>>> b = a*5 >>>>> b >> masked_array(data = [5 -- -- 20 25], >> mask = [False True True False False], >> fill_value=999999) >>>>> b.data >> array([ 5, 10, 15, 20, 25]) >> >> I was expecting that the underlying data wouldn't get modified while >> masked. Is >> this actual behavior expected? > > Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't > really matter one way or the other. > But I tend to agree, it'd make more sense leave masked data untouched > (or at least, reset them to their original value after the operation), > which would mimic the behavior of gimp/photoshop. > Looks like there's a relatively easy fix. I need time to check whether > it doesn't break anything elsewhere, nor that it slows things down too > much. I won't have time to test all that before next week, though. In > any case, that would be for 1.3.x, not for 1.2.x. > In the meantime, if you need the functionality, use something like > ma.where(a.mask,a,a*5) I agree that masked values probably shouldn't be trusted, I was just surprised to see the behavior. I just assumed that no operations were taking place on masked values. Just to clarify what I was doing here: I had a masked array of data, where the mask was set by a variety of different masked values. Later on in the code, after doing some unit conversions, I went back to look at the raw data to find points that had one particular masked value set. Instead, I was surprised to see all of the masked values had changed and I could no longer find any of the special values in the data. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From Jim.Vickroy at noaa.gov Wed Dec 17 12:13:45 2008 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Wed, 17 Dec 2008 10:13:45 -0700 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <49492F5F.5050001@gmail.com> References: <4947FA24.3030904@gmail.com> <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> <49492F5F.5050001@gmail.com> Message-ID: <49493349.3090506@noaa.gov> Ryan May wrote: > Pierre GM wrote: > >> On Dec 16, 2008, at 1:57 PM, Ryan May wrote: >> >>> I just noticed the following and I was kind of surprised: >>> >>> >>>>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False]) >>>>>> b = a*5 >>>>>> b >>>>>> >>> masked_array(data = [5 -- -- 20 25], >>> mask = [False True True False False], >>> fill_value=999999) >>> >>>>>> b.data >>>>>> >>> array([ 5, 10, 15, 20, 25]) >>> >>> I was expecting that the underlying data wouldn't get modified while >>> masked. Is >>> this actual behavior expected? >>> >> Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't >> really matter one way or the other. 
>> But I tend to agree, it'd make more sense leave masked data untouched >> (or at least, reset them to their original value after the operation), >> which would mimic the behavior of gimp/photoshop. >> Looks like there's a relatively easy fix. I need time to check whether >> it doesn't break anything elsewhere, nor that it slows things down too >> much. I won't have time to test all that before next week, though. In >> any case, that would be for 1.3.x, not for 1.2.x. >> In the meantime, if you need the functionality, use something like >> ma.where(a.mask,a,a*5) >> > > I agree that masked values probably shouldn't be trusted, I was just surprised to > see the behavior. I just assumed that no operations were taking place on masked > values. > > Just to clarify what I was doing here: I had a masked array of data, where the > mask was set by a variety of different masked values. Later on in the code, > after doing some unit conversions, I went back to look at the raw data to find > points that had one particular masked value set. Instead, I was surprised to see > all of the masked values had changed and I could no longer find any of the > special values in the data. > > Ryan > > Sorry for being dense about this, but I really do not understand why masked values should not be trusted. If I apply a procedure to an array with elements designated as untouchable, I would expect that contract to be honored. What am I missing here? Thanks for your patience! -- jv -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Dec 17 12:45:14 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 17 Dec 2008 12:45:14 -0500 Subject: [Numpy-discussion] Unexpected MaskedArray behavior In-Reply-To: <49493349.3090506@noaa.gov> References: <4947FA24.3030904@gmail.com> <90CABDB1-756E-42A6-8940-8BB42AC2DD88@gmail.com> <49492F5F.5050001@gmail.com> <49493349.3090506@noaa.gov> Message-ID: <2BF975C5-74E0-45A8-A0E3-D3E4ADCDA4A3@gmail.com> On Dec 17, 2008, at 12:13 PM, Jim Vickroy wrote: >> > Sorry for being dense about this, but I really do not understand why > masked values should not be trusted. If I apply a procedure to an > array with elements designated as untouchable, I would expect that > contract to be honored. What am I missing here? > > Thanks for your patience! > -- jv Everything depends on your interpretation of masked data. Traditionally, masked data indicate invalid data, whatever the cause of the invalidity. Operations involving invalid data yield invalid data, hence the presence of a mask on the result. However, the value underneath the mask is still invalid, hence the statement "don't trust masked values". Interpreting a mask as a way to prevent some elements of an array to be processed (designating them as untouchable) is a bit of a stretch. Nevertheless, I agree that this behavior is not intuitive, so I'll check what I can do. 
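In the meantime, a minimal sketch of the ma.where workaround suggested earlier in the thread, which leaves the data under the mask alone:

import numpy.ma as ma

a = ma.MaskedArray([1, 2, 3, 4, 5], mask=[False, True, True, False, False])
# a * 5 still operates on the underlying data, so pick the original
# values back in wherever the mask is set.
b = ma.where(a.mask, a, a * 5)
# b.data should be [5, 2, 3, 20, 25]: the masked slots keep 2 and 3,
# and b.mask stays [False, True, True, False, False].
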
From charlesr.harris at gmail.com Wed Dec 17 13:40:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 17 Dec 2008 11:40:37 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <4948949B.9000504@ar.media.kyoto-u.ac.jp> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Dec 16, 2008 at 10:56 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > > > On Tue, Dec 16, 2008 at 8:59 PM, David Cournapeau > > > > > wrote: > > > It does not work at the moment on windows at least :) But more > essentially, I don't see why you declared those functions: can you > explain me what was your intention, because I don't understand the > rationale. > The declarations were for the SPARC. Originally I had them up in an ifdef up top, but I got curious what different machines would do. They shouldn't cause a problem unless something is pretty strange. The undefs I put where they are for similar reasons, but there was a strong temptation to move them into the if statement where they used to be. Let's say curiousity got the best of me there. They shouldn't affect anything but macros and I didn't want the function declarations do be interpreted as macros. > > > You are, of course, free to break these builds again. However, I > > designated space at the top of the file for compiler/distro specific > > defines, I think you should use them, there is a reason other folks do. > > The problem is two folds: > - by declaring functions everywhere in the code, you are effectively > spreading toolchain specific oddities in the whole source file. This is > not good, IMHO: those should be detected at configuration stage, and > dealt in the source code using those informations. That's how every > autoconf project does it. If a function is actually a macro, this should > be detected at configuration. > - declarations are toolchain specific; it is actually worse, it even > depends on the compiler flags. It is at least the case with MS > compilers. So there is no way to guarantee that your declaration matches > the math runtime one (the compiler crash reported is exactly caused by > this). > Worth knowing ;) It works on the windows buildbot but that is running python 2.4. Speaking of which, the BSD buildbot needs nose (I don't know what happened to it), the windows box is showing the same old permissions problem, and one of the SPARC buildbots just times out unless you build during the right time of day. We are just hobbling along at the moment. > > The macro undef could be moved but I preferred to generate an error if > > there was a conflict with the the standard c function prototypes. > > > > We can't use inline code for these functions as they are passed to the > > generic loops as function pointers. > > Yes, I believe this is another problem when declaring function: if we > use say cosl, and cosl is an inline function in the runtime, by > re-declaring it, you are telling the compiler that it is not inline > anymore, so the compiler does not know anymore you can't take the > address of cosl, unless it detects the mismatch between the runtime > declaration and ours, and considers it as an error (I am not sure > whether this is always an error with MS compilers; it may only be a > warning on some versions - it is certainly not dealt in a gracious > manner every time, since the linker crashes in some cases). > Sorry for the late reply, the network was down. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gwg at emss.co.za Wed Dec 17 14:11:22 2008 From: gwg at emss.co.za (George Goussard) Date: Wed, 17 Dec 2008 19:11:22 +0000 (UTC) Subject: [Numpy-discussion] Singular Matrix problem with Matplitlib in Numpy (Windows - AMD64) References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> <4948D139.4020703@gmail.com> Message-ID: Hi Micheal. I am going on vacation tomorrow. An example will have to wait until I am back, but I can give some version information now: Numpy is version 1.2.1, Matplotlib is version 0.98.5 and I am using stock-standard(not the Enthought version or other distribution) Python 2.5. My Python 2.5 I also compiled with MSVC 2005(AMD/em64t setting) because I can't have dependecies on older crt libraries/dll's. The Intel MKL version is 10.1.0.018 and the AMD ACML version is 4.2.0. I am not using both at the same time. I have first tried the Intel one and now I am using AMD's ACML as explained in previous email. Everything is compiled using the AMD64/em64t "settings". I am not compiling on IA32 and IA64 etc. All the other necessary version information(MSVC), I'll have to check when I am back at the office. I also use other libraries like PyQt(3.17.??) and SIP(version 4.7.7). Both of them are the commercial versions and I also have commercial version of Qt 4.3.3. Needless to say that my backend agg is Qt4Agg. I have tested on Linux 64 bit (SuSe 10.?? commercial version not openSuSe) and it worked. I also tested on Windows XP 32-bit and Linux 32-bit(SuSe 10.??) and everything worked. Constructing and example won't be trivial but I'll try. The reason being thatI embedded Python in an application and then I display graphs using SIP and PyQt. The application was done using Qt4. Anyway, but I'll get back to you on that as soon as I am finished with it. Another interesting aspect is that in my application where I initially construct the array using PyArray_SimpleNew, if I change this to, for example PyArray_SimpleNewFromData then I get a completely different graph which is a solid line(not the effect I described in the previous email) but it is completely the wrong graph, with very small numbers(E-16 numbers). One thing that also bothers me is that on Windows 32-bit. The default was not FORTRAN arrays, but on Windows XP 64 bit the order and everything is FORTRAN default. I ain't even using anything remotely to FORTRAN. I think the AMD ACML is compiled using the Intel FORTRAN compiler, but will that effect it?? Anyway, I'll put an effort into constructing an example, but it will have to be when I am back at the office from my vacation. Cheers. Thanks. George. From gwg at emss.co.za Wed Dec 17 14:28:27 2008 From: gwg at emss.co.za (George Goussard) Date: Wed, 17 Dec 2008 19:28:27 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Singular_Matrix_problem_with_Matplit?= =?utf-8?q?lib_in=09Numpy_=28Windows_-_AMD64=29?= References: <15B34CD0955E484689D667626E6456D5011C8E787E@london.emss.co.za> <5b8d13220812080943g69d4c670jabd6aef66d336e29@mail.gmail.com> Message-ID: Hello David. I am using the Intel MKL BLAS/LAPACK. I have replaced this with AMD's ACML library. Now there is no exception raised due to a "Singular matrix" while trying to move the legend(wiggling the graph). So, the graph is updated and the interaction is fine(you can wiggle the graph and it updates, minimize, maximeie etc.). But ... 
the legend is now only drawn sometimes and the graphs are drawn with an intermittent line, as if the - - - pattern was specified. Something is still not right. I just can't seem to put my finger on it since there are some many parties involved(numpy,matplotlib,python, ctypes etc.) I also ran the numpy.test() with NUmpy that I compiled with AMD's ACML. The results are included: Running unit tests for numpy NumPy version 1.2.1 Results of numpy.test() NumPy is installed in C:\Development\Python\2_5_2\lib\site-packages\numpy Python version 2.5.2 (r252:60911, Dec 12 2008, 08:38:07) [MSC v.1400 64 bit (AMD64)] nose version 0.10.4 Forcing DISTUTILS_USE_SDK=1 ............................................................................... ............................................................. ............................................................................... ..........................K..K............................... ............................................................................... ............................................................. .......................K....................................................... ..........................Ignoring "MSVCCompiler instance has no attribute '_MSVCCompiler__root'" (I think it is msvccompiler.py bug) ...........................S................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ............................................................. ............................................................................... ........ ---------------------------------------------------------------------- Ran 1592 tests in 10.704s OK (KNOWNFAIL=3, SKIP=1) Thanks. George. From irving at naml.us Wed Dec 17 16:52:02 2008 From: irving at naml.us (Geoffrey Irving) Date: Wed, 17 Dec 2008 13:52:02 -0800 Subject: [Numpy-discussion] immutable numpy arrays Message-ID: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> Currently numpy arrays are either writable or unwritable, but unwritable arrays can still be changed through other copies. This means that when a numpy array is passed into an interface that requires immutability for safety reasons, a copy always has to be made. One way around this would be to add a NPY_IMMUTABLE flag signifying that the contents of the array will never change through other copies. This flag would be in addition to the current NPY_WRITEABLE flag, so it would be fully backwards compatible. The flag would be propagated along slices and views. For example, a numpy array created from C code that guarantees immutability would have the flag set. 
If the array was passed back into a function that required an immutable array, the code could check the immutable flag and skip the copy. This behavior could also be used to implement safe copy-on-write semantics. Making this more generally useful would probably require additional flags to document whether a writeable copy of the array might exists (something like NPY_LEAKED) in order to avoid copies for newly created writeable arrays. Has the issue of immutability been considered before? It seems like a basic NPY_IMMUTABLE flag would be fairly easy to add without backwards compatibility issues, but the secondary features such as NPY_LEAKED would be more complicated. Thanks, Geoffrey From robert.kern at gmail.com Wed Dec 17 17:24:06 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Dec 2008 16:24:06 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> Message-ID: <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: > Currently numpy arrays are either writable or unwritable, but > unwritable arrays can still be changed through other copies. This > means that when a numpy array is passed into an interface that > requires immutability for safety reasons, a copy always has to be > made. > > One way around this would be to add a NPY_IMMUTABLE flag signifying > that the contents of the array will never change through other copies. This is not possible to guarantee. With the __array_interface__, I can make a numpy array point at any addressable memory without its knowledge. We can even mutate "immutable" str objects, too. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From irving at naml.us Wed Dec 17 17:51:52 2008 From: irving at naml.us (Geoffrey Irving) Date: Wed, 17 Dec 2008 14:51:52 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> Message-ID: <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: > On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >> Currently numpy arrays are either writable or unwritable, but >> unwritable arrays can still be changed through other copies. This >> means that when a numpy array is passed into an interface that >> requires immutability for safety reasons, a copy always has to be >> made. >> >> One way around this would be to add a NPY_IMMUTABLE flag signifying >> that the contents of the array will never change through other copies. > > This is not possible to guarantee. With the __array_interface__, I can > make a numpy array point at any addressable memory without its > knowledge. We can even mutate "immutable" str objects, too. In python __array_interface__ just returns a big integer representing a pointer which can't be used for anything. Well-behaved C code has to be trusted to care when __array_interface__ marks its data as unwriteable, so that shouldn't be a problem either. Is there some other way that arbitrary python code could bypass the NPY_WRITEABLE flag? 
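For concreteness, here is a minimal sketch of the existing behaviour that motivates this -- plain numpy, no C involved, and the variable names are only for illustration. An array can be flagged unwriteable, but if a writeable view of the same buffer exists elsewhere, the data underneath it can still change:

import numpy as np

a = np.arange(5)            # writeable array owning the buffer
b = a[:]                    # a view sharing the same memory
b.flags.writeable = False   # b is now "unwriteable"

try:
    b[0] = 99               # refused: assignment destination is read-only
except (ValueError, RuntimeError):
    pass                    # the exact exception type differs between versions

a[0] = 99                   # ...but writing through the base still works,
assert b[0] == 99           # and b sees the change

A separate immutable flag would be a promise that the second half can never happen through any other numpy-level reference.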
Geoffrey From shao at msg.ucsf.edu Wed Dec 17 18:26:44 2008 From: shao at msg.ucsf.edu (Lin Shao) Date: Wed, 17 Dec 2008 15:26:44 -0800 Subject: [Numpy-discussion] error compiling umathmodule.c (numpy 1.3) on 64-bit windows xp In-Reply-To: <5b8d13220812161817j2aae55cfleff57bd874589b95@mail.gmail.com> References: <5b8d13220812161817j2aae55cfleff57bd874589b95@mail.gmail.com> Message-ID: > It is a bug in VS, but the problem is caused by buggy code in numpy, > so this can be avoided. Incidentally, I was working on it yesterday, > but went to bed before having fixed everything :) > That's good to know. Thank you for fixing it and let us know when it's ready for test. -lin From robert.kern at gmail.com Wed Dec 17 18:34:59 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Dec 2008 17:34:59 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> Message-ID: <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: > On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>> Currently numpy arrays are either writable or unwritable, but >>> unwritable arrays can still be changed through other copies. This >>> means that when a numpy array is passed into an interface that >>> requires immutability for safety reasons, a copy always has to be >>> made. >>> >>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>> that the contents of the array will never change through other copies. >> >> This is not possible to guarantee. With the __array_interface__, I can >> make a numpy array point at any addressable memory without its >> knowledge. We can even mutate "immutable" str objects, too. > > In python __array_interface__ just returns a big integer representing > a pointer which can't be used for anything. I can (and do) *make* an array from Python given an __array_interface__ with that pointer. See numpy/lib/stride_trick.py in numpy 1.2 for an example. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From irving at naml.us Wed Dec 17 18:45:46 2008 From: irving at naml.us (Geoffrey Irving) Date: Wed, 17 Dec 2008 15:45:46 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> Message-ID: <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> On Wed, Dec 17, 2008 at 3:34 PM, Robert Kern wrote: > On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: >> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >>> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>>> Currently numpy arrays are either writable or unwritable, but >>>> unwritable arrays can still be changed through other copies. 
This >>>> means that when a numpy array is passed into an interface that >>>> requires immutability for safety reasons, a copy always has to be >>>> made. >>>> >>>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>>> that the contents of the array will never change through other copies. >>> >>> This is not possible to guarantee. With the __array_interface__, I can >>> make a numpy array point at any addressable memory without its >>> knowledge. We can even mutate "immutable" str objects, too. >> >> In python __array_interface__ just returns a big integer representing >> a pointer which can't be used for anything. > > I can (and do) *make* an array from Python given an > __array_interface__ with that pointer. See numpy/lib/stride_trick.py > in numpy 1.2 for an example. Ah. Yes, that certainly precludes complete safety. I don't think it precludes the usefulness of an immutable flag though, just like it doesn't preclude the usefulness of the writeable flag. The stride_tricks.py code is already well-behaved: it doesn't turn unwriteable arrays into writeable arrays. It certainly could, but this is analogous to ctypes or untrusted C code. Geoffrey From robert.kern at gmail.com Wed Dec 17 19:28:08 2008 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Dec 2008 18:28:08 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> Message-ID: <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> On Wed, Dec 17, 2008 at 17:45, Geoffrey Irving wrote: > On Wed, Dec 17, 2008 at 3:34 PM, Robert Kern wrote: >> On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: >>> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >>>> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>>>> Currently numpy arrays are either writable or unwritable, but >>>>> unwritable arrays can still be changed through other copies. This >>>>> means that when a numpy array is passed into an interface that >>>>> requires immutability for safety reasons, a copy always has to be >>>>> made. >>>>> >>>>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>>>> that the contents of the array will never change through other copies. >>>> >>>> This is not possible to guarantee. With the __array_interface__, I can >>>> make a numpy array point at any addressable memory without its >>>> knowledge. We can even mutate "immutable" str objects, too. >>> >>> In python __array_interface__ just returns a big integer representing >>> a pointer which can't be used for anything. >> >> I can (and do) *make* an array from Python given an >> __array_interface__ with that pointer. See numpy/lib/stride_trick.py >> in numpy 1.2 for an example. > > Ah. Yes, that certainly precludes complete safety. > > I don't think it precludes the usefulness of an immutable flag though, > just like it doesn't preclude the usefulness of the writeable flag. > The stride_tricks.py code is already well-behaved: it doesn't turn > unwriteable arrays into writeable arrays. It certainly could, but > this is analogous to ctypes or untrusted C code. It just seems to me to be another complication that does not provide any guarantees. 
You say "Currently numpy arrays are either writable or unwritable, but unwritable arrays can still be changed through other copies." Adding an immutable flag would just change that to "Currently numpy arrays are either mutable or immutable, but immutable arrays can still be changed through other copies." Basically, the writable flag is intended to indicate your use case. It can be circumvented, but the same methods of circumvention can be applied to any set of flags. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Wed Dec 17 22:58:07 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 18 Dec 2008 12:58:07 +0900 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> Message-ID: <4949CA4F.10404@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > The declarations were for the SPARC. Originally I had them up in an > ifdef up top, but I got curious what different machines would do. I still don't understand what exact problem they solve. Since the declarations are put when HAVE_FOO is defined, the only problems I can see are problem in the detection code or a platform bug (I seem to remember for SPARC, this was a platform error, right ?). In either case, it should be solved elsewhere (at worst, for platform specific, this should be done within #if PLATFORM/#endif). > They shouldn't cause a problem unless something is pretty strange. They do; the default rule should be not to put any external declaration, because they are heavily toolchain/platform specific. I removed a lot of them from the old code when I refactored this code, and putting them back almost totally alleviate my effort :) To quote python code itself (pyport.h): /************************************************************************** Prototypes that are missing from the standard include files on some systems (and possibly only some versions of such systems.) Please be conservative with adding new ones, document them and enclose them in platform-specific #ifdefs. **************************************************************************/ > The undefs I put where they are for similar reasons, but there was a > strong temptation to move them into the if statement where they used > to be. Could you be more specific ? I want to know the actual error they were solving. > Let's say curiousity got the best of me there. They shouldn't affect > anything but macros and I didn't want the function declarations do be > interpreted as macros. "Shouldn't affect" is not good enough :) The default rule should be to avoid relying at all on those distinctions, and only care when they matter. Doing the other way around does not work, there alway be some strange platform which will break most assumptions, as rationale as they can be. > > Worth knowing ;) It works on the windows buildbot but that is running > python 2.4. Ah, it is 2.4 ! I was wondering the exact combination. It does not work with the platform SDK 6.1 (which includes 64 bits compiler), and this results in a compiler segfault. The problem is particularly pernicious, since the segfaults is not seen directly, but put in a temp file which itself causes problem because two processes try to access it... 
One of the nicest build failure I have ever seen :) > Speaking of which, the BSD buildbot needs nose (I don't know what > happened to it), the windows box is showing the same old permissions > problem, and one of the SPARC buildbots just times out unless you > build during the right time of day. We are just hobbling along at the > moment. Windows problems at least are not specific to the buildbot. > > Sorry for the late reply, the network was down. No problem, David From dpeterson at enthought.com Thu Dec 18 04:14:44 2008 From: dpeterson at enthought.com (Dave Peterson) Date: Thu, 18 Dec 2008 03:14:44 -0600 Subject: [Numpy-discussion] ANNOUNCE: EPD Py25 v4.1.30101_beta2 available for testing Message-ID: <494A1484.5090705@enthought.com> Hello, The Enthought Python Distribution's (EPD) early access program website is now hosting the beta 2 build of the upcoming EPD Py25 v4.1.301 release. We would very much appreciate your assistance in making EPD as stable and reliable as possible! Please join us in our efforts by downloading an installer for Windows, Mac OS X, or RedHat EL versions 3, 4, and 5 from the following website: http://www.enthought.com/products/epdearlyaccess.php The release notes for the beta2 build are available here: https://svn.enthought.com/epd/wiki/Py25/4.1.301/Beta2 Please provide any comments, concerns, or bug reports via the EPD Trac instance at https://svn.enthought.com/epd or via e-mail to epd-support at enthought.com. -- Dave About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python? Programming Language, including over 60 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, database adapters, and a lot of other tools right out of the box. http://www.enthought.com/products/epd.php It is currently available as a single-click installer for Windows XP (x86), Mac OS X (a universal binary for OS X 10.4 and above), and RedHat 3, 4, and 5 (x86 and amd64). EPD is free for academic use. An annual subscription and installation support are available for individual commercial use. Various workgroup, departmental, and enterprise level subscription options with support and training are also available. Contact us for more information! From animator333 at yahoo.com Thu Dec 18 05:22:23 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 18 Dec 2008 15:52:23 +0530 (IST) Subject: [Numpy-discussion] array not appending Message-ID: <390327.28787.qm@web94911.mail.in2.yahoo.com> Hi, This is copied from ipython console. In [42]: import numpy as np In [43]: ST = np.empty([], dtype=np.float32) In [44]: np..append(ST, 10.0) Out[44]: array([ 3.83333603e-38, 1.00000000e+01]) In [45]: np.append(ST, 10.0) Out[45]: array([ 3.83333603e-38, 1.00000000e+01]) In [46]: print ST 3.83333602707e-038 What's wrong here? win XP 32 numpy 1.2.1 python 2.5.2 Prashant Bollywood news, movie reviews, film trailers and more! Go to http://in.movies.yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.varoquaux at normalesup.org Thu Dec 18 05:33:34 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 18 Dec 2008 11:33:34 +0100 Subject: [Numpy-discussion] array not appending In-Reply-To: <390327.28787.qm@web94911.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> Message-ID: <20081218103333.GA28061@phare.normalesup.org> On Thu, Dec 18, 2008 at 03:52:23PM +0530, Prashant Saxena wrote: > In [43]: ST = np.empty([], dtype=np.float32) > In [44]: np.append(ST, 10.0) > Out[44]: array([ 3.83333603e-38, 1.00000000e+01]) > In [45]: np.append(ST, 10.0) > Out[45]: array([ 3.83333603e-38, 1.00000000e+01]) > In [46]: print ST > 3.83333602707e-038 > What's wrong here? Nothing. If you look at the documentation, np.append does not modify in place the array.: ''' Returns ------- out : ndarray A copy of `arr` with `values` appended to `axis`. Note that `append` does not occur in-place: a new array is allocated and filled. ''' Modification in place is not possible with the numpy model of an array. Ga?l From animator333 at yahoo.com Thu Dec 18 05:49:20 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 18 Dec 2008 16:19:20 +0530 (IST) Subject: [Numpy-discussion] array not appending References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> Message-ID: <836267.80553.qm@web94907.mail.in2.yahoo.com> How do I solve this? Thanks Prashant ________________________________ From: Gael Varoquaux To: Discussion of Numerical Python Sent: Thursday, 18 December, 2008 4:03:34 PM Subject: Re: [Numpy-discussion] array not appending On Thu, Dec 18, 2008 at 03:52:23PM +0530, Prashant Saxena wrote: > In [43]: ST = np.empty([], dtype=np.float32) > In [44]: np.append(ST, 10.0) > Out[44]: array([ 3.83333603e-38, 1.00000000e+01]) > In [45]: np.append(ST, 10.0) > Out[45]: array([ 3.83333603e-38, 1.00000000e+01]) > In [46]: print ST > 3.83333602707e-038 > What's wrong here? Nothing. If you look at the documentation, np.append does not modify in place the array.: ''' Returns ------- out : ndarray A copy of `arr` with `values` appended to `axis`. Note that `append` does not occur in-place: a new array is allocated and filled. ''' Modification in place is not possible with the numpy model of an array. Ga?l _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Dec 18 05:50:49 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 18 Dec 2008 11:50:49 +0100 Subject: [Numpy-discussion] array not appending In-Reply-To: <836267.80553.qm@web94907.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> Message-ID: <20081218105049.GB28061@phare.normalesup.org> On Thu, Dec 18, 2008 at 04:19:20PM +0530, Prashant Saxena wrote: > How do I solve this? If you want appending in place you have to use a python list. If you don't need modification in place, np.append returns an array with the appended number. 
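To make the two options concrete, a small sketch (the names are only for illustration):

import numpy as np

# grow a plain Python list and convert once at the end: appends are cheap
values = []
for i in range(5):
    values.append(float(i))
a = np.array(values, dtype=np.float32)

# or keep rebinding the result of np.append: each call allocates a new array
b = np.zeros((0,), dtype=np.float32)   # a genuinely 0-element array, unlike np.empty(())
b = np.append(b, 10.0)
b = np.append(b, 20.0)                 # b is now [ 10.  20.]

The first form is the one to prefer inside a loop, since np.append has to copy the whole array on every call.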
Ga?l From animator333 at yahoo.com Thu Dec 18 05:56:05 2008 From: animator333 at yahoo.com (Prashant Saxena) Date: Thu, 18 Dec 2008 16:26:05 +0530 (IST) Subject: [Numpy-discussion] array not appending References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> Message-ID: <225225.78739.qm@web94916.mail.in2.yahoo.com> ST = np.empty((), dtype=np.float32) ST = np.append(ST, 10.0) This works, is it proper way to do so? One more prob ST.size returns 2. Why? I have added only one element. Prashant ________________________________ From: Gael Varoquaux To: Discussion of Numerical Python Sent: Thursday, 18 December, 2008 4:20:49 PM Subject: Re: [Numpy-discussion] array not appending On Thu, Dec 18, 2008 at 04:19:20PM +0530, Prashant Saxena wrote: > How do I solve this? If you want appending in place you have to use a python list. If you don't need modification in place, np.append returns an array with the appended number. Ga?l _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy..org/mailman/listinfo/numpy-discussion Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Dec 18 05:44:39 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 18 Dec 2008 19:44:39 +0900 Subject: [Numpy-discussion] array not appending In-Reply-To: <225225.78739.qm@web94916.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> Message-ID: <494A2997.1030504@ar.media.kyoto-u.ac.jp> Prashant Saxena wrote: > > ST = np.empty((), dtype=np.float32) > ST = np.append(ST, 10.0) > > This works, is it proper way to do so? > > One more prob > > ST.size returns 2. > > Why? I have added only one element. You added one element to an array which as already one element. Empty does not mean that the array has no items (which is not possible AFAIK), but that the values are 'empty' (more exactly, they are undefined values, since the memory emplacement has not been initialized). David From haase at msg.ucsf.edu Thu Dec 18 08:00:09 2008 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu, 18 Dec 2008 14:00:09 +0100 Subject: [Numpy-discussion] array not appending In-Reply-To: <494A2997.1030504@ar.media.kyoto-u.ac.jp> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> Message-ID: On Thu, Dec 18, 2008 at 11:44 AM, David Cournapeau wrote: > Prashant Saxena wrote: >> >> ST = np.empty((), dtype=np.float32) >> ST = np.append(ST, 10.0) >> >> This works, is it proper way to do so? >> >> One more prob >> >> ST.size returns 2. >> >> Why? I have added only one element. > > You added one element to an array which as already one element. 
Empty > does not mean that the array has no items (which is not possible AFAIK), > but that the values are 'empty' (more exactly, they are undefined > values, since the memory emplacement has not been initialized). > > David So the question remains: how to create an array of "empty" (i.e. 0) size ? I guess, setting the first argument in empty (i.e. shape) to () produces a scalar value - which is probably what the one the OP saw. I get: >>> np.empty(()).shape () >>> len(np.empty(())) Traceback (most recent call last): File "", line 1, in TypeError: len() of unsized object >>> np.empty((0,)).shape (0) >>> len(np.empty((0,))) 0 Seems all correct. Now, however, to which axis is "append" appending ? Especially if the array doesn't have any axis (i.e. shape = () ). I would argue that append should throw an exception here, rather than "implicitely" changing shape to (1,) before appending -- this is most likely not what the user intended, see OP. Cheers, Sebastian Haase From aisaac at american.edu Thu Dec 18 09:05:03 2008 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 18 Dec 2008 09:05:03 -0500 Subject: [Numpy-discussion] array not appending In-Reply-To: <225225.78739.qm@web94916.mail.in2.yahoo.com> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> Message-ID: <494A588F.7080009@american.edu> On 12/18/2008 5:56 AM Prashant Saxena apparently wrote: > ST = np.empty((), dtype=np.float32) > ST = np.append(ST, 10.0) If you really need to append elements, you probably want to use a list and then convert to an array afterwards. But if you know your array size, you can preallocate memory and then fill it. E.g., >>> ST = np.empty((10,),dtype=np.float32) >>> for i in range(10): ST[i]=i hth, Alan Isaac PS Some users suggest it is better practices to use `zeros` rather than `empty`. Note that `zeros` has the same property that surprised you. >>> np.zeros((),dtype=np.float32) array(0.0, dtype=float32) >>> np.append(np.zeros((),dtype=np.float32),99) array([ 0., 99.]) From david at ar.media.kyoto-u.ac.jp Thu Dec 18 08:58:00 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 18 Dec 2008 22:58:00 +0900 Subject: [Numpy-discussion] array not appending In-Reply-To: References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> Message-ID: <494A56E8.5090905@ar.media.kyoto-u.ac.jp> Sebastian Haase wrote: > On Thu, Dec 18, 2008 at 11:44 AM, David Cournapeau > wrote: > >> Prashant Saxena wrote: >> >>> ST = np.empty((), dtype=np.float32) >>> ST = np.append(ST, 10.0) >>> >>> This works, is it proper way to do so? >>> >>> One more prob >>> >>> ST.size returns 2. >>> >>> Why? I have added only one element. >>> >> You added one element to an array which as already one element. Empty >> does not mean that the array has no items (which is not possible AFAIK), >> but that the values are 'empty' (more exactly, they are undefined >> values, since the memory emplacement has not been initialized). >> >> David >> > > So the question remains: how to create an array of "empty" (i.e. 0) size ? 
> The thing is I am not sure it is possible at all - I just wanted to tell the OP that empty does not create an empty array (without any items in it). What would be the need for a 0 item array ? If the point is to append some data without knowing in advance the size, a list is most likely more adapted to the task. An array which cannot be indexed does not sound that useful, but I may just lack some imagination :) cheers, David From lists_ravi at lavabit.com Thu Dec 18 09:46:55 2008 From: lists_ravi at lavabit.com (Ravi) Date: Thu, 18 Dec 2008 09:46:55 -0500 Subject: [Numpy-discussion] array not appending In-Reply-To: References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> Message-ID: <200812180946.56495.lists_ravi@lavabit.com> On Thursday 18 December 2008 08:00:09 Sebastian Haase wrote: > So the question remains: how to create an array of "empty" (i.e. 0) size ? In [1]: from numpy import * In [2]: x = array( [] ) In [3]: x Out[3]: array([], dtype=float64) In [4]: x.size Out[4]: 0 In [5]: x.shape Out[5]: (0,) In [6]: y = zeros( (0,), dtype=int32 ) In [7]: y.shape Out[7]: (0,) Regards, Ravi From irving at naml.us Thu Dec 18 11:01:12 2008 From: irving at naml.us (Geoffrey Irving) Date: Thu, 18 Dec 2008 08:01:12 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> Message-ID: <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> On Wed, Dec 17, 2008 at 4:28 PM, Robert Kern wrote: > On Wed, Dec 17, 2008 at 17:45, Geoffrey Irving wrote: >> On Wed, Dec 17, 2008 at 3:34 PM, Robert Kern wrote: >>> On Wed, Dec 17, 2008 at 16:51, Geoffrey Irving wrote: >>>> On Wed, Dec 17, 2008 at 2:24 PM, Robert Kern wrote: >>>>> On Wed, Dec 17, 2008 at 15:52, Geoffrey Irving wrote: >>>>>> Currently numpy arrays are either writable or unwritable, but >>>>>> unwritable arrays can still be changed through other copies. This >>>>>> means that when a numpy array is passed into an interface that >>>>>> requires immutability for safety reasons, a copy always has to be >>>>>> made. >>>>>> >>>>>> One way around this would be to add a NPY_IMMUTABLE flag signifying >>>>>> that the contents of the array will never change through other copies. >>>>> >>>>> This is not possible to guarantee. With the __array_interface__, I can >>>>> make a numpy array point at any addressable memory without its >>>>> knowledge. We can even mutate "immutable" str objects, too. >>>> >>>> In python __array_interface__ just returns a big integer representing >>>> a pointer which can't be used for anything. >>> >>> I can (and do) *make* an array from Python given an >>> __array_interface__ with that pointer. See numpy/lib/stride_trick.py >>> in numpy 1.2 for an example. >> >> Ah. Yes, that certainly precludes complete safety. >> >> I don't think it precludes the usefulness of an immutable flag though, >> just like it doesn't preclude the usefulness of the writeable flag. >> The stride_tricks.py code is already well-behaved: it doesn't turn >> unwriteable arrays into writeable arrays. 
It certainly could, but >> this is analogous to ctypes or untrusted C code. > > It just seems to me to be another complication that does not provide > any guarantees. You say "Currently numpy arrays are either writable or > unwritable, but unwritable arrays can still be changed through other > copies." Adding an immutable flag would just change that to "Currently > numpy arrays are either mutable or immutable, but immutable arrays can > still be changed through other copies." Basically, the writable flag > is intended to indicate your use case. It can be circumvented, but the > same methods of circumvention can be applied to any set of flags. The point of an immutable array would be that _can't_ be changed through other copies except through broken C code (or the ctypes / __array_interface__ equivalents), so it's not correct to say that it's the same as unwriteable. It's the same distinction as C++ const vs. Java final. Immutability is already a common notion in python, e.g., list vs. tuple and set vs. frozenset, and it's unfortunate that numpy doesn't have an equivalent. However, if you agree that even _with_ the guarantee it's not a useful concept, I'm happy to drop it. As far as the lack of guarantee, here's some python code that modifies 1. Just because I can write it doesn't mean that we should tell people not to trust the values of small integers. :) import sys from numpy import * class Evil: size = int(log(float(sys.maxint+1))/log(2)+1)/8 __array_interface__ = { 'shape' : (1,), 'typestr' : ' References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> Message-ID: Hi David, On Wed, Dec 17, 2008 at 8:58 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > The declarations were for the SPARC. Originally I had them up in an > > ifdef up top, but I got curious what different machines would do. > > I still don't understand what exact problem they solve. Since the > declarations are put when HAVE_FOO is defined, the only problems I can > see are problem in the detection code or a platform bug (I seem to > remember for SPARC, this was a platform error, right ?). In either case, > it should be solved elsewhere (at worst, for platform specific, this > should be done within #if PLATFORM/#endif). > > > They shouldn't cause a problem unless something is pretty strange. > > They do; the default rule should be not to put any external declaration, > because they are heavily toolchain/platform specific. I removed a lot of > them from the old code when I refactored this code, and putting them > back almost totally alleviate my effort :) To quote python code itself > (pyport.h): > > /************************************************************************** > Prototypes that are missing from the standard include files on some systems > (and possibly only some versions of such systems.) > > Please be conservative with adding new ones, document them and enclose them > in platform-specific #ifdefs. > **************************************************************************/ > That's how I did it originally, that's why that section is up top. But I got curious. So that can be fixed. > > > The undefs I put where they are for similar reasons, but there was a > > strong temptation to move them into the if statement where they used > > to be. > > Could you be more specific ? I want to know the actual error they were > solving. 
> The undefs need to be there when the functions are defined by numpy, so they only need to be in the same #if block as those definitions. I moved them out to cover the function declarations also, but if those are put in their own block for SPARC then they aren't needed. > > > Let's say curiousity got the best of me there. They shouldn't affect > > anything but macros and I didn't want the function declarations do be > > interpreted as macros. > > "Shouldn't affect" is not good enough :) The default rule should be to > avoid relying at all on those distinctions, and only care when they > matter. Doing the other way around does not work, there alway be some > strange platform which will break most assumptions, as rationale as they > can be. > > > > > Worth knowing ;) It works on the windows buildbot but that is running > > python 2.4. > > Ah, it is 2.4 ! I was wondering the exact combination. It does not work > with the platform SDK 6.1 (which includes 64 bits compiler), and this > results in a compiler segfault. The problem is particularly pernicious, > since the segfaults is not seen directly, but put in a temp file which > itself causes problem because two processes try to access it... One of > the nicest build failure I have ever seen :) > The window buildbot was working, went off line for a few weeks, and showed failures on return. It is a VMWare version, so maybe something was changed in between. > > > Speaking of which, the BSD buildbot needs nose (I don't know what > > happened to it), the windows box is showing the same old permissions > > problem, and one of the SPARC buildbots just times out unless you > > build during the right time of day. We are just hobbling along at the > > moment. > > Windows problems at least are not specific to the buildbot. > > > > > Sorry for the late reply, the network was down. > > No problem, > And I still have network problems... What will the world do if the networks collapse? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Dec 18 15:50:00 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 18 Dec 2008 14:50:00 -0600 Subject: [Numpy-discussion] array not appending In-Reply-To: <494A56E8.5090905@ar.media.kyoto-u.ac.jp> References: <390327.28787.qm@web94911.mail.in2.yahoo.com> <20081218103333.GA28061@phare.normalesup.org> <836267.80553.qm@web94907.mail.in2.yahoo.com> <20081218105049.GB28061@phare.normalesup.org> <225225.78739.qm@web94916.mail.in2.yahoo.com> <494A2997.1030504@ar.media.kyoto-u.ac.jp> <494A56E8.5090905@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812181250t6a50128bsb27be03b28bc724b@mail.gmail.com> On Thu, Dec 18, 2008 at 07:58, David Cournapeau wrote: > What would be the need for a 0 item array ? If the point is to append > some data without knowing in advance the size, a list is most likely > more adapted to the task. An array which cannot be indexed does not > sound that useful, but I may just lack some imagination :) It's an edge case (literally) that can pop up when you are doing slicing on larger arrays. 0-item arrays do fit in numpy's memory model, so specifically disallowing them would mean gratuitously requiring try: excepts: in code that would otherwise be generic. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From robert.kern at gmail.com Thu Dec 18 16:00:35 2008 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 18 Dec 2008 15:00:35 -0600 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> Message-ID: <3d375d730812181300t3c84c440v4000bdc9bb32b8b4@mail.gmail.com> On Thu, Dec 18, 2008 at 10:01, Geoffrey Irving wrote: > On Wed, Dec 17, 2008 at 4:28 PM, Robert Kern wrote: >> It just seems to me to be another complication that does not provide >> any guarantees. You say "Currently numpy arrays are either writable or >> unwritable, but unwritable arrays can still be changed through other >> copies." Adding an immutable flag would just change that to "Currently >> numpy arrays are either mutable or immutable, but immutable arrays can >> still be changed through other copies." Basically, the writable flag >> is intended to indicate your use case. It can be circumvented, but the >> same methods of circumvention can be applied to any set of flags. > > The point of an immutable array would be that _can't_ be changed > through other copies except through broken C code (or the ctypes / > __array_interface__ equivalents), so it's not correct to say that it's > the same as unwriteable. It's the same distinction as C++ const vs. > Java final. Immutability is already a common notion in python, e.g., > list vs. tuple and set vs. frozenset, and it's unfortunate that numpy > doesn't have an equivalent. > > However, if you agree that even _with_ the guarantee it's not a useful > concept, I'm happy to drop it. What I'm trying to suggest is that most code already treats the writeable flag like I think you want the immutable flag to be treated. I'm not sure what you think is missing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From bradford.n.cross at gmail.com Thu Dec 18 21:27:12 2008 From: bradford.n.cross at gmail.com (Bradford Cross) Date: Thu, 18 Dec 2008 18:27:12 -0800 Subject: [Numpy-discussion] new incremental statistics project Message-ID: This is a new project I just released. I know it is C#, but some of the design and idioms would be nice in numpy/scipy for working with discrete event simulators, time series, and event stream processing. http://code.google.com/p/incremental-statistics/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Thu Dec 18 23:12:38 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 19 Dec 2008 13:12:38 +0900 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> Hi Chuck, On Fri, Dec 19, 2008 at 2:15 AM, Charles R Harris wrote: > The undefs need to be there when the functions are defined by numpy, so they > only need to be in the same #if block as those definitions. I moved them out > to cover the function declarations also, but if those are put in their own > block for SPARC then they aren't needed. But then it just hides the problem instead of solving it. If we are in the #if bloc and the symbol is defined, it is a bug in the configuration stage, it should be dealt there - if it is a bug in the toolchain (say the symbol is in the library, but not declared in the header), then it should be dealt with for that exact platform only. It is not nit-picking, because the later way means it won't break any other platform :) It still should be used sparingly, though (the SPARC problem is a good example where it should be used). cheers, David From stefan at sun.ac.za Fri Dec 19 08:37:03 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 19 Dec 2008 15:37:03 +0200 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: References: Message-ID: <9457e7c80812190537p9566018h21be08c58e959fa@mail.gmail.com> Hi Bradford 2008/12/19 Bradford Cross : > This is a new project I just released. > > I know it is C#, but some of the design and idioms would be nice in > numpy/scipy for working with discrete event simulators, time series, and > event stream processing. Could you please send a slightly longer description of the idioms you describe, and how they would fit into scipy.stats, scikits.timeseries, etc.? Would you be interested in working on these enhancements? Thanks St?fan From jdh2358 at gmail.com Fri Dec 19 08:53:31 2008 From: jdh2358 at gmail.com (John Hunter) Date: Fri, 19 Dec 2008 07:53:31 -0600 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: References: Message-ID: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross wrote: > This is a new project I just released. > > I know it is C#, but some of the design and idioms would be nice in > numpy/scipy for working with discrete event simulators, time series, and > event stream processing. > > http://code.google.com/p/incremental-statistics/ I think an incremental stats module would be a boon to numpy or scipy. Eric Firing has a nice module wrtten in C with a pyrex wrapper (ringbuf) that does trailing incremental mean, median, std, min, max, and percentile. It maintains a sorted queue to do the last three efficiently, and handles NaN inputs. I would like to see this extended to include exponential or other weightings to do things like incremental trailing exponential moving averages and variances. I don't know what the licensing terms are of this module, but it might be a good starting point for an incremental numpy stats module, at least if you were thinking about supporting a finite lookback window. 
We have a copy of this in the py4science examples dir if you want to take a look: svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats cd trailstats/ make python movavg_ringbuf.py Other things that would be very useful are incremental covariance and regression. JDH From ndbecker2 at gmail.com Fri Dec 19 09:19:35 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 19 Dec 2008 09:19:35 -0500 Subject: [Numpy-discussion] new incremental statistics project References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: On a somewhat related note, I am looking for recursive calculation of variance for complex. For complex I want var as defined by E[|x^2|]. Is there an incremental (recursive) implementation in the complex case? From faltet at pytables.org Fri Dec 19 09:31:36 2008 From: faltet at pytables.org (Francesc Alted) Date: Fri, 19 Dec 2008 15:31:36 +0100 Subject: [Numpy-discussion] ANN: PyTables 2.1 (final) released Message-ID: <200812191531.36850.faltet@pytables.org> =========================== Announcing PyTables 2.1 =========================== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. PyTables 2.1 introduces important improvements, like much faster node opening, creation or navigation, a file-based way to fine-tune the different PyTables parameters (fully documented now in a new appendix of the manual) and support for multidimensional atoms in EArray/CArray objects. Regarding the Pro edition, four different kinds of indexes are supported so that the user can choose the best for her needs. Also, and due to the introduction of the concept of chunkmaps in OPSI, the responsiveness of complex queries with low selectivity has improved quite a lot. And last but not least, it is possible now to sort tables by a specific field with no practical limit in size (tables up to 2**48 rows). Also, a lot of work has gone in the reworking of the "Optimization tips" chapter of the manual where many benchmarks have been redone using newer software and machines and a few new sections have been added. In particular, see the new "Fine-tuning the chunksize" section where you will find an in-deep introduction to the subject of chunking and the "Indexing and Solid State Disks (SSD)" where the advantages of using low-latency SSD disks have been analysed in the context of indexation. In case you want to know more in detail what has changed in this version, have a look at ``RELEASE_NOTES.txt`` in the tarball. Find the HTML version for this document at: http://www.pytables.org/moin/ReleaseNotes/Release_2.1 You can download a source package of the version 2.1 with generated PDF and HTML docs and binaries for Windows from http://www.pytables.org/download/stable For an on-line version of the manual, visit: http://www.pytables.org/docs/manual-2.1 Finally, you can get an evaluation version for PyTables Pro in: http://www.pytables.org/download/evaluation Resources ========= Go to the PyTables web site for more details: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ Acknowledgments =============== Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. 
See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Many thanks also to SourceForge who have helped to make and distribute this package! And last, but not least thanks a lot to the HDF5 and NumPy (and numarray!) makers. Without them PyTables simply would not exist. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- **Enjoy data!** -- The PyTables Team From sturla at molden.no Fri Dec 19 11:10:51 2008 From: sturla at molden.no (Sturla Molden) Date: Fri, 19 Dec 2008 17:10:51 +0100 Subject: [Numpy-discussion] lfilter Message-ID: <494BC78B.5010405@molden.no> I am wondering if not scipy.signal.lfilter ought to be a part of the core NumPy. Note that it is similar to the filter function found in Matlab, and it makes a complement to numpy.convolve. May I suggest that it is renamed or aliased to numpy.filter? Sturla Molden From charlesr.harris at gmail.com Fri Dec 19 12:10:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Dec 2008 10:10:37 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> Message-ID: On Thu, Dec 18, 2008 at 9:12 PM, David Cournapeau wrote: > Hi Chuck, > > On Fri, Dec 19, 2008 at 2:15 AM, Charles R Harris > wrote: > > > The undefs need to be there when the functions are defined by numpy, so > they > > only need to be in the same #if block as those definitions. I moved them > out > > to cover the function declarations also, but if those are put in their > own > > block for SPARC then they aren't needed. > > But then it just hides the problem instead of solving it. If we are in > the #if bloc and the symbol is defined, it is a bug in the > configuration stage, it should be dealt there - if it is a bug in the > toolchain (say the symbol is in the library, but not declared in the > header), then it should be dealt with for that exact platform only. > > It is not nit-picking, because the later way means it won't break any > other platform :) It still should be used sparingly, though (the SPARC > problem is a good example where it should be used). > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Dec 19 12:15:49 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Dec 2008 10:15:49 -0700 Subject: [Numpy-discussion] Recent umath changes In-Reply-To: <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> References: <49487916.3010804@ar.media.kyoto-u.ac.jp> <4948949B.9000504@ar.media.kyoto-u.ac.jp> <4949CA4F.10404@ar.media.kyoto-u.ac.jp> <5b8d13220812182012r7c86a188r120bdfd5ae15d447@mail.gmail.com> Message-ID: On Thu, Dec 18, 2008 at 9:12 PM, David Cournapeau wrote: > Hi Chuck, > > On Fri, Dec 19, 2008 at 2:15 AM, Charles R Harris > wrote: > > > The undefs need to be there when the functions are defined by numpy, so > they > > only need to be in the same #if block as those definitions. 
I moved them > out > > to cover the function declarations also, but if those are put in their > own > > block for SPARC then they aren't needed. > > But then it just hides the problem instead of solving it. If we are in > the #if bloc and the symbol is defined, it is a bug in the > configuration stage, it should be dealt there - if it is a bug in the > toolchain (say the symbol is in the library, but not declared in the > header), then it should be dealt with for that exact platform only. > > It is not nit-picking, because the later way means it won't break any > other platform :) It still should be used sparingly, though (the SPARC > problem is a good example where it should be used). > True, it should be solved in the configuration stage, but what if it isn't? I suppose an error message might be the desired result. If you want to remove the undefs to see what happens, that's fine with me. They were inherited from the old code in any case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Fri Dec 19 13:59:55 2008 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 19 Dec 2008 08:59:55 -1000 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: <494BEF2B.6040801@hawaii.edu> John Hunter wrote: > On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross > wrote: >> This is a new project I just released. >> >> I know it is C#, but some of the design and idioms would be nice in >> numpy/scipy for working with discrete event simulators, time series, and >> event stream processing. >> >> http://code.google.com/p/incremental-statistics/ > > I think an incremental stats module would be a boon to numpy or scipy. > Eric Firing has a nice module wrtten in C with a pyrex wrapper > (ringbuf) that does trailing incremental mean, median, std, min, max, > and percentile. It maintains a sorted queue to do the last three > efficiently, and handles NaN inputs. I would like to see this > extended to include exponential or other weightings to do things like > incremental trailing exponential moving averages and variances. I > don't know what the licensing terms are of this module, but it might Licensing is no problem; I have never bothered with it, but I can tack on a BSD-type license if that would help. Eric > be a good starting point for an incremental numpy stats module, at > least if you were thinking about supporting a finite lookback window. > We have a copy of this in the py4science examples dir if you want to > take a look: > > svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats > cd trailstats/ > make > python movavg_ringbuf.py > > Other things that would be very useful are incremental covariance and > regression. > > JDH From jdh2358 at gmail.com Fri Dec 19 14:32:44 2008 From: jdh2358 at gmail.com (John Hunter) Date: Fri, 19 Dec 2008 13:32:44 -0600 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <494BEF2B.6040801@hawaii.edu> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> <494BEF2B.6040801@hawaii.edu> Message-ID: <88e473830812191132y1f337136yc764ef7f53300c5c@mail.gmail.com> On Fri, Dec 19, 2008 at 12:59 PM, Eric Firing wrote: > Licensing is no problem; I have never bothered with it, but I can tack on a > BSD-type license if that would help. 
Great -- if you are the copyright holder, would you commit a BSD license file to the py4science trailstats dir? I just committed the small bug fix we discussed yesterday there. Thanks! JDH From irving at naml.us Fri Dec 19 14:50:15 2008 From: irving at naml.us (Geoffrey Irving) Date: Fri, 19 Dec 2008 11:50:15 -0800 Subject: [Numpy-discussion] immutable numpy arrays In-Reply-To: <3d375d730812181300t3c84c440v4000bdc9bb32b8b4@mail.gmail.com> References: <7f9d599f0812171352ocecf8bcof4e1414d7a9f152f@mail.gmail.com> <3d375d730812171424k685b5575icd4b18dafc77c8a0@mail.gmail.com> <7f9d599f0812171451g46f34f3i3b1a99ecab38cbc3@mail.gmail.com> <3d375d730812171534y64218eas9638d6b45afde23b@mail.gmail.com> <7f9d599f0812171545j1bab5919x1cf7382786646b02@mail.gmail.com> <3d375d730812171628o3ad0f711p1a21eed98bfed965@mail.gmail.com> <7f9d599f0812180801m5b59e28bvd8bc12909f6998db@mail.gmail.com> <3d375d730812181300t3c84c440v4000bdc9bb32b8b4@mail.gmail.com> Message-ID: <7f9d599f0812191150g28fa4059i883ec61109eb5284@mail.gmail.com> On Thu, Dec 18, 2008 at 1:00 PM, Robert Kern wrote: > On Thu, Dec 18, 2008 at 10:01, Geoffrey Irving wrote: >> On Wed, Dec 17, 2008 at 4:28 PM, Robert Kern wrote: > >>> It just seems to me to be another complication that does not provide >>> any guarantees. You say "Currently numpy arrays are either writable or >>> unwritable, but unwritable arrays can still be changed through other >>> copies." Adding an immutable flag would just change that to "Currently >>> numpy arrays are either mutable or immutable, but immutable arrays can >>> still be changed through other copies." Basically, the writable flag >>> is intended to indicate your use case. It can be circumvented, but the >>> same methods of circumvention can be applied to any set of flags. >> >> The point of an immutable array would be that _can't_ be changed >> through other copies except through broken C code (or the ctypes / >> __array_interface__ equivalents), so it's not correct to say that it's >> the same as unwriteable. It's the same distinction as C++ const vs. >> Java final. Immutability is already a common notion in python, e.g., >> list vs. tuple and set vs. frozenset, and it's unfortunate that numpy >> doesn't have an equivalent. >> >> However, if you agree that even _with_ the guarantee it's not a useful >> concept, I'm happy to drop it. > > What I'm trying to suggest is that most code already treats the > writeable flag like I think you want the immutable flag to be treated. > I'm not sure what you think is missing. After further consideration, I'll withdraw the immutability flag request. I think most of what looking for can be implemented with inheritance, though not in a completely satisfactory manner. Here are details: My main use case is interacting with a system that deals with immutable arrays without having to introduce unnecessary copying. The system makes heavy use of dependency analysis internally to cache/save computation, and may segfault if an array it thinks is immutable changes (e.g. if the array describes the topology of a mesh). It should be impossible for normal python scripting to cause such a segfault. Say I have a function "get_array" which returns an array from this system which is guaranteed immutable, a function "set_array" which stores an array. 
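(To fix ideas, a throwaway sketch of such a pair -- the module-level storage and the exact signatures here are invented for the example and are not the real system:)

import numpy as np

# toy stand-in for the external system described above
_stored = np.arange(10)

def get_array():
    # hand out read-only views so normal scripting cannot mutate the store
    view = _stored.view()
    view.flags.writeable = False
    return view

def set_array(a):
    # naive copy-avoidance: trust arrays already marked non-writeable;
    # the examples below show why this check alone is not enough
    global _stored
    _stored = a if not a.flags.writeable else a.copy()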
It is safe to skip the copy if I do something like set_array(get_array()) However, set_array can't distinguish this from a = get_array().copy() b = a[:] a.flags.writeable = 0 set_array(a) b[0] = 3 The difference between writable and immutable is that it would be invalid to set the writable flag to False after creation, since the array may have already leaked. However, this is rather convoluted code, but it's the only example I can come up with that would be fixed with just an immutability flag. Therefore, the immutability flag is a bad idea. A more interesting and likely example is set_array(2 * get_array()) In this case, set_array() will receive an unwriteable array with reference count 1 (it owns the only reference). However, that is indistinguishable from a = 2 * get_array() set_array(a[:]) a[0] = 3 One way to solve this is to make a derived array class which is always immutable and propagates immutability and unwritability during arithmetic. This would safely avoid the overhead in all examples above, and is straightforward to implement. Unfortunately, it adds unnecessary copying in legitimate code that wants to modify results: a = 2 * get_array() a[0] = 2 # exception! set_array(a) Get rid of all unnecessary copies in that code would require tracking leaks and allowing set_array to either freeze "a" or change it to copy-on-write. That might end up too complicated or magical to be practical, though. In particular, it couldn't be implemented in a completely safe manner using inheritance. In any case, I think the benefit would be tiny enough that I should drop it and stick to copies unless someone else expresses interest. Thanks, Geoffrey From ondrej at certik.cz Fri Dec 19 17:30:05 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Fri, 19 Dec 2008 23:30:05 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball Message-ID: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> Hi, while packaging the new version of numpy, I realized that it is missing a documentation. I just checked with Stefan on Jabber and he thinks it should be rather a trivial fix. Do you Jarrod think you could please release a new tarball with the doc directory? The problem is that debian (and I guess other distros as well) has one source package (e.g. numpy tarball + debian files) and it creates python-numpy, python-numpy-dbg and python-numpy-doc binary packages from it. There should definitely be a doc package. So if the tarball is missing documentation, we need to repackage it. Since the doc is only in svn (right?), we would have to write some scripts to first svn checkout the doc, unpack the official tarball, include the doc, pack it and that would be our tarball. So we thought with Stefan that maybe a simpler solution is just to fix the ./setup sdist (or how you create the tarball in numpy) to include documentation and be done with it. What do you think? If you are busy, I can look at it how to fix the numpy tarball creation. 
Thanks, Ondrej From ondrej at certik.cz Fri Dec 19 17:35:07 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Fri, 19 Dec 2008 23:35:07 +0100 Subject: [Numpy-discussion] Missing numpy.i In-Reply-To: <001901c9496a$58b83cc0$e7ad810a@gnb.st.com> References: <9457e7c80810160620i2aeec4e3o4df1ae82906a1490@mail.gmail.com> <001901c9496a$58b83cc0$e7ad810a@gnb.st.com> Message-ID: <85b5c3130812191435h2be092d7of39f02cf9c0ed30b@mail.gmail.com> On Tue, Nov 18, 2008 at 11:42 AM, Nicolas ROUX wrote: > Hi, > > About the missing doc directory in the windows install in latest numpy > release, could you please add it ? > (please see below the previous thread) Well, this is a serious problem, so it should definitely be fixed, see here: http://projects.scipy.org/pipermail/numpy-discussion/2008-December/039309.html Ondrej From stefan at sun.ac.za Fri Dec 19 17:43:37 2008 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 20 Dec 2008 00:43:37 +0200 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> Message-ID: <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> 2008/12/20 Ondrej Certik : > So we thought with Stefan that maybe a simpler solution is just to fix > the ./setup sdist (or how you create the tarball in numpy) to include > documentation and be done with it. I think releases should either include the Sphinx documentation or, alternatively, we should provide a separate tar-ball for the docs along with every release. St?fan From cournape at gmail.com Fri Dec 19 21:35:30 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 20 Dec 2008 11:35:30 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> Message-ID: <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> On Sat, Dec 20, 2008 at 7:43 AM, St?fan van der Walt wrote: > 2008/12/20 Ondrej Certik : >> So we thought with Stefan that maybe a simpler solution is just to fix >> the ./setup sdist (or how you create the tarball in numpy) to include >> documentation and be done with it. > > I think releases should either include the Sphinx documentation or, > alternatively, we should provide a separate tar-ball for the docs > along with every release. How difficult would it be to generate the doc ? I have not followed in detail what happened recently on that front. Is sphinx + sphinx ext the only necessary additional tools ? What I did for audiolab recently was to use paver to generate the source distribution for releases: the release sdist generates the usual sdist (which is kept the same as before - e.g. no need for tools to build the doc), as well as the doc (html + pdf), and put everything together. 
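Schematically it boils down to something like this -- a rough sketch only, assuming sphinx-build is on the path and the doc sources live under doc/source; it is not the actual audiolab pavement file:

import subprocess
import shutil

def release_sdist():
    # the plain source tarball, exactly as "python setup.py sdist" would make it
    subprocess.check_call(['python', 'setup.py', 'sdist'])
    # build the html doc separately (needs sphinx); a pdf step would be similar
    subprocess.check_call(['sphinx-build', '-b', 'html', 'doc/source', 'build/doc/html'])
    # put the built doc next to the tarball so both ship together
    shutil.copytree('build/doc/html', 'dist/html')

if __name__ == '__main__':
    release_sdist()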
Paver is a mere convenience, and this could be done with simple scripts, of course, David From ondrej at certik.cz Sat Dec 20 05:43:19 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Sat, 20 Dec 2008 11:43:19 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> Message-ID: <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> On Sat, Dec 20, 2008 at 3:35 AM, David Cournapeau wrote: > On Sat, Dec 20, 2008 at 7:43 AM, St?fan van der Walt wrote: >> 2008/12/20 Ondrej Certik : >>> So we thought with Stefan that maybe a simpler solution is just to fix >>> the ./setup sdist (or how you create the tarball in numpy) to include >>> documentation and be done with it. >> >> I think releases should either include the Sphinx documentation or, >> alternatively, we should provide a separate tar-ball for the docs >> along with every release. > > How difficult would it be to generate the doc ? I have not followed in > detail what happened recently on that front. Is sphinx + sphinx ext > the only necessary additional tools ? Just to make it clear -- I think the docs should not be generated in the tarball -- only the sources should be there. The html (and/or pdf) docs will be generated at the package build. Ondrej From cournape at gmail.com Sat Dec 20 06:15:43 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 20 Dec 2008 20:15:43 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> Message-ID: <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik wrote: > > Just to make it clear -- I think the docs should not be generated in > the tarball -- only the sources should be there. I agree this makes more sense for you, as a packager, but I am not sure it makes much sense to put the doc sources in the tarball for users (Building numpy should only require python + a C compiler; building the doc is more difficult -you need at least sphinx and all its dependencies). For audiolab, I put the generated doc, thinking if people want to mess with the doc, they are knowledgeable enough to deal with svn - but I did not think about the packagers :) I am not sure what's the best solution: maybe put both in the (released) source tarball ? 
David From gael.varoquaux at normalesup.org Sat Dec 20 06:26:18 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 20 Dec 2008 12:26:18 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> Message-ID: <20081220112618.GA23638@phare.normalesup.org> On Sat, Dec 20, 2008 at 08:15:43PM +0900, David Cournapeau wrote: > For audiolab, I put the generated doc, thinking if people want to mess > with the doc, they are knowledgeable enough to deal with svn - but I > did not think about the packagers :) I am not sure what's the best > solution: maybe put both in the (released) source tarball ? For Mayavi/ETS we put both. Docs are very important, and we feared people having difficulties building them, as the doc build tools and build chain isn't as mature as the rest of the build chain. Of course, for debian packaging the problem is different, but that's only a fraction of our users. Ga?l From david at ar.media.kyoto-u.ac.jp Sat Dec 20 07:01:25 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 20 Dec 2008 21:01:25 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <20081220112618.GA23638@phare.normalesup.org> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <20081220112618.GA23638@phare.normalesup.org> Message-ID: <494CDE95.40108@ar.media.kyoto-u.ac.jp> Gael Varoquaux wrote: > > For Mayavi/ETS we put both. Docs are very important, and we feared people > having difficulties building them, as the doc build tools and build chain > isn't as mature as the rest of the build chain. > Yes, I don't think anyone is arguing for users to build the doc. When distributing binaries, we should ideally put the built docs along with the installer itself (it is easy on mac os x with .dmg; we could for example simply put the installer itself together with the doc in a zip file for windows - anyone with XP and above can read zip out of the box). > Of course, for debian packaging the problem is different, but that's only > a fraction of our users. > I built the doc, and it looks like putting the sources of the doc + html + pdf will more or less multiply the size of the tarball by 4 (from ~ 1.5 M to 6 M). Maybe once we have a system like python.org such as for each release we have the doc at the corresponding version, we could just skip shipping the built doc in the source tarball, and only provide it with the binaries. I think it is important to have the built docs at least for the binaries installers - but it does not seem as important for the sources, assuming the user can find it quickly on the website of course. 
cheers, David From pav at iki.fi Sat Dec 20 09:02:21 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 20 Dec 2008 14:02:21 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> Message-ID: Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: > On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik wrote: >> Just to make it clear -- I think the docs should not be generated in >> the tarball -- only the sources should be there. > > I agree this makes more sense for you, as a packager, but I am not sure > it makes much sense to put the doc sources in the tarball for users > (Building numpy should only require python + a C compiler; building the > doc is more difficult -you need at least sphinx and all its > dependencies). > > For audiolab, I put the generated doc, thinking if people want to mess > with the doc, they are knowledgeable enough to deal with svn - but I did > not think about the packagers :) I am not sure what's the best solution: > maybe put both in the (released) source tarball ? I'd say that we put the source for the documentation to the documentation tarball, and distribute the built HTML+whatever documentation in a separate package. -- Pauli Virtanen From charlesr.harris at gmail.com Sat Dec 20 14:05:51 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 20 Dec 2008 12:05:51 -0700 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: On Fri, Dec 19, 2008 at 6:53 AM, John Hunter wrote: > On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross > wrote: > > This is a new project I just released. > > > > I know it is C#, but some of the design and idioms would be nice in > > numpy/scipy for working with discrete event simulators, time series, and > > event stream processing. > > > > http://code.google.com/p/incremental-statistics/ > > I think an incremental stats module would be a boon to numpy or scipy. > Eric Firing has a nice module wrtten in C with a pyrex wrapper > (ringbuf) that does trailing incremental mean, median, std, min, max, > and percentile. It maintains a sorted queue to do the last three > efficiently, and handles NaN inputs. I would like to see this > extended to include exponential or other weightings to do things like > incremental trailing exponential moving averages and variances. I > don't know what the licensing terms are of this module, but it might > be a good starting point for an incremental numpy stats module, at > least if you were thinking about supporting a finite lookback window. > We have a copy of this in the py4science examples dir if you want to > take a look: > > svn co > https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats > cd trailstats/ > make > python movavg_ringbuf.py > > Other things that would be very useful are incremental covariance and > regression. > Some sort of Kalman filter? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Joris.DeRidder at ster.kuleuven.be Sat Dec 20 14:48:49 2008 From: Joris.DeRidder at ster.kuleuven.be (Joris De Ridder) Date: Sat, 20 Dec 2008 20:48:49 +0100 Subject: [Numpy-discussion] lfilter In-Reply-To: <494BC78B.5010405@molden.no> References: <494BC78B.5010405@molden.no> Message-ID: On 19 Dec 2008, at 17:10 , Sturla Molden wrote: > I am wondering if not scipy.signal.lfilter ought to be a part of the > core NumPy. Note that it is similar to the filter function found in > Matlab, and it makes a complement to numpy.convolve. > > May I suggest that it is renamed or aliased to numpy.filter? NumPy is primarily meant as an N-dimensional array manipulation library, so an IIR/FIR filter doesn't really fit into this. The developers are not aiming at mimicking Matlab, but are offering NumPy/ SciPy as a viable open-source alternative, where SciPy is the package to be used for mathematics, science, and engineering. Cheers, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From david at ar.media.kyoto-u.ac.jp Sun Dec 21 03:13:11 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 21 Dec 2008 17:13:11 +0900 Subject: [Numpy-discussion] numpy on windows x64 with mingw: it (almost) works Message-ID: <494DFA97.90508@ar.media.kyoto-u.ac.jp> Hi, Just a few words to mention that I've finally managed to build numpy with the mingw-w64 project (port of mingw to AMD 64 bits MS OS), and it almost run OK. By almost, I mean that numpy.test() finishes without crash, assuming a few unit tests are skipped (some long double problems). Not all unit tests pass, but almost all of them are easy to fix problems in numpy (except for the long double problem). The drawback is that you can't do that just by using the mingw-w64 binaries, you have to build your own toolchain because of some bugs/missing features in mingw-w64. I've put the gory details there: http://scipy.org/scipy/numpy/wiki/MicrosoftToolchainSupport Hopefully, this should make it easier to add fortran support with gfortran, opening the possibility to have both numpy and scipy buildable on windows x64 with free compilers, David From gael.varoquaux at normalesup.org Sun Dec 21 03:38:39 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 21 Dec 2008 09:38:39 +0100 Subject: [Numpy-discussion] numpy on windows x64 with mingw: it (almost) works In-Reply-To: <494DFA97.90508@ar.media.kyoto-u.ac.jp> References: <494DFA97.90508@ar.media.kyoto-u.ac.jp> Message-ID: <20081221083839.GA11578@phare.normalesup.org> On Sun, Dec 21, 2008 at 05:13:11PM +0900, David Cournapeau wrote: > Just a few words to mention that I've finally managed to build numpy > with the mingw-w64 project I know it was a tough task. Thanks a lot for doing this. Ga?l From millman at berkeley.edu Sun Dec 21 03:56:22 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 21 Dec 2008 00:56:22 -0800 Subject: [Numpy-discussion] numpy on windows x64 with mingw: it (almost) works In-Reply-To: <494DFA97.90508@ar.media.kyoto-u.ac.jp> References: <494DFA97.90508@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, Dec 21, 2008 at 12:13 AM, David Cournapeau wrote: > Just a few words to mention that I've finally managed to build numpy > with the mingw-w64 project (port of mingw to AMD 64 bits MS OS), and it > almost run OK. Thanks for working on this. 
Jarrod From ondrej at certik.cz Sun Dec 21 07:05:57 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Sun, 21 Dec 2008 13:05:57 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> Message-ID: <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: > Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik wrote: >>> Just to make it clear -- I think the docs should not be generated in >>> the tarball -- only the sources should be there. >> >> I agree this makes more sense for you, as a packager, but I am not sure >> it makes much sense to put the doc sources in the tarball for users >> (Building numpy should only require python + a C compiler; building the >> doc is more difficult -you need at least sphinx and all its >> dependencies). >> >> For audiolab, I put the generated doc, thinking if people want to mess >> with the doc, they are knowledgeable enough to deal with svn - but I did >> not think about the packagers :) I am not sure what's the best solution: >> maybe put both in the (released) source tarball ? > > I'd say that we put the source for the documentation to the documentation > tarball, and distribute the built HTML+whatever documentation in a > separate package. Why not to just include the *sources* together with numpy, and possibly include html+whatever in a separate documentation package? That way everybody is happy. Ondrej From david at ar.media.kyoto-u.ac.jp Sun Dec 21 07:14:50 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 21 Dec 2008 21:14:50 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: <494E333A.6090209@ar.media.kyoto-u.ac.jp> Ondrej Certik wrote: > > Why not to just include the *sources* together with numpy, and > possibly include html+whatever in a separate documentation package? > I don't think having separate built doc and built package is a good idea. It is confusing for the user, and I am afraid we won't alway keep everything in sync. We don't need to ship all the doc of course, but at least the pdf (or any other format, the point is to have at least one; it could be different on different platforms for the binaries for example). Specially for new comers, having everything in one place is better IMHO. I agree we should also put the doc sources together with the source tarball: that seems to be the common practice for almost every open source package out there. The only drawback is the tarball size, but since we are still talking about a couple of MB max, I don't think it is very relevant. IOW: - ship the doc sources + one built format with the released source distribution. 
It should not be built with sdist, but with another mean (so that one can easily generate a source distribution). - ship the built doc with every binary installer. David From pav at iki.fi Sun Dec 21 07:49:37 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 21 Dec 2008 12:49:37 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: Sun, 21 Dec 2008 13:05:57 +0100, Ondrej Certik wrote: > On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: >> Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >>> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik >>> wrote: >>>> Just to make it clear -- I think the docs should not be generated in >>>> the tarball -- only the sources should be there. >>> >>> I agree this makes more sense for you, as a packager, but I am not >>> sure it makes much sense to put the doc sources in the tarball for >>> users (Building numpy should only require python + a C compiler; >>> building the doc is more difficult -you need at least sphinx and all >>> its dependencies). >>> >>> For audiolab, I put the generated doc, thinking if people want to mess >>> with the doc, they are knowledgeable enough to deal with svn - but I >>> did not think about the packagers :) I am not sure what's the best >>> solution: maybe put both in the (released) source tarball ? >> >> I'd say that we put the source for the documentation to the >> documentation tarball, and distribute the built HTML+whatever >> documentation in a separate package. > > Why not to just include the *sources* together with numpy, and possibly > include html+whatever in a separate documentation package? That's what I tried to say, but mistyped "source" as "documentation". -- Pauli Virtanen From ondrej at certik.cz Sun Dec 21 08:50:32 2008 From: ondrej at certik.cz (Ondrej Certik) Date: Sun, 21 Dec 2008 14:50:32 +0100 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: <85b5c3130812210550l6693d2f2m2bd682ae13cee693@mail.gmail.com> On Sun, Dec 21, 2008 at 1:49 PM, Pauli Virtanen wrote: > Sun, 21 Dec 2008 13:05:57 +0100, Ondrej Certik wrote: > >> On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: >>> Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >>>> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik >>>> wrote: >>>>> Just to make it clear -- I think the docs should not be generated in >>>>> the tarball -- only the sources should be there. >>>> >>>> I agree this makes more sense for you, as a packager, but I am not >>>> sure it makes much sense to put the doc sources in the tarball for >>>> users (Building numpy should only require python + a C compiler; >>>> building the doc is more difficult -you need at least sphinx and all >>>> its dependencies). 
>>>> >>>> For audiolab, I put the generated doc, thinking if people want to mess >>>> with the doc, they are knowledgeable enough to deal with svn - but I >>>> did not think about the packagers :) I am not sure what's the best >>>> solution: maybe put both in the (released) source tarball ? >>> >>> I'd say that we put the source for the documentation to the >>> documentation tarball, and distribute the built HTML+whatever >>> documentation in a separate package. >> >> Why not to just include the *sources* together with numpy, and possibly >> include html+whatever in a separate documentation package? > > That's what I tried to say, but mistyped "source" as "documentation". Ok, so we all seem to agree that having (at least) the source of docs together with the main numpy tarball is a good thing. I'll try to have a look at this. Ondrej From david at ar.media.kyoto-u.ac.jp Sun Dec 21 08:48:41 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 21 Dec 2008 22:48:41 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> Message-ID: <494E4939.8080901@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > Sun, 21 Dec 2008 13:05:57 +0100, Ondrej Certik wrote: > > >> On Sat, Dec 20, 2008 at 3:02 PM, Pauli Virtanen wrote: >> >>> Sat, 20 Dec 2008 20:15:43 +0900, David Cournapeau wrote: >>> >>>> On Sat, Dec 20, 2008 at 7:43 PM, Ondrej Certik >>>> wrote: >>>> >>>>> Just to make it clear -- I think the docs should not be generated in >>>>> the tarball -- only the sources should be there. >>>>> >>>> I agree this makes more sense for you, as a packager, but I am not >>>> sure it makes much sense to put the doc sources in the tarball for >>>> users (Building numpy should only require python + a C compiler; >>>> building the doc is more difficult -you need at least sphinx and all >>>> its dependencies). >>>> >>>> For audiolab, I put the generated doc, thinking if people want to mess >>>> with the doc, they are knowledgeable enough to deal with svn - but I >>>> did not think about the packagers :) I am not sure what's the best >>>> solution: maybe put both in the (released) source tarball ? >>>> >>> I'd say that we put the source for the documentation to the >>> documentation tarball, and distribute the built HTML+whatever >>> documentation in a separate package. >>> >> Why not to just include the *sources* together with numpy, and possibly >> include html+whatever in a separate documentation package? >> > > That's what I tried to say, but mistyped "source" as "documentation". > Pauli, Is everything under trunk/doc necessary to build the doc ? Or only a subset of it ? 
David From pav at iki.fi Sun Dec 21 09:42:59 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 21 Dec 2008 14:42:59 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> Message-ID: Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: [clip] > Is everything under trunk/doc necessary to build the doc ? Or only a > subset of it ? Only a subset: Makefile, postprocess.py, release/*, source/*, and sphinxext/*. The rest is some older material (eg. numpybook/*), stuff targeted at developers (eg. ufuncs.txt), and miscellaneous stuff for users (eg. cython/, swig/) that should eventually be added as a part of the main documentation. -- Pauli Virtanen From cournape at gmail.com Sun Dec 21 10:30:55 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 22 Dec 2008 00:30:55 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812210730j5bfe2628r145f1398f6628cab@mail.gmail.com> On Sun, Dec 21, 2008 at 11:42 PM, Pauli Virtanen wrote: > Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: > [clip] >> Is everything under trunk/doc necessary to build the doc ? Or only a >> subset of it ? > > Only a subset: Makefile, postprocess.py, release/*, source/*, and > sphinxext/*. Ok, it should now be included in the sdist-generated tarball. I could not generate the doc from it, but I have the same problem when trying directly from the trunk, not sure what the problem is (make html hangs, no cpu consumption). Also, the increase in size is much smaller than I said previously: it looks like most of the stuff which took space in the tarball was the other directories: we go from 1.5 to 1.9 Mb, instead of 3 Mb if we added the whole doc directory. David > > The rest is some older material (eg. numpybook/*), stuff targeted at > developers (eg. ufuncs.txt), and miscellaneous stuff for users (eg. > cython/, swig/) that should eventually be added as a part of the main > documentation. 
> > -- > Pauli Virtanen > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From jdh2358 at gmail.com Sun Dec 21 10:53:29 2008 From: jdh2358 at gmail.com (John Hunter) Date: Sun, 21 Dec 2008 09:53:29 -0600 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> Message-ID: <88e473830812210753p15bb023fo19f5309365ac1498@mail.gmail.com> On Fri, Dec 19, 2008 at 4:30 PM, Ondrej Certik wrote: > while packaging the new version of numpy, I realized that it is > missing a documentation. I just checked with Stefan on Jabber and he > thinks > it should be rather a trivial fix. Do you Jarrod think you could > please release a new tarball with the doc directory? Since you are packaging numpy docs for debian, and I think numpy is using a variant of the mpl sphinxext plot directive to generate some plots, I will give you a head's up on a recent change Michael just made to the plot directive to fix a problem with the very large size the mpl debian doc build. The inclusion of the high res png and pdf in images generated by the plot directive makes the resultant build very large. Obviously we will be generating a lot more plots than numpy, but you may want to consider upgrading to the the plot directive from the 0.98.5.2 mpl release, which has a "smalldocs" options (build the docs with --small) and only the regular resolution PNG will be generated (no PDF, no hires). This allows us to build the full version for the mpl site and debian to build the small version if they want. JDH From pav at iki.fi Sun Dec 21 13:10:07 2008 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 21 Dec 2008 18:10:07 +0000 (UTC) Subject: [Numpy-discussion] missing doc dir in the official tarball References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <9457e7c80812191443q71b24bb6ne845da54bf215166@mail.gmail.com> <5b8d13220812191835o348ac0c8p936e8d559dbc8e6c@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> <5b8d13220812210730j5bfe2628r145f1398f6628cab@mail.gmail.com> Message-ID: Mon, 22 Dec 2008 00:30:55 +0900, David Cournapeau wrote: > On Sun, Dec 21, 2008 at 11:42 PM, Pauli Virtanen wrote: >> Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: [clip] >>> Is everything under trunk/doc necessary to build the doc ? Or only >>> a >>> subset of it ? >> >> Only a subset: Makefile, postprocess.py, release/*, source/*, and >> sphinxext/*. > > Ok, it should now be included in the sdist-generated tarball. I could > not generate the doc from it, but I have the same problem when trying > directly from the trunk, not sure what the problem is (make html hangs, > no cpu consumption). What platform, what does it output? What does 'make -n' say? -- Pauli Virtanen From josef.pktd at gmail.com Sun Dec 21 16:25:55 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Dec 2008 16:25:55 -0500 Subject: [Numpy-discussion] is there a sortrows Message-ID: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> I was looking for a function that sorts a 2-dimensional array by rows. That's what I came up with, is there a more direct way? 
>>> a array([[1, 2], [0, 0], [1, 0], [0, 2], [2, 1], [1, 0], [1, 0], [0, 0], [1, 0], [2, 2]]) >>> a[np.lexsort(np.fliplr(a).T)] array([[0, 0], [0, 0], [0, 2], [1, 0], [1, 0], [1, 0], [1, 0], [1, 2], [2, 1], [2, 2]]) Note: I needed to flip and transpose, using axis didn't work >>> a.shape (10, 2) >>> np.lexsort(a,axis=1) Traceback (most recent call last): File "", line 1, in np.lexsort(a,axis=1) ValueError: axis(=1) out of bounds Specifying individual columns in argument also works, but it's a pain if I don't know how many columns there are: >>> a[np.lexsort((a[:,1],a[:,0]))] array([[0, 0], [0, 0], [0, 2], [1, 0], [1, 0], [1, 0], [1, 0], [1, 2], [2, 1], [2, 2]]) A helper function sortrows would be helpful, I don't know what would be the higher dimensional equivalent. Or did I miss a function that I didn't find in the help file? Thanks, Josef From cournape at gmail.com Sun Dec 21 20:36:15 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 22 Dec 2008 10:36:15 +0900 Subject: [Numpy-discussion] missing doc dir in the official tarball In-Reply-To: References: <85b5c3130812191430q63fa790pa8abeefc65e5261a@mail.gmail.com> <85b5c3130812200243j788e48f4sfe762736fd26117c@mail.gmail.com> <5b8d13220812200315r457f9e6dy518add51524ed2e8@mail.gmail.com> <85b5c3130812210405n436b6d78g925159b9deff4ab2@mail.gmail.com> <494E4939.8080901@ar.media.kyoto-u.ac.jp> <5b8d13220812210730j5bfe2628r145f1398f6628cab@mail.gmail.com> Message-ID: <5b8d13220812211736p5b16b758r630484a647db811a@mail.gmail.com> On Mon, Dec 22, 2008 at 3:10 AM, Pauli Virtanen wrote: > Mon, 22 Dec 2008 00:30:55 +0900, David Cournapeau wrote: > >> On Sun, Dec 21, 2008 at 11:42 PM, Pauli Virtanen wrote: >>> Sun, 21 Dec 2008 22:48:41 +0900, David Cournapeau wrote: [clip] >>>> Is everything under trunk/doc necessary to build the doc ? Or only >>>> a >>>> subset of it ? >>> >>> Only a subset: Makefile, postprocess.py, release/*, source/*, and >>> sphinxext/*. >> >> Ok, it should now be included in the sdist-generated tarball. I could >> not generate the doc from it, but I have the same problem when trying >> directly from the trunk, not sure what the problem is (make html hangs, >> no cpu consumption). > > What platform, what does it output? What does 'make -n' say? I only made this comment to imply that I did not test whether I included everything needed. I don't think my problem is really relevant, it may just be a configuration problem. David From david at ar.media.kyoto-u.ac.jp Sun Dec 21 21:53:18 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 22 Dec 2008 11:53:18 +0900 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> Message-ID: <494F011E.6040105@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: > I was looking for a function that sorts a 2-dimensional array by rows. > That's what I came up with, is there a more direct way? 
> >>>> a > array([[1, 2], > [0, 0], > [1, 0], > [0, 2], > [2, 1], > [1, 0], > [1, 0], > [0, 0], > [1, 0], > [2, 2]]) >>>> a[np.lexsort(np.fliplr(a).T)] > array([[0, 0], > [0, 0], > [0, 2], > [1, 0], > [1, 0], > [1, 0], > [1, 0], > [1, 2], > [2, 1], > [2, 2]]) > > Note: I needed to flip and transpose, using axis didn't work > >>>> a.shape > (10, 2) >>>> np.lexsort(a,axis=1) > Traceback (most recent call last): > File "", line 1, in > np.lexsort(a,axis=1) > ValueError: axis(=1) out of bounds > > > Specifying individual columns in argument also works, but it's a pain > if I don't know how many columns there are: > >>>> a[np.lexsort((a[:,1],a[:,0]))] > array([[0, 0], > [0, 0], > [0, 2], > [1, 0], > [1, 0], > [1, 0], > [1, 0], > [1, 2], > [2, 1], > [2, 2]]) > > A helper function sortrows would be helpful, I don't know what would > be the higher dimensional equivalent. > Or did I miss a function that I didn't find in the help file? I may miss something obvious, but why are you using lexsort at all ? At leat, the first example is easily achieved with sort(x, axis=0) - but maybe you have more complicated examples in mind where you need actual lexical sort: David From robert.kern at gmail.com Sun Dec 21 22:18:48 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 21 Dec 2008 21:18:48 -0600 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <494F011E.6040105@ar.media.kyoto-u.ac.jp> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812211918t483745f1t6abaa3d7711ed5d2@mail.gmail.com> On Sun, Dec 21, 2008 at 20:53, David Cournapeau wrote: > josef.pktd at gmail.com wrote: >> I was looking for a function that sorts a 2-dimensional array by rows. >> That's what I came up with, is there a more direct way? >> >>>>> a >> array([[1, 2], >> [0, 0], >> [1, 0], >> [0, 2], >> [2, 1], >> [1, 0], >> [1, 0], >> [0, 0], >> [1, 0], >> [2, 2]]) >>>>> a[np.lexsort(np.fliplr(a).T)] >> array([[0, 0], >> [0, 0], >> [0, 2], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 2], >> [2, 1], >> [2, 2]]) >> >> Note: I needed to flip and transpose, using axis didn't work >> >>>>> a.shape >> (10, 2) >>>>> np.lexsort(a,axis=1) >> Traceback (most recent call last): >> File "", line 1, in >> np.lexsort(a,axis=1) >> ValueError: axis(=1) out of bounds >> >> >> Specifying individual columns in argument also works, but it's a pain >> if I don't know how many columns there are: >> >>>>> a[np.lexsort((a[:,1],a[:,0]))] >> array([[0, 0], >> [0, 0], >> [0, 2], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 0], >> [1, 2], >> [2, 1], >> [2, 2]]) >> >> A helper function sortrows would be helpful, I don't know what would >> be the higher dimensional equivalent. >> Or did I miss a function that I didn't find in the help file? > > I may miss something obvious, but why are you using lexsort at all ? At > leat, the first example is easily achieved with sort(x, axis=0) No, it isn't. In [4]: sort(a, axis=0) Out[4]: array([[0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [1, 1], [1, 2], [2, 2], [2, 2]]) Compare to his desired result: array([[0, 0], [0, 0], [0, 2], [1, 0], [1, 0], [1, 0], [1, 0], [1, 2], [2, 1], [2, 2]]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From josef.pktd at gmail.com Sun Dec 21 22:19:37 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Dec 2008 22:19:37 -0500 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <494F011E.6040105@ar.media.kyoto-u.ac.jp> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> Message-ID: <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> > > I may miss something obvious, but why are you using lexsort at all ? At > leat, the first example is easily achieved with sort(x, axis=0) - but > maybe you have more complicated examples in mind where you need actual > lexical sort: > > David >From the examples that I tried out np.sort, sorts each column separately (with axis = 0). If the elements of a row is supposed to stay together, then np.sort doesn't work >>> arr array([[ 1, 14], [ 4, 12], [ 3, 11], [ 2, 14]]) >>> np.sort(arr,axis=0) array([[ 1, 11], [ 2, 12], [ 3, 14], [ 4, 14]]) Josef From david at ar.media.kyoto-u.ac.jp Sun Dec 21 22:19:44 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 22 Dec 2008 12:19:44 +0900 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> Message-ID: <494F0750.3090107@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: >> I may miss something obvious, but why are you using lexsort at all ? At >> leat, the first example is easily achieved with sort(x, axis=0) - but >> maybe you have more complicated examples in mind where you need actual >> lexical sort: >> >> David >> > > >From the examples that I tried out np.sort, sorts each column > separately (with axis = 0). If the elements of a row is supposed to > stay together, then np.sort doesn't work. > You're right, as Robert just mentioned, I totally missed the point of your example... David From pgmdevlist at gmail.com Sun Dec 21 23:10:26 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 21 Dec 2008 23:10:26 -0500 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> Message-ID: <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> On Dec 21, 2008, at 10:19 PM, josef.pktd at gmail.com wrote: > >> From the examples that I tried out np.sort, sorts each column > separately (with axis = 0). If the elements of a row is supposed to > stay together, then np.sort doesn't work Well, if the elements are supposed to stay together, why wouldn't you tie them first, sort, and then untie them ? >>> np.sort(a.view([('',int),('',int)]),0).view(int) The first view transforms your 2D array into a 1D array of tuples, the second one retransforms the 1D array to 2D. Not sure it's better than your lexsort, haven't timed it. 
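Either recipe can be wrapped into the small helper the thread started from; a minimal sketch using the lexsort route (sortrows is not an existing numpy function, just the name proposed above, and a 2-D input is assumed):

import numpy as np

def sortrows(a):
    # sort whole rows lexicographically, first column as the primary key;
    # lexsort takes its *last* key as the primary one, hence the reversed columns
    return a[np.lexsort(a.T[::-1])]

a = np.array([[1, 2], [0, 0], [1, 0], [0, 2], [2, 1]])
sortrows(a)
# rows come back ordered as [[0, 0], [0, 2], [1, 0], [1, 2], [2, 1]]

The structured-array view does the same job by treating each row as a single record; which of the two is faster is the open timing question.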
From josef.pktd at gmail.com Sun Dec 21 23:37:20 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Dec 2008 23:37:20 -0500 Subject: [Numpy-discussion] is there a sortrows In-Reply-To: <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> Message-ID: <1cd32cbb0812212037t99fc765gff1e84864c018cdb@mail.gmail.com> On Sun, Dec 21, 2008 at 11:10 PM, Pierre GM wrote: > > On Dec 21, 2008, at 10:19 PM, josef.pktd at gmail.com wrote: >> >>> From the examples that I tried out np.sort, sorts each column >> separately (with axis = 0). If the elements of a row is supposed to >> stay together, then np.sort doesn't work > > Well, if the elements are supposed to stay together, why wouldn't you > tie them first, sort, and then untie them ? > > >>> np.sort(a.view([('',int),('',int)]),0).view(int) > > The first view transforms your 2D array into a 1D array of tuples, the > second one retransforms the 1D array to 2D. > > Not sure it's better than your lexsort, haven't timed it. That's very helpful, not so much about the sort but it's a good example to move back and forth between structured and regular arrays. My help search for this was not successful enough to figure this out by myself. Several functions require structured arrays but I didn't know how to get them without specifying everything by hand. And when I have a structured array, I didn't know how to call var or mean on them. Your suggestion also works with automatic adjustment for number of columns. >>> np.sort(a.view([('',' References: <1cd32cbb0812211325q3f31f10du875f21b623f1c5ed@mail.gmail.com> <494F011E.6040105@ar.media.kyoto-u.ac.jp> <1cd32cbb0812211919j46f275b7j58470a9f62be48dd@mail.gmail.com> <43208CCB-427E-48C7-8939-E9FAE84A8EE3@gmail.com> <1cd32cbb0812212037t99fc765gff1e84864c018cdb@mail.gmail.com> Message-ID: <1cd32cbb0812212052y5f018e6fle1fe336d47d45c69@mail.gmail.com> On Sun, Dec 21, 2008 at 11:37 PM, wrote: > On Sun, Dec 21, 2008 at 11:10 PM, Pierre GM wrote: >> >> On Dec 21, 2008, at 10:19 PM, josef.pktd at gmail.com wrote: >>> >>>> From the examples that I tried out np.sort, sorts each column >>> separately (with axis = 0). If the elements of a row is supposed to >>> stay together, then np.sort doesn't work >> >> Well, if the elements are supposed to stay together, why wouldn't you >> tie them first, sort, and then untie them ? >> >> >>> np.sort(a.view([('',int),('',int)]),0).view(int) >> >> The first view transforms your 2D array into a 1D array of tuples, the >> second one retransforms the 1D array to 2D. >> >> Not sure it's better than your lexsort, haven't timed it. > > That's very helpful, not so much about the sort but it's a good > example to move back and forth between structured and regular arrays. > My help search for this was not successful enough to figure this out > by myself. Several functions require structured arrays but I didn't > know how to get them without specifying everything by hand. And when I > have a structured array, I didn't know how to call var or mean on > them. > > Your suggestion also works with automatic adjustment for number of columns. 
> >>>> np.sort(a.view([('',' > Thanks, > > Josef > Version with fully automatic conversion, I don't even have to know the dtype >>> np.sort(a.view([('',a.dtype)]*a.shape[1]),0).view(a.dtype) (this is for future Google searches) Josef From ccasey at enthought.com Mon Dec 22 14:11:48 2008 From: ccasey at enthought.com (Chris Casey) Date: Mon, 22 Dec 2008 13:11:48 -0600 Subject: [Numpy-discussion] EPD Py2.5 v4.1.30101 Released Message-ID: <1229973108.5867.0.camel@linux-8ej9.site> Greetings, Enthought, Inc. is very pleased to announce the newest release of the Enthought Python Distribution (EPD) Py2.5 v4.1.30101: http://www.enthought.com/epd The size of the installer has be reduced by about half. Also, this is the first release to include a 3.1.0 version of the Enthought Tool Suite (http://code.enthought.com/), featuring Mayavi 3.1.0. This is also the first release to use Enthought's enhanced version of setuptools, Enstaller (http://code.enthought.com/projects/enstaller/). Windows installation enhancements, matplotlib and wx issues, and menu consistency accross platforms are among notable fixes. The full release notes for this release can be found here: https://svn.enthought.com/epd/wiki/Py25/4.1.30101/RelNotes Many thanks to the EPD team for putting this release together, and to the community of folks who have provided all of the valuable tools bundled here. Best Regards, Chris --------- About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python? Programming Language, including over 80 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, database adapters, and a lot of other tools right out of the box. http://www.enthought.com/products/epd.php It is currently available as an easy, single-click installer for Windows XP (x86), Mac OS X (a universal binary for Intel 10.4 and above) and RedHat EL3 (x86 and amd64). EPD is free for 30-day trial use and for use in degree-granting academic institutions. An annual Subscription and installation support are available for commercial use (http://www.enthought.com/products/epddownload.php ) including an Enterprise Subscription with support for particular deployment environments (http://www.enthought.com/products/enterprise.php ). _______________________________________________ Enthought-dev mailing list Enthought-dev at mail.enthought.com https://mail.enthought.com/mailman/listinfo/enthought-dev From gael.varoquaux at normalesup.org Mon Dec 22 19:39:54 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Dec 2008 01:39:54 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <1229973108.5867.0.camel@linux-8ej9.site> References: <1229973108.5867.0.camel@linux-8ej9.site> Message-ID: <20081223003954.GF13171@phare.normalesup.org> Hi, This mailing list is full of people spending their time writing non-trivial numerical code. This is why I would like to share my interrogations on a code smell that I notice a lot in my numerical code that revolves around persisting to disk often, and the mess that results. It is a bit hard to describe and it has been on my mind for a couple of months. I have finally written a blog post in an attempt to share my thoughts: http://gael-varoquaux.info/blog/?p=83 Pointing to a blog post on a mailing list seems to me almost rude, and I hope you'll forgive, but I'd love any feedback. 
It seems to me I am missing a pattern, or simply some insight on a recurrent problem. Cheers, Ga?l From olivier.grisel at ensta.org Mon Dec 22 20:10:50 2008 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 23 Dec 2008 02:10:50 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <20081223003954.GF13171@phare.normalesup.org> References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> Message-ID: Interesting topic indeed. I think I have been hit with similar problems on toy experimental scripts. So far the solution was always adhoc FS caches of numpy arrays with manual filename management. Maybe the first step for designing a generic solution would be to list some representative yet simple enough use cases with real sample python code so as to focus on concrete matters and avoid over engineering a general solution for philosophical problems. -- Olivier On Dec 23, 2008 1:40 AM, "Gael Varoquaux" wrote: Hi, This mailing list is full of people spending their time writing non-trivial numerical code. This is why I would like to share my interrogations on a code smell that I notice a lot in my numerical code that revolves around persisting to disk often, and the mess that results. It is a bit hard to describe and it has been on my mind for a couple of months. I have finally written a blog post in an attempt to share my thoughts: http://gael-varoquaux.info/blog/?p=83 Pointing to a blog post on a mailing list seems to me almost rude, and I hope you'll forgive, but I'd love any feedback. It seems to me I am missing a pattern, or simply some insight on a recurrent problem. Cheers, Ga?l _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon Dec 22 22:35:28 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 12:35:28 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch Message-ID: <49505C80.80709@ar.media.kyoto-u.ac.jp> Hi, I updated a small branch of mine which is meant to fix a problem on Mac OS X with python 2.6 (see http://projects.scipy.org/pipermail/numpy-discussion/2008-November/038816.html for the problem) and would like one core numpy developer to review it before I merge it. The problem can be seen with the following test code: #include int main() { #ifdef WORDS_BIGENDIAN printf("Big endian macro defined\n"); #else printf("No big endian macro defined\n"); #endif return 0; } If I build the above with python 2.5 on mac os X (intel), then I get the message no big endian. But with my version 2.6 (installed from official binary), I get Big endian, which is obviously wrong for my machine. This is a problem in python, but we can fix it in numpy (which depends on this macro). The fix is simple: set our own NPY_BIG_ENDIAN/NPY_LITTLE_ENDIAN instead of relying on the python header one. More precisely: - a header cpuarch.h has been added: it uses toolchain specific macro to set one of the NPY_TARGET_CPU_* macro. X86, AMD64, PPC, SPARC, S390, and PA_RISC are detected. (I obviously did not tested them all). - NPY_LITTLE_ENDIAN is set for little endian, NPY_BIG_ENDIAN is set for big endian, according to the detected CPU (Or directly using endian.h if available). 
- NPY_BYTE_ORDER is set to 4321 for big endian, 1234 for little endian (following glibc endian.h convention) - endianess is set in the numpy headers at the time they are read (whenever you include it) - remove any mention of WORDS_BIGENDIAN in the source code (only _signbit.c used it). I don't like so much depending on CPU detection, but OTOH, the only other solution I can see would be to have numpy headers which do not rely on endianness at all, which does not seem possible without breaking some API (the macro which test for endianness: PyArray_ISNBO and all the other ones which depend on it, including PyArray_ISNOTSWAPPED). cheers, David From charlesr.harris at gmail.com Tue Dec 23 00:15:18 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Dec 2008 22:15:18 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <49505C80.80709@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: On Mon, Dec 22, 2008 at 8:35 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Hi, > > I updated a small branch of mine which is meant to fix a problem on > Mac OS X with python 2.6 (see > > http://projects.scipy.org/pipermail/numpy-discussion/2008-November/038816.html > for the problem) and would like one core numpy developer to review it > before I merge it. > > The problem can be seen with the following test code: > > #include > > int main() > { > #ifdef WORDS_BIGENDIAN > printf("Big endian macro defined\n"); > #else > printf("No big endian macro defined\n"); > #endif > > return 0; > } > > If I build the above with python 2.5 on mac os X (intel), then I get the > message no big endian. But with my version 2.6 (installed from official > binary), I get Big endian, which is obviously wrong for my machine. This > is a problem in python, but we can fix it in numpy (which depends on > this macro). > > The fix is simple: set our own NPY_BIG_ENDIAN/NPY_LITTLE_ENDIAN instead > of relying on the python header one. More precisely: > - a header cpuarch.h has been added: it uses toolchain specific Is there a good reason to use a separate file? I assume this header will just end up being included in one of the others. Maybe you could put it in the same header that sets up all the differently sized types. > > macro to set one of the NPY_TARGET_CPU_* macro. X86, AMD64, PPC, SPARC, > S390, and PA_RISC are detected. (I obviously did not tested them all). > - NPY_LITTLE_ENDIAN is set for little endian, NPY_BIG_ENDIAN is > set for big endian, according to the detected CPU (Or directly using > endian.h if available). > - NPY_BYTE_ORDER is set to 4321 for big endian, 1234 for little > endian (following glibc endian.h convention) > - endianess is set in the numpy headers at the time they are > read (whenever you include it) > - remove any mention of WORDS_BIGENDIAN in the source code (only > _signbit.c used it). > Let's get rid of _signbit.c and move the signbit function into umath_funcs_c99. It can also be simplified using NPY_INT32 for the integer type. I'd go for a pointer cast and dereference myself but the current implementation is pretty common and I don't think it matters much. I think it is OK to set the order by the CPU type. The PPC might be a bit iffy, but I don't know of any products using its bigendian mode -- not that there aren't any. Is there any simple way that someone who needs a special case can override the automatic settings? 
> I don't like so much depending on CPU detection, but OTOH, the only > other solution I can see would be to have numpy headers which do not > rely on endianness at all, which does not seem possible without breaking > some API (the macro which test for endianness: PyArray_ISNBO and all the > other ones which depend on it, including PyArray_ISNOTSWAPPED). > Do what you gotta do. It sounds like the CPU can be determined from a macro set by the compiler, is that so? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 23 00:26:34 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Dec 2008 22:26:34 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: Hi David, On Mon, Dec 22, 2008 at 10:15 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: It's pretty easy to determine byte order at run time. Maybe another configuration test is in order... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Dec 23 00:14:00 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 14:14:00 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: <49507398.7050205@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Mon, Dec 22, 2008 at 8:35 PM, David Cournapeau > > > wrote: > > Hi, > > I updated a small branch of mine which is meant to fix a problem on > Mac OS X with python 2.6 (see > http://projects.scipy.org/pipermail/numpy-discussion/2008-November/038816.html > for the problem) and would like one core numpy developer to review it > before I merge it. > > The problem can be seen with the following test code: > > #include > > int main() > { > #ifdef WORDS_BIGENDIAN > printf("Big endian macro defined\n"); > #else > printf("No big endian macro defined\n"); > #endif > > return 0; > } > > If I build the above with python 2.5 on mac os X (intel), then I > get the > message no big endian. But with my version 2.6 (installed from > official > binary), I get Big endian, which is obviously wrong for my > machine. This > is a problem in python, but we can fix it in numpy (which depends on > this macro). > > The fix is simple: set our own NPY_BIG_ENDIAN/NPY_LITTLE_ENDIAN > instead > of relying on the python header one. More precisely: > - a header cpuarch.h has been added: it uses toolchain specific > > > Is there a good reason to use a separate file? I assume this header > will just end up being included in one of the others. Maybe you could > put it in the same header that sets up all the differently sized types. None other than I don't like big source files or big headers. > > Let's get rid of _signbit.c and move the signbit function into > umath_funcs_c99. It can also be simplified using NPY_INT32 for the > integer type. I'd go for a pointer cast and dereference myself but the > current implementation is pretty common and I don't think it matters much. If _signbit.c can be changed, so be it. But it does not impact the patch as it is. There are only two places which use WORD_BIGENDIAN directly: _signbit.c and mconf.h (in scipy). > > I think it is OK to set the order by the CPU type. The PPC might be a > bit iffy, but I don't know of any products using its bigendian mode I guess you meant little endian. 
Yes, setting endianness by processor is iffy, but there is no way around it: either we detect endianness at runtime, and deal with it correctly, or if we depend on it in the headers as we currently do, we need some way to detect it from some macros set by the system. I prefer the CPU solution to the platform specific one (most systems have some way to detect this kind of things I guess by including some headers), specially since I think detecting the CPU can be useful for other things later (SSE optimization for example). > Is there any simple way that someone who needs a special case can > override the automatic settings? No. We could add a way to override the CPU-based detection to force it. But since distutils is so limited in that regard, I think people will need to go into the sources anyway, so I am not sure it is really worthwhile. What bothers me more is that this kind of misconfiguration (little-endian ppc) goes undetected. We could add a runtime check, at least ? > > > I don't like so much depending on CPU detection, but OTOH, the only > other solution I can see would be to have numpy headers which do not > rely on endianness at all, which does not seem possible without > breaking > some API (the macro which test for endianness: PyArray_ISNBO and > all the > other ones which depend on it, including PyArray_ISNOTSWAPPED). > > > Do what you gotta do. It sounds like the CPU can be determined from a > macro set by the compiler, is that so? Yep. I gather the values from glibc and boost, which should cover quite a number of platforms I think. David From david at ar.media.kyoto-u.ac.jp Tue Dec 23 00:15:37 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 14:15:37 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> Message-ID: <495073F9.3070401@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > Hi David, > > On Mon, Dec 22, 2008 at 10:15 PM, Charles R Harris > > wrote: > > > It's pretty easy to determine byte order at run time. Maybe another > configuration test is in order... Yes, but that's not enough, since some macro defined in the header uses the endianness, and thus are hardcoded. If we could remove those macro, then it is definitely a better solution. cheers, David From robert.kern at gmail.com Tue Dec 23 00:35:58 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 22 Dec 2008 23:35:58 -0600 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <495073F9.3070401@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> On Mon, Dec 22, 2008 at 23:15, David Cournapeau wrote: > Charles R Harris wrote: >> Hi David, >> >> On Mon, Dec 22, 2008 at 10:15 PM, Charles R Harris >> > wrote: >> >> >> It's pretty easy to determine byte order at run time. Maybe another >> configuration test is in order... > > Yes, but that's not enough, since some macro defined in the header uses > the endianness, and thus are hardcoded. If we could remove those macro, > then it is definitely a better solution. I think he meant that it can be discovered at runtime in general, not at numpy-run-time, so we can write a small C program that can be run at numpy-build-time to add another entry to config.h. 
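Something along these lines, say; a sketch only, driven from Python for brevity, with the compiler name and file handling as placeholders (the probe itself is the usual look-at-the-low-byte trick):

import os
import subprocess
import tempfile

PROBE = r"""
int main(void) {
    unsigned int one = 1u;
    return *(unsigned char *)&one ? 0 : 1;  /* exit status 0 on little endian */
}
"""

def probe_byteorder(cc="cc"):
    """Compile and run a tiny C program, and report the byte order it saw."""
    d = tempfile.mkdtemp()
    src = os.path.join(d, "probe.c")
    exe = os.path.join(d, "probe")
    f = open(src, "w")
    f.write(PROBE)
    f.close()
    subprocess.check_call([cc, src, "-o", exe])
    return "little" if subprocess.call([exe]) == 0 else "big"

The build could then write the answer into the generated config header.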
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Tue Dec 23 00:40:49 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 23 Dec 2008 14:40:49 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> Message-ID: <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern wrote: > > I think he meant that it can be discovered at runtime in general, not > at numpy-run-time, so we can write a small C program that can be run > at numpy-build-time to add another entry to config.h. But then we only move the problem: people who want to build universal numpy extensions will have the wrong value, no ? The fundamental point of my patch is that the value is set whenever ndarrayobject.h is included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be defined when the header is included for a file build with gcc -arch i386. David From robert.kern at gmail.com Tue Dec 23 00:47:14 2008 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 22 Dec 2008 23:47:14 -0600 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> Message-ID: <3d375d730812222147w3375bceeqdc385e61eda3b20c@mail.gmail.com> On Mon, Dec 22, 2008 at 23:40, David Cournapeau wrote: > On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern wrote: > >> >> I think he meant that it can be discovered at runtime in general, not >> at numpy-run-time, so we can write a small C program that can be run >> at numpy-build-time to add another entry to config.h. > > But then we only move the problem: people who want to build universal > numpy extensions will have the wrong value, no ? Fair point. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Tue Dec 23 00:55:51 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Dec 2008 22:55:51 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> Message-ID: On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau wrote: > On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > wrote: > > > > > I think he meant that it can be discovered at runtime in general, not > > at numpy-run-time, so we can write a small C program that can be run > > at numpy-build-time to add another entry to config.h. > > But then we only move the problem: people who want to build universal > numpy extensions will have the wrong value, no ? 
The fundamental point > of my patch is that the value is set whenever ndarrayobject.h is > included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be > defined when the header is included for a file build with gcc -arch > i386. > We can probably set things up so the determination is at run time -- but we need to be sure that the ABI isn't affected. I did that once for an old project that needed data portability. In any case, it sounds like a project for a later release. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Dec 23 01:20:16 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 15:20:16 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> Message-ID: <49508320.60804@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > wrote: > > On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > > wrote: > > > > > I think he meant that it can be discovered at runtime in > general, not > > at numpy-run-time, so we can write a small C program that can be run > > at numpy-build-time to add another entry to config.h. > > But then we only move the problem: people who want to build universal > numpy extensions will have the wrong value, no ? The fundamental point > of my patch is that the value is set whenever ndarrayobject.h is > included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be > defined when the header is included for a file build with gcc -arch > i386. > > > We can probably set things up so the determination is at run time -- > but we need to be sure that the ABI isn't affected. I did that once > for an old project that needed data portability. In any case, it > sounds like a project for a later release. It cannot work for numpy without breaking backward compatibility, because of the following lines: #define PyArray_ISNBO(arg) ((arg) != NPY_OPPBYTE) #define PyArray_IsNativeByteOrder PyArray_ISNBO #define PyArray_ISNOTSWAPPED(m) PyArray_ISNBO(PyArray_DESCR(m)->byteorder) #define PyArray_ISBYTESWAPPED(m) (!PyArray_ISNOTSWAPPED(m)) Since any code using the macro will expand it at build time, you can't make it such as it will be correct at runtime. We would have to replace those macro by "non inlinable" functions. I will add a function to detect endianness, though, just to check that the macro value corresponds to the runtime one (it will make problems on say little endian ppc much easier to detect). 
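At the Python level, the same sanity check can be sketched roughly as follows (illustration only; the real check would be C code run from import_array):

import struct
import numpy as np

def runtime_byteorder():
    # Pack the integer 1 in native order and reread it as little endian:
    # if it still reads back as 1, this machine is little endian.
    native = struct.pack('=I', 1)
    return '<' if struct.unpack('<I', native)[0] == 1 else '>'

configured = np.dtype(np.int32).str[0]   # '<' or '>', as numpy was configured
if configured != runtime_byteorder():
    raise RuntimeError("numpy's configured byte order does not match this CPU")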
David From david at ar.media.kyoto-u.ac.jp Tue Dec 23 01:23:37 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 23 Dec 2008 15:23:37 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <49508320.60804@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> Message-ID: <495083E9.5020105@ar.media.kyoto-u.ac.jp> David Cournapeau wrote: > Charles R Harris wrote: > >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > > wrote: >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> > wrote: >> >> > >> > I think he meant that it can be discovered at runtime in >> general, not >> > at numpy-run-time, so we can write a small C program that can be run >> > at numpy-build-time to add another entry to config.h. >> >> But then we only move the problem: people who want to build universal >> numpy extensions will have the wrong value, no ? The fundamental point >> of my patch is that the value is set whenever ndarrayobject.h is >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be >> defined when the header is included for a file build with gcc -arch >> i386. >> >> >> We can probably set things up so the determination is at run time -- >> but we need to be sure that the ABI isn't affected. I did that once >> for an old project that needed data portability. In any case, it >> sounds like a project for a later release. >> > > It cannot work for numpy without breaking backward compatibility, > because of the following lines: > Actually, you could, by making the macro point to actual functions, but that would add function call cost. I don't know if the function call cost is significant or not in the cases where those macro are used, David From charlesr.harris at gmail.com Tue Dec 23 04:07:28 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Dec 2008 02:07:28 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <495083E9.5020105@ar.media.kyoto-u.ac.jp> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> Message-ID: On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > David Cournapeau wrote: > > Charles R Harris wrote: > > > >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> > wrote: > >> > >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > >> > wrote: > >> > >> > > >> > I think he meant that it can be discovered at runtime in > >> general, not > >> > at numpy-run-time, so we can write a small C program that can be > run > >> > at numpy-build-time to add another entry to config.h. > >> > >> But then we only move the problem: people who want to build > universal > >> numpy extensions will have the wrong value, no ? The fundamental > point > >> of my patch is that the value is set whenever ndarrayobject.h is > >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not be > >> defined when the header is included for a file build with gcc -arch > >> i386. 
> >> > >> > >> We can probably set things up so the determination is at run time -- > >> but we need to be sure that the ABI isn't affected. I did that once > >> for an old project that needed data portability. In any case, it > >> sounds like a project for a later release. > >> > > > > It cannot work for numpy without breaking backward compatibility, > > because of the following lines: > > > > Actually, you could, by making the macro point to actual functions, but > that would add function call cost. I don't know if the function call > cost is significant or not in the cases where those macro are used, > Exactly. Function calls are pretty cheap on modern hardware with good compilers, nor would I expect the calls to be the bottleneck in most applications. The functions would need to be visible to third party applications, however... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From reakinator at gmail.com Tue Dec 23 11:52:05 2008 From: reakinator at gmail.com (Rich E) Date: Tue, 23 Dec 2008 17:52:05 +0100 Subject: [Numpy-discussion] help with typemapping a C function to use numpy arrays Message-ID: Hi list, My question has to do with the Numpy/SWIG typemapping system. I recently got the typemaps in numpy.i to work on most of my C functions that are wrapped using SWIG, if they have arguments of the form (int sizeArray, float *pArray). Now I am trying to figure out how to wrap function that aren't of the form, such as the following function: /*! \brief compute magnitude spectrum of a DFT * * \param sizeMag size of output Magnitude (half of input real FFT) * \param pFReal pointer to input FFT real array (real/imag floats) * \param pFMAg pointer to float array of magnitude spectrum */ void sms_spectrumMag( int sizeMag, float *pInRect, float *pOutMag) { int i, it2; float fReal, fImag; for (i=0; i Hello Numpy community, I want to know if?Numpy could deal with symbolic arrays and lists (by symbolic I mean without specifying the concrete contents of list or array) For example I want to solve a system of equations containing lists and arrays like this solve(x+Sum[A[k],k=i..N]==y+Sum[B[k],k=m..N], j-Length[C]==l-Length[D], ?z/(c?^ i)==t/(c?^ h), u+1==2*v-3w, v=f(f(w))) (here A and B are arrays; C?et D are lists; x,y,z,t,j,l,i,h,u,v,w are variables that could be of type integer or real, c is a constant and f is a function): ? Thank you very much. Yours faithfully, Olfa MRAIHI -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Vickroy at noaa.gov Wed Dec 24 07:19:33 2008 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Wed, 24 Dec 2008 05:19:33 -0700 Subject: [Numpy-discussion] Is it possible with numpy? In-Reply-To: <945910.69974.qm@web26108.mail.ukl.yahoo.com> References: <945910.69974.qm@web26108.mail.ukl.yahoo.com> Message-ID: <495228D5.7010605@noaa.gov> olfa mraihi wrote: > Hello Numpy community, > I want to know if Numpy could deal with symbolic arrays and lists (by > symbolic I mean without specifying the concrete contents of list or > array) > For example I want to solve a system of equations containing lists and > arrays like this > solve(x+Sum[A[k],k=i..N]==y+Sum[B[k],k=m..N], > j-Length[C]==l-Length[D], > z/(c ^ i)==t/(c ^ h), > u+1==2*v-3w, > v=f(f(w))) > (here A and B are arrays; C et D are lists; > x,y,z,t,j,l,i,h,u,v,w are variables that could be of type integer or > real, c is a constant and f is a function): > > Thank you very much. 
> Yours faithfully, > Olfa MRAIHI > > If I understand you correctly, I believe the answer is no. Have you considered PyDSTool and SymPy ? > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Dec 24 08:21:16 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 24 Dec 2008 14:21:16 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> Message-ID: <20081224132116.GB24856@phare.normalesup.org> On Tue, Dec 23, 2008 at 02:10:50AM +0100, Olivier Grisel wrote: > Interesting topic indeed. I think I have been hit with similar problems on > toy experimental scripts. So far the solution was always adhoc FS caches > of numpy arrays with manual filename management. Maybe the first step for > designing a generic solution would be to list some representative yet > simple enough use cases with real sample python code so as to focus on > concrete matters and avoid over engineering a general solution for > philosophical problems. Yes, that's clearly a first ste: list the usecases, and the way we would like it solved: think about the API. My internet connection is quite random currently, and I'll probably loose it for a week any time soon. Do you want to start such a page on the wiki. Mark it as a sratch page, and we'll delete it later. I should point out that joblib (on PyPI and launchpad) was a first attempt to solve this problem, so you could have a look at it. I have already identified things that are wrong with joblib (more on the API side than actual bugs), so I know it is not a final solution. Figuring out what was wrong only came from using it heavily in my work. I thing the only way forward it to start something, use it, figure out what's wrong, and start again... Looking forward to your input, Ga?l From nadavh at visionsense.com Wed Dec 24 12:06:56 2008 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 24 Dec 2008 19:06:56 +0200 Subject: [Numpy-discussion] Is it possible with numpy? References: <945910.69974.qm@web26108.mail.ukl.yahoo.com> Message-ID: <710F2847B0018641891D9A216027636029C38D@ex3.envision.co.il> There is a (small) chance that sympy can help. Never the less you can use scipy.optimize to obtain a numerical solution, once you specify the right merit function. Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? olfa mraihi ????: ? 24-?????-08 12:55 ??: numpy-discussion at scipy.org ????: [Numpy-discussion] Is it possible with numpy? Hello Numpy community, I want to know if?Numpy could deal with symbolic arrays and lists (by symbolic I mean without specifying the concrete contents of list or array) For example I want to solve a system of equations containing lists and arrays like this solve(x+Sum[A[k],k=i..N]==y+Sum[B[k],k=m..N], j-Length[C]==l-Length[D], ?z/(c?^ i)==t/(c?^ h), u+1==2*v-3w, v=f(f(w))) (here A and B are arrays; C?et D are lists; x,y,z,t,j,l,i,h,u,v,w are variables that could be of type integer or real, c is a constant and f is a function): ? Thank you very much. 
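(To make the sympy suggestion above concrete: a toy version of the purely scalar part of such a system, with made-up symbol names, can be written with current SymPy as below; symbolic sums over arrays of unknown length are a separate question.)

import sympy as sp

x, y, c, u, v, w = sp.symbols('x y c u v w')
eqs = [sp.Eq(u + 1, 2*v - 3*w),        # u + 1 == 2*v - 3*w
       sp.Eq(x / c**2, y / c**3)]      # stand-in for the z/(c**i) == t/(c**h) relation
print(sp.solve(eqs, [u, x]))           # e.g. {u: 2*v - 3*w - 1, x: y/c}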
Yours faithfully, Olfa MRAIHI From alexwphoto at gmail.com Wed Dec 24 17:11:13 2008 From: alexwphoto at gmail.com (alexwphoto at gmail.com) Date: Wed, 24 Dec 2008 16:11:13 -0600 Subject: [Numpy-discussion] Specifying a dtype with RandomState? Message-ID: <1f7366a0812241411hba808c6l7bb634fbc1d47783@mail.gmail.com> I'm generating rather large matrices with a fixed random seed using rs = N.random.RandomState(123456789) U = rs.uniform(low=-0.1 high=self.0.1 size=(480189, 1000)).astype('float32') ... Several other arrays are instantiated as well. Because they are so large, I do all calculations on single-precision arrays. Coercing the output of rs.uniform() into a float32 requires an enormous copy operation (if I understand right). Since I am already hitting the upper limit of the memory space I have, it would be convenient if I could avoid the astype('float32') operation. Is there a way to have a RandomState object output single-precision floats? Thanks, Alex W -------------- next part -------------- An HTML attachment was scrubbed... URL: From bradford.n.cross at gmail.com Thu Dec 25 06:51:57 2008 From: bradford.n.cross at gmail.com (Bradford Cross) Date: Thu, 25 Dec 2008 12:51:57 +0100 Subject: [Numpy-discussion] new incremental statistics project In-Reply-To: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> References: <88e473830812190553lb6bf5e1s3eafcbe80d32f3cb@mail.gmail.com> Message-ID: I did not know about this - very cool! I think I was asking around the numpy/scipy lists a while back but nobody mentioned this; is it new? A couple of questions inline below. On Fri, Dec 19, 2008 at 2:53 PM, John Hunter wrote: > On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross > wrote: > > This is a new project I just released. > > > > I know it is C#, but some of the design and idioms would be nice in > > numpy/scipy for working with discrete event simulators, time series, and > > event stream processing. > > > > http://code.google.com/p/incremental-statistics/ > > I think an incremental stats module would be a boon to numpy or scipy. > Eric Firing has a nice module wrtten in C with a pyrex wrapper > (ringbuf) Please excuse my ignorance - what is the performance overhead of calling C via the pyrex wrapper? A lot of use cases for incremental statistics are discrete event systems where the calculations will be updated millions or billions of times; this was a concern I had about doing the project in C and calling across a wrapper. Maybe it was one of those entirely speculative and unfounded concerns. :-) > that does trailing incremental mean, median, std, min, max, > and percentile. It maintains a sorted queue to do the last three > efficiently, and handles NaN inputs. Not sure if our results hold universally or even asymptoticly, but we found that our implimention of order/rank statistics was faster when we backed it with partition selection algorithms operating on an array-based queue as opposed to our implimentaion of a sorted dequeue backed by a circular buffer. How does it handle NaN inputs exactly - does it just guard against them? That is the approach we took as well. We have a calculation guard that filters for both NaN and infinite values. > I would like to see this > extended to include exponential or other weightings to do things like > incremental trailing exponential moving averages and variances. This is a cool idea that I hadn't thought of. We do have exponentially weighted mean, but ideally one could supply a weighting function to any statistic. 
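The per-observation update is cheap in any case; a minimal Python sketch of an exponentially weighted mean/variance accumulator (illustrative only, not our library's API) looks like:

class EWStats(object):
    """Exponentially weighted mean and variance, updated one observation at a time."""
    def __init__(self, alpha):
        self.alpha = alpha      # weight of the newest observation, 0 < alpha <= 1
        self.mean = None
        self.var = 0.0

    def update(self, x):
        if self.mean is None:   # the first observation seeds the mean
            self.mean = float(x)
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        # standard one-pass recurrence for the exponentially weighted variance
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

s = EWStats(alpha=0.05)
for obs in (1.0, 2.0, 0.5, 1.5):
    s.update(obs)

Swapping in a different weighting scheme only changes how delta feeds back into the state, which is exactly where a pluggable weighting function would go.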
We've been moving toward a more functional combinator style library design lately and this is anothr step in that direction. > I > don't know what the licensing terms are of this module, but it might > be a good starting point for an incremental numpy stats module, at > least if you were thinking about supporting a finite lookback window. Yes, it sound great! If you read the docs here: http://code.google.com/p/incremental-statistics/ you can see that are have taken care to build the library from the beginning for static, accumulating, and rolling cases. The rolling case is what you are refering to as a finite lookback window, whereas accumualting as an accumulating lookback window, and the static case is the typical "compute hte mean of the entire sieries of observations at once" case. IMO, it turns out really nice when you think this way from the begnning becasue you get a lot of code reuse and nice oppertunities for composition. > > We have a copy of this in the py4science examples dir if you want to > take a look: > > svn co > https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats > cd trailstats/ > make > python movavg_ringbuf.py > > Other things that would be very useful are incremental covariance and > regression. Indeed. We have a bit on the dependence statistics side, but not much. Incremental dependence and regression are the two hot items on the backlog. :-) > > > JDH > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Dec 26 00:47:08 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 26 Dec 2008 14:47:08 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris wrote: > > > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau > wrote: >> >> David Cournapeau wrote: >> > Charles R Harris wrote: >> > >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > >> > wrote: >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> > wrote: >> >> >> >> > >> >> > I think he meant that it can be discovered at runtime in >> >> general, not >> >> > at numpy-run-time, so we can write a small C program that can be >> >> run >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> But then we only move the problem: people who want to build >> >> universal >> >> numpy extensions will have the wrong value, no ? The fundamental >> >> point >> >> of my patch is that the value is set whenever ndarrayobject.h is >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not >> >> be >> >> defined when the header is included for a file build with gcc -arch >> >> i386. >> >> >> >> >> >> We can probably set things up so the determination is at run time -- >> >> but we need to be sure that the ABI isn't affected. I did that once >> >> for an old project that needed data portability. 
In any case, it >> >> sounds like a project for a later release. >> >> >> > >> > It cannot work for numpy without breaking backward compatibility, >> > because of the following lines: >> > >> >> Actually, you could, by making the macro point to actual functions, but >> that would add function call cost. I don't know if the function call >> cost is significant or not in the cases where those macro are used, > > Exactly. Function calls are pretty cheap on modern hardware with good > compilers, nor would I expect the calls to be the bottleneck in most > applications. The functions would need to be visible to third party > applications, however... Would it be a problem ? Adding "true" functions to the array api, while keeping the macro for backward compatibility should be ok, no ? I also updated my patch, with another function PyArray_GetEndianness which detects the runtime endianness (using an union int/char[4]). The point is to detect any mismatch between the configuration endianness and the "true" one, and I put the detection in import_array. The function is in the numpy array API, but it does not really need to be either . cheers, David From charlesr.harris at gmail.com Fri Dec 26 02:05:11 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 00:05:11 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> Message-ID: On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau wrote: > On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris > wrote: > > > > > > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau > > wrote: > >> > >> David Cournapeau wrote: > >> > Charles R Harris wrote: > >> > > >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau < > cournape at gmail.com > >> >> > wrote: > >> >> > >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > >> >> > wrote: > >> >> > >> >> > > >> >> > I think he meant that it can be discovered at runtime in > >> >> general, not > >> >> > at numpy-run-time, so we can write a small C program that can > be > >> >> run > >> >> > at numpy-build-time to add another entry to config.h. > >> >> > >> >> But then we only move the problem: people who want to build > >> >> universal > >> >> numpy extensions will have the wrong value, no ? The fundamental > >> >> point > >> >> of my patch is that the value is set whenever ndarrayobject.h is > >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will not > >> >> be > >> >> defined when the header is included for a file build with gcc > -arch > >> >> i386. > >> >> > >> >> > >> >> We can probably set things up so the determination is at run time -- > >> >> but we need to be sure that the ABI isn't affected. I did that once > >> >> for an old project that needed data portability. In any case, it > >> >> sounds like a project for a later release. > >> >> > >> > > >> > It cannot work for numpy without breaking backward compatibility, > >> > because of the following lines: > >> > > >> > >> Actually, you could, by making the macro point to actual functions, but > >> that would add function call cost. 
I don't know if the function call > >> cost is significant or not in the cases where those macro are used, > > > > Exactly. Function calls are pretty cheap on modern hardware with good > > compilers, nor would I expect the calls to be the bottleneck in most > > applications. The functions would need to be visible to third party > > applications, however... > > Would it be a problem ? Adding "true" functions to the array api, > while keeping the macro for backward compatibility should be ok, no ? > I don't think it's a problem, just that the macros generate code that is compiled, so they need to call an api function. A decent compiler will probably load the function pointer somewhere fast if it is called in a loop, a const keyword somewhere will help with that. We might want something more convenient for our own code. > > I also updated my patch, with another function PyArray_GetEndianness > which detects the runtime endianness (using an union int/char[4]). The > point is to detect any mismatch between the configuration endianness > and the "true" one, and I put the detection in import_array. The > function is in the numpy array API, but it does not really need to be > either . > That sounds like a good start. It might be a good idea to use something like npy_int32 instead of a plain old integer. Likewise, it would probably be good to define the union as an anonymous constant. Hmm... something like: #include const union { int i; char c[4]; } order = {1}; const i = 1; int main(int argc, char **argv) { if (order.c[0]) printf("little endian\n"); else printf("big endian\n"); if (*(char*)&i) printf("little endian\n"); else printf("big endian\n"); return 0; } I've done it two ways here. They both require the -fno-strict-aliasing flag in gcc, but numpy is compiled with that flag. Both methods generate the same assembly with -O2 on my intel core2. ... cmpb $0, order ... cmpb $0, i Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Dec 26 02:17:16 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 00:17:16 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 12:05 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau wrote: > >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> > wrote: >> >> >> >> David Cournapeau wrote: >> >> > Charles R Harris wrote: >> >> > >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau < >> cournape at gmail.com >> >> >> > wrote: >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> > wrote: >> >> >> >> >> >> > >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> general, not >> >> >> > at numpy-run-time, so we can write a small C program that can >> be >> >> >> run >> >> >> > at numpy-build-time to add another entry to config.h. 
>> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> universal >> >> >> numpy extensions will have the wrong value, no ? The fundamental >> >> >> point >> >> >> of my patch is that the value is set whenever ndarrayobject.h is >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> not >> >> >> be >> >> >> defined when the header is included for a file build with gcc >> -arch >> >> >> i386. >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time -- >> >> >> but we need to be sure that the ABI isn't affected. I did that once >> >> >> for an old project that needed data portability. In any case, it >> >> >> sounds like a project for a later release. >> >> >> >> >> > >> >> > It cannot work for numpy without breaking backward compatibility, >> >> > because of the following lines: >> >> > >> >> >> >> Actually, you could, by making the macro point to actual functions, but >> >> that would add function call cost. I don't know if the function call >> >> cost is significant or not in the cases where those macro are used, >> > >> > Exactly. Function calls are pretty cheap on modern hardware with good >> > compilers, nor would I expect the calls to be the bottleneck in most >> > applications. The functions would need to be visible to third party >> > applications, however... >> >> Would it be a problem ? Adding "true" functions to the array api, >> while keeping the macro for backward compatibility should be ok, no ? >> > > I don't think it's a problem, just that the macros generate code that is > compiled, so they need to call an api function. A decent compiler will > probably load the function pointer somewhere fast if it is called in a loop, > a const keyword somewhere will help with that. We might want something more > convenient for our own code. > > >> >> I also updated my patch, with another function PyArray_GetEndianness >> which detects the runtime endianness (using an union int/char[4]). The >> point is to detect any mismatch between the configuration endianness >> and the "true" one, and I put the detection in import_array. The >> function is in the numpy array API, but it does not really need to be >> either . >> > > That sounds like a good start. It might be a good idea to use something > like npy_int32 instead of a plain old integer. Likewise, it would probably > be good to define the union as an anonymous constant. Hmm... > something like: > > #include > > const union { > int i; > char c[4]; > } order = {1}; > > const i = 1; > > int main(int argc, char **argv) > { > if (order.c[0]) > printf("little endian\n"); > else > printf("big endian\n"); > > if (*(char*)&i) > printf("little endian\n"); > else > printf("big endian\n"); > > return 0; > } > > I've done it two ways here. They both require the -fno-strict-aliasing flag > in gcc, but numpy is compiled with that flag. Both methods generate the same > assembly with -O2 on my intel core2. > > ... > cmpb $0, order > ... > cmpb $0, i > I suppose we could also mark one of those variables as static, make the name more unique, and stick it in the include file, thus avoiding the need to add anything to the api. Not the cleanest solution, but maybe better... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Fri Dec 26 04:20:26 2008 From: cournape at gmail.com (David Cournapeau) Date: Fri, 26 Dec 2008 18:20:26 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495073F9.3070401@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> Message-ID: <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris wrote: > > > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau > wrote: >> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> > wrote: >> >> >> >> David Cournapeau wrote: >> >> > Charles R Harris wrote: >> >> > >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> >> >> > >> >> > wrote: >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> > wrote: >> >> >> >> >> >> > >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> general, not >> >> >> > at numpy-run-time, so we can write a small C program that can >> >> >> be >> >> >> run >> >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> universal >> >> >> numpy extensions will have the wrong value, no ? The fundamental >> >> >> point >> >> >> of my patch is that the value is set whenever ndarrayobject.h is >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> >> >> not >> >> >> be >> >> >> defined when the header is included for a file build with gcc >> >> >> -arch >> >> >> i386. >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time -- >> >> >> but we need to be sure that the ABI isn't affected. I did that once >> >> >> for an old project that needed data portability. In any case, it >> >> >> sounds like a project for a later release. >> >> >> >> >> > >> >> > It cannot work for numpy without breaking backward compatibility, >> >> > because of the following lines: >> >> > >> >> >> >> Actually, you could, by making the macro point to actual functions, but >> >> that would add function call cost. I don't know if the function call >> >> cost is significant or not in the cases where those macro are used, >> > >> > Exactly. Function calls are pretty cheap on modern hardware with good >> > compilers, nor would I expect the calls to be the bottleneck in most >> > applications. The functions would need to be visible to third party >> > applications, however... >> >> Would it be a problem ? Adding "true" functions to the array api, >> while keeping the macro for backward compatibility should be ok, no ? > > I don't think it's a problem, just that the macros generate code that is > compiled, so they need to call an api function. A decent compiler will > probably load the function pointer somewhere fast if it is called in a loop, > a const keyword somewhere will help with that. We might want something more > convenient for our own code. > >> >> I also updated my patch, with another function PyArray_GetEndianness >> which detects the runtime endianness (using an union int/char[4]). 
The >> point is to detect any mismatch between the configuration endianness >> and the "true" one, and I put the detection in import_array. The >> function is in the numpy array API, but it does not really need to be >> either . > > That sounds like a good start. It might be a good idea to use something like > npy_int32 instead of a plain old integer. Likewise, it would probably be > good to define the union as an anonymous constant. Hmm... > something like: What I did was a bit more heavyweight, because I added it as function, but the idea is similar: static int compute_endianness() { union { char c[4]; npy_uint32 i; } bint; int st; bint.i = 'ABCD'; st = bintstrcmp(bint.c, "ABCD"); if (st == 0) { return NPY_CPU_BIG; } st = bintstrcmp(bint.c, "DCBA"); if (st == 0) { return NPY_CPU_LITTLE; } return NPY_CPU_UNKNOWN_ENDIAN; } Now that I think about it, I don't know if setting an integer to 'ABCD' is legal C. I think it is, but I don't claim to be any kind of C expert. > > > I've done it two ways here. They both require the -fno-strict-aliasing flag > in gcc, but numpy is compiled with that flag. Both methods generate the same > assembly with -O2 on my intel core2. I don't need this flag in my case, and I don't think we should require it if we can avoid it. I also use strings comparison, not just the first character, because in theory, there can be middle endian, but I doubt this has any real use. In that case, the function bintstrcmp can be dropped. David From charlesr.harris at gmail.com Fri Dec 26 12:38:37 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 10:38:37 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <3d375d730812222135w6d1e2a65s22755a548835805e@mail.gmail.com> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau wrote: > On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris > wrote: > > > > > > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau > > wrote: > >> > >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau > >> > wrote: > >> >> > >> >> David Cournapeau wrote: > >> >> > Charles R Harris wrote: > >> >> > > >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau > >> >> >> >> >> >> > wrote: > >> >> >> > >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern > >> >> >> > wrote: > >> >> >> > >> >> >> > > >> >> >> > I think he meant that it can be discovered at runtime in > >> >> >> general, not > >> >> >> > at numpy-run-time, so we can write a small C program that > can > >> >> >> be > >> >> >> run > >> >> >> > at numpy-build-time to add another entry to config.h. > >> >> >> > >> >> >> But then we only move the problem: people who want to build > >> >> >> universal > >> >> >> numpy extensions will have the wrong value, no ? The > fundamental > >> >> >> point > >> >> >> of my patch is that the value is set whenever ndarrayobject.h > is > >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will > >> >> >> not > >> >> >> be > >> >> >> defined when the header is included for a file build with gcc > >> >> >> -arch > >> >> >> i386. 
> >> >> >> > >> >> >> > >> >> >> We can probably set things up so the determination is at run time > -- > >> >> >> but we need to be sure that the ABI isn't affected. I did that > once > >> >> >> for an old project that needed data portability. In any case, it > >> >> >> sounds like a project for a later release. > >> >> >> > >> >> > > >> >> > It cannot work for numpy without breaking backward compatibility, > >> >> > because of the following lines: > >> >> > > >> >> > >> >> Actually, you could, by making the macro point to actual functions, > but > >> >> that would add function call cost. I don't know if the function call > >> >> cost is significant or not in the cases where those macro are used, > >> > > >> > Exactly. Function calls are pretty cheap on modern hardware with good > >> > compilers, nor would I expect the calls to be the bottleneck in most > >> > applications. The functions would need to be visible to third party > >> > applications, however... > >> > >> Would it be a problem ? Adding "true" functions to the array api, > >> while keeping the macro for backward compatibility should be ok, no ? > > > > I don't think it's a problem, just that the macros generate code that is > > compiled, so they need to call an api function. A decent compiler will > > probably load the function pointer somewhere fast if it is called in a > loop, > > a const keyword somewhere will help with that. We might want something > more > > convenient for our own code. > > > >> > >> I also updated my patch, with another function PyArray_GetEndianness > >> which detects the runtime endianness (using an union int/char[4]). The > >> point is to detect any mismatch between the configuration endianness > >> and the "true" one, and I put the detection in import_array. The > >> function is in the numpy array API, but it does not really need to be > >> either . > > > > That sounds like a good start. It might be a good idea to use something > like > > npy_int32 instead of a plain old integer. Likewise, it would probably be > > good to define the union as an anonymous constant. Hmm... > > something like: > > What I did was a bit more heavyweight, because I added it as function, > but the idea is similar: > > static int compute_endianness() > { > union { > char c[4]; > npy_uint32 i; > } bint; > int st; > > bint.i = 'ABCD'; > > st = bintstrcmp(bint.c, "ABCD"); > if (st == 0) { > return NPY_CPU_BIG; > } > st = bintstrcmp(bint.c, "DCBA"); > if (st == 0) { > return NPY_CPU_LITTLE; > } > return NPY_CPU_UNKNOWN_ENDIAN; > } > > Now that I think about it, I don't know if setting an integer to > 'ABCD' is legal C. I think it is, but I don't claim to be any kind of > C expert. Try const union { char c[4]; npy_uint32 i; } bint = {1,2,3,4}; And just compare the resulting value of bint.i to predefined values, i.e., 67305985 for big endian. The initializer is needed for the const union and initializes the first variable. The result will be more efficient as the union is initialized at compile time. > > > > > > > I've done it two ways here. They both require the -fno-strict-aliasing > flag > > in gcc, but numpy is compiled with that flag. Both methods generate the > same > > assembly with -O2 on my intel core2. > > I don't need this flag in my case, and I don't think we should require > it if we can avoid it. > You can't avoid it, there will be a compile time error if you are lucky or buggy code if you aren't. 
> I also use strings comparison, not just the first character, because > in theory, there can be middle endian, but I doubt this has any real > use. In that case, the function bintstrcmp can be dropped. > Yeah, VAX used to be middle endian for floats but I think you are safe these days. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Dec 26 13:00:47 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 11:00:47 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 10:38 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau wrote: > >> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris >> wrote: >> > >> > >> > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau >> > wrote: >> >> >> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> >> > wrote: >> >> >> >> >> >> David Cournapeau wrote: >> >> >> > Charles R Harris wrote: >> >> >> > >> >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> >> > >> wrote: >> >> >> >> >> >> >> >> > >> >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> >> general, not >> >> >> >> > at numpy-run-time, so we can write a small C program that >> can >> >> >> >> be >> >> >> >> run >> >> >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> >> universal >> >> >> >> numpy extensions will have the wrong value, no ? The >> fundamental >> >> >> >> point >> >> >> >> of my patch is that the value is set whenever ndarrayobject.h >> is >> >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> >> >> >> not >> >> >> >> be >> >> >> >> defined when the header is included for a file build with gcc >> >> >> >> -arch >> >> >> >> i386. >> >> >> >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time >> -- >> >> >> >> but we need to be sure that the ABI isn't affected. I did that >> once >> >> >> >> for an old project that needed data portability. In any case, it >> >> >> >> sounds like a project for a later release. >> >> >> >> >> >> >> > >> >> >> > It cannot work for numpy without breaking backward compatibility, >> >> >> > because of the following lines: >> >> >> > >> >> >> >> >> >> Actually, you could, by making the macro point to actual functions, >> but >> >> >> that would add function call cost. I don't know if the function call >> >> >> cost is significant or not in the cases where those macro are used, >> >> > >> >> > Exactly. Function calls are pretty cheap on modern hardware with good >> >> > compilers, nor would I expect the calls to be the bottleneck in most >> >> > applications. The functions would need to be visible to third party >> >> > applications, however... >> >> >> >> Would it be a problem ? 
Adding "true" functions to the array api, >> >> while keeping the macro for backward compatibility should be ok, no ? >> > >> > I don't think it's a problem, just that the macros generate code that is >> > compiled, so they need to call an api function. A decent compiler will >> > probably load the function pointer somewhere fast if it is called in a >> loop, >> > a const keyword somewhere will help with that. We might want something >> more >> > convenient for our own code. >> > >> >> >> >> I also updated my patch, with another function PyArray_GetEndianness >> >> which detects the runtime endianness (using an union int/char[4]). The >> >> point is to detect any mismatch between the configuration endianness >> >> and the "true" one, and I put the detection in import_array. The >> >> function is in the numpy array API, but it does not really need to be >> >> either . >> > >> > That sounds like a good start. It might be a good idea to use something >> like >> > npy_int32 instead of a plain old integer. Likewise, it would probably be >> > good to define the union as an anonymous constant. Hmm... >> > something like: >> >> What I did was a bit more heavyweight, because I added it as function, >> but the idea is similar: >> >> static int compute_endianness() >> { >> union { >> char c[4]; >> npy_uint32 i; >> } bint; >> int st; >> >> bint.i = 'ABCD'; >> >> st = bintstrcmp(bint.c, "ABCD"); >> if (st == 0) { >> return NPY_CPU_BIG; >> } >> st = bintstrcmp(bint.c, "DCBA"); >> if (st == 0) { >> return NPY_CPU_LITTLE; >> } >> return NPY_CPU_UNKNOWN_ENDIAN; >> } >> >> Now that I think about it, I don't know if setting an integer to >> 'ABCD' is legal C. I think it is, but I don't claim to be any kind of >> C expert. > > > Try > > const union { > char c[4]; > npy_uint32 i; > } bint = {1,2,3,4}; > > And just compare the resulting value of bint.i to predefined values, i.e., > 67305985 for big endian. The > Make that little endian. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Fri Dec 26 20:47:24 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 27 Dec 2008 10:47:24 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <5b8d13220812222140t60653b7fi80b23351caa009d9@mail.gmail.com> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> Message-ID: <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> On Sat, Dec 27, 2008 at 2:38 AM, Charles R Harris wrote: > > > On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau > wrote: >> >> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris >> wrote: >> > >> > >> > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau >> > wrote: >> >> >> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau >> >> > wrote: >> >> >> >> >> >> David Cournapeau wrote: >> >> >> > Charles R Harris wrote: >> >> >> > >> >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau >> >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern >> >> >> >> > wrote: >> >> >> >> >> >> >> >> > >> >> >> >> > I think he meant that it can be discovered at runtime in >> >> >> >> general, not >> >> >> >> > at numpy-run-time, so we can write a small C program that >> >> >> >> can >> >> >> >> be >> >> >> >> run >> >> >> >> > at numpy-build-time to add another entry to config.h. >> >> >> >> >> >> >> >> But then we only move the problem: people who want to build >> >> >> >> universal >> >> >> >> numpy extensions will have the wrong value, no ? The >> >> >> >> fundamental >> >> >> >> point >> >> >> >> of my patch is that the value is set whenever ndarrayobject.h >> >> >> >> is >> >> >> >> included. So even if I build numpy on PPC, NPY_BIGENDIAN will >> >> >> >> not >> >> >> >> be >> >> >> >> defined when the header is included for a file build with gcc >> >> >> >> -arch >> >> >> >> i386. >> >> >> >> >> >> >> >> >> >> >> >> We can probably set things up so the determination is at run time >> >> >> >> -- >> >> >> >> but we need to be sure that the ABI isn't affected. I did that >> >> >> >> once >> >> >> >> for an old project that needed data portability. In any case, it >> >> >> >> sounds like a project for a later release. >> >> >> >> >> >> >> > >> >> >> > It cannot work for numpy without breaking backward compatibility, >> >> >> > because of the following lines: >> >> >> > >> >> >> >> >> >> Actually, you could, by making the macro point to actual functions, >> >> >> but >> >> >> that would add function call cost. I don't know if the function call >> >> >> cost is significant or not in the cases where those macro are used, >> >> > >> >> > Exactly. Function calls are pretty cheap on modern hardware with good >> >> > compilers, nor would I expect the calls to be the bottleneck in most >> >> > applications. The functions would need to be visible to third party >> >> > applications, however... >> >> >> >> Would it be a problem ? Adding "true" functions to the array api, >> >> while keeping the macro for backward compatibility should be ok, no ? >> > >> > I don't think it's a problem, just that the macros generate code that is >> > compiled, so they need to call an api function. 
A decent compiler will >> > probably load the function pointer somewhere fast if it is called in a >> > loop, >> > a const keyword somewhere will help with that. We might want something >> > more >> > convenient for our own code. >> > >> >> >> >> I also updated my patch, with another function PyArray_GetEndianness >> >> which detects the runtime endianness (using an union int/char[4]). The >> >> point is to detect any mismatch between the configuration endianness >> >> and the "true" one, and I put the detection in import_array. The >> >> function is in the numpy array API, but it does not really need to be >> >> either . >> > >> > That sounds like a good start. It might be a good idea to use something >> > like >> > npy_int32 instead of a plain old integer. Likewise, it would probably be >> > good to define the union as an anonymous constant. Hmm... >> > something like: >> >> What I did was a bit more heavyweight, because I added it as function, >> but the idea is similar: >> >> static int compute_endianness() >> { >> union { >> char c[4]; >> npy_uint32 i; >> } bint; >> int st; >> >> bint.i = 'ABCD'; >> >> st = bintstrcmp(bint.c, "ABCD"); >> if (st == 0) { >> return NPY_CPU_BIG; >> } >> st = bintstrcmp(bint.c, "DCBA"); >> if (st == 0) { >> return NPY_CPU_LITTLE; >> } >> return NPY_CPU_UNKNOWN_ENDIAN; >> } >> >> Now that I think about it, I don't know if setting an integer to >> 'ABCD' is legal C. I think it is, but I don't claim to be any kind of >> C expert. > > Try > > const union { > char c[4]; > npy_uint32 i; > } bint = {1,2,3,4}; > > And just compare the resulting value of bint.i to predefined values, i.e., > 67305985 for big endian. The initializer is needed for the const union and > initializes the first variable. The result will be more efficient as the > union is initialized at compile time. I can do that, indeed. > >> >> >> > >> > >> > I've done it two ways here. They both require the -fno-strict-aliasing >> > flag >> > in gcc, but numpy is compiled with that flag. Both methods generate the >> > same >> > assembly with -O2 on my intel core2. >> >> I don't need this flag in my case, and I don't think we should require >> it if we can avoid it. > > You can't avoid it, there will be a compile time error if you are lucky or > buggy code if you aren't. I don't understand why. The whole point of union in that case is to make it clear that type punning is ok, since by definition the two values start at the same address. If this broke aliasing, any union would. Here is what man gcc tells me under the -fstrict-aliasing: """ Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an "unsigned int" can alias an "int", but not a "void*" or a "double". A character type may alias any other type. Pay special attention to code like this: union a_union { int i; double d; }; int f() { a_union t; t.d = 3.0; return t.i; } The practice of reading from a different union member than the one most recently written to (called ``type-punning'') is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. 
However, this code might not: int f() { a_union t; int* ip; t.d = 3.0; ip = &t.i; return *ip; } """ David From cournape at gmail.com Fri Dec 26 20:57:13 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 27 Dec 2008 10:57:13 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> Message-ID: <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau wrote: > > I don't understand why. The whole point of union in that case is to > make it clear that type punning is ok, since by definition the two > values start at the same address. If this broke aliasing, any union > would. Here is what man gcc tells me under the -fstrict-aliasing: > hm, seems to be a gcc extension, actually, so you're right. We have to find another way, then. David From charlesr.harris at gmail.com Fri Dec 26 22:02:44 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 20:02:44 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <49508320.60804@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 6:57 PM, David Cournapeau wrote: > On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau > wrote: > > > > > I don't understand why. The whole point of union in that case is to > > make it clear that type punning is ok, since by definition the two > > values start at the same address. If this broke aliasing, any union > > would. Here is what man gcc tells me under the -fstrict-aliasing: > > > > hm, seems to be a gcc extension, actually, so you're right. We have to > find another way, then. > Just use -fno-strict-aliasing, the linux kernel does, numpy does, it's just one of those things where the gnu language lawyers found a loop hole in the specification and made strict aliasing the default because it yields some optimization sugar. Google for torvalds and -fno-strict-aliasing and you might find an old rant on the subject. It can also sneak up on you because it only kicks in if you compile with optimization, say -O2, and the union won't help because the compiler is onto your tricks ;) I went through the whole mess myself trying to set random ieee floats directly from random ints. In that case the compiler version didn't issue a warning and I got some fantastic benchmarks because the code was completely optimized away. Unfortunately, the values were wrong. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Fri Dec 26 22:37:24 2008 From: cournape at gmail.com (David Cournapeau) Date: Sat, 27 Dec 2008 12:37:24 +0900 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <495083E9.5020105@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> Message-ID: <5b8d13220812261937m10d2e432t2488747b9a1e83d0@mail.gmail.com> On Sat, Dec 27, 2008 at 12:02 PM, Charles R Harris wrote: > > > On Fri, Dec 26, 2008 at 6:57 PM, David Cournapeau > wrote: >> >> On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau >> wrote: >> >> > >> > I don't understand why. The whole point of union in that case is to >> > make it clear that type punning is ok, since by definition the two >> > values start at the same address. If this broke aliasing, any union >> > would. Here is what man gcc tells me under the -fstrict-aliasing: >> > >> >> hm, seems to be a gcc extension, actually, so you're right. We have to >> find another way, then. > > Just use -fno-strict-aliasing, the linux kernel does, numpy does, it's just > one of those things where the gnu language lawyers found a loop hole in the > specification and made strict aliasing the default because it yields some > optimization sugar. But we don't only use gcc, which is why I don't like relying on this -fno-strict-aliasing for the code to be correct: the code may well break on another compiler. Linux is different: for all purpose, it is only compilable by gcc (and used as a benchmark for being conformant to gcc, like intel compiler does). > Google for torvalds and -fno-strict-aliasing and you > might find an old rant on the subject. His rant is different, I think; he acknowledges union as usable for type-punning, but claims it is not useful for many cases. Since we only care about a very simple case here, that should do it. Again, according to gcc own man page, using union for type punning is valid as far as aliasing rules are concerned - at least for gcc. Although using an union for type-punning is undefined as far as the C99 standard goes, it looks like any compiler does implement the expected behavior: """ Strictly speaking, reading a member of a union different from the one written to is undefined in ANSI/ISO C99 except in the special case of type-punning to a char*, similar to the example below: Casting to char*. However, it is an extremely common idiom and is well-supported by all major compilers. As a practical matter, reading and writing to any member of a union, in any order, is acceptable practice. """ in http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html#union_1 I am wondering about the usefulness of union if accessing a different member than the one initialized is undefined (maybe to force alignment ?). > It can also sneak up on you because > it only kicks in if you compile with optimization, say -O2, and the union > won't help because the compiler is onto your tricks ;) Yes, I understand that if you broke aliasing rules, you can have undefined behavior. But it seems that using union does not break those: it is documented as such for gcc, and would be the case elsewhere; I checked the autoconf macro to test endianness: it uses union as well. 
David From charlesr.harris at gmail.com Fri Dec 26 23:37:16 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Dec 2008 21:37:16 -0700 Subject: [Numpy-discussion] Request for review: dynamic_cpu_branch In-Reply-To: <5b8d13220812261937m10d2e432t2488747b9a1e83d0@mail.gmail.com> References: <49505C80.80709@ar.media.kyoto-u.ac.jp> <5b8d13220812252147if268492t99201dfcb5ab04d1@mail.gmail.com> <5b8d13220812260120u7b6667a2m1a7e6b6e470aaecc@mail.gmail.com> <5b8d13220812261747x6b9099a5yaf11d86d9cf6b18a@mail.gmail.com> <5b8d13220812261757o3ff9f244t94bf9e6935907a61@mail.gmail.com> <5b8d13220812261937m10d2e432t2488747b9a1e83d0@mail.gmail.com> Message-ID: On Fri, Dec 26, 2008 at 8:37 PM, David Cournapeau wrote: > On Sat, Dec 27, 2008 at 12:02 PM, Charles R Harris > wrote: > > > > > > On Fri, Dec 26, 2008 at 6:57 PM, David Cournapeau > > wrote: > >> > >> On Sat, Dec 27, 2008 at 10:47 AM, David Cournapeau > >> wrote: > >> > >> > > >> > I don't understand why. The whole point of union in that case is to > >> > make it clear that type punning is ok, since by definition the two > >> > values start at the same address. If this broke aliasing, any union > >> > would. Here is what man gcc tells me under the -fstrict-aliasing: > >> > > >> > >> hm, seems to be a gcc extension, actually, so you're right. We have to > >> find another way, then. > > > > Just use -fno-strict-aliasing, the linux kernel does, numpy does, it's > just > > one of those things where the gnu language lawyers found a loop hole in > the > > specification and made strict aliasing the default because it yields some > > optimization sugar. > > But we don't only use gcc, which is why I don't like relying on this > -fno-strict-aliasing for the code to be correct: the code may well > break on another compiler. Linux is different: for all purpose, it is > only compilable by gcc (and used as a benchmark for being conformant > to gcc, like intel compiler does). > Most compilers do the right thing, otherwise numpy wouldn't work. Gcc is special. > > > Google for torvalds and -fno-strict-aliasing and you > > might find an old rant on the subject. > > His rant is different, I think; he acknowledges union as usable for > type-punning, but claims it is not useful for many cases. Since we > only care about a very simple case here, that should do it. Again, > according to gcc own man page, using union for type punning is valid > as far as aliasing rules are concerned - at least for gcc. > > Although using an union for type-punning is undefined as far as the > C99 standard goes, it looks like any compiler does implement the > expected behavior: > > """ > Strictly speaking, reading a member of a union different from the one > written to is undefined in ANSI/ISO C99 except in the special case of > type-punning to a char*, similar to the example below: Casting to > char*. However, it is an extremely common idiom and is well-supported > by all major compilers. As a practical matter, reading and writing to > any member of a union, in any order, is acceptable practice. > """ > > in > http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html#union_1 > > I am wondering about the usefulness of union if accessing a different > member than the one initialized is undefined (maybe to force alignment > ?). 
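To make the aliasing question concrete, here is a small comparison of the punning styles under discussion: the pointer cast is the pattern that strict aliasing breaks, the union read is the idiom the quoted gcc documentation supports, and memcpy, which is not mentioned in the thread, is the other commonly used route that is well defined in plain C99. It assumes float and unsigned int are both 32 bits:

#include <stdio.h>
#include <string.h>

/* undefined under strict aliasing: an unsigned int * may not alias a float,
 * so an optimizing build without -fno-strict-aliasing can miscompile this */
static unsigned int pun_cast(float x)
{
    return *(unsigned int *)&x;
}

/* the union idiom: write one member, read the other through the union type */
static unsigned int pun_union(float x)
{
    union { float f; unsigned int i; } u;
    u.f = x;
    return u.i;
}

/* byte copy through memcpy: well defined regardless of aliasing settings */
static unsigned int pun_memcpy(float x)
{
    unsigned int i;
    memcpy(&i, &x, sizeof i);
    return i;
}

int main(void)
{
    printf("%08x %08x %08x\n", pun_cast(1.0f), pun_union(1.0f), pun_memcpy(1.0f));
    return 0;
}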
> > > > It can also sneak up on you because > > it only kicks in if you compile with optimization, say -O2, and the union > > won't help because the compiler is onto your tricks ;) > > Yes, I understand that if you broke aliasing rules, you can have > undefined behavior. But it seems that using union does not break > those: it is documented as such for gcc, and would be the case > elsewhere; I checked the autoconf macro to test endianness: it uses > union as well. > I've gotten warnings and bad code using unions. That may have been a compiler bug or maybe it's been changed in recent versions of gcc, but I don't think you can count on it. Looks like gcc also likes another set of braces in the initializer. const union { char c[4]; int i; } order = {{1,2,3,4}}; Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bradford.n.cross at gmail.com Sat Dec 27 10:59:25 2008 From: bradford.n.cross at gmail.com (Bradford Cross) Date: Sat, 27 Dec 2008 16:59:25 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <20081224132116.GB24856@phare.normalesup.org> References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> <20081224132116.GB24856@phare.normalesup.org> Message-ID: I prototyped an approach last year that worked out well. I don't really know what to call it - maybe something like "property based persistence." It is kind of strange and I am not sure how broadly applicable it is - I have only used it for financial time series data. I'll try to explain how the idea works. I start with a python object that has a number of properties and an associated large data set (in my case, financial instruments and their associated time series in the form of numpy arrays.) I then created infrastructure that allowed me to define a simple "mapper" function that used a subset of the object's properties to define a "path" (expressible in the same form either as a file system path or as a path in HDF to a table.) Then I persisted the bulky data set (again, time series in my case) at that location. This little piece of infrastructure is very lightweight and cuts the client side persistence code down to only the small "mapper" functions. The mapper functions don't actually build up paths - they just specify the properties and ordering that you want to use to build up the paths. It also makes querying very simple and fast because you don't really query at all - instead the properties associated with the query directly express the path at which the data is located. The drawback of this simplistic approach is that you need to add a second level of path addressing if you deal with datasets so large that you can not really persist them under a single path. If you have single multi GB or TB arrays you probably want to chunk things up a bit more in the style of GFS and its open source counterparts. I still have the python code for this properties based time series database. It is a very small and simple peice of code, but I am happy to give it a quick polish and open source it if anyone is interested in taking a look. I am also about to try this model using F# and db4o for a .Net project. On Wed, Dec 24, 2008 at 2:21 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Tue, Dec 23, 2008 at 02:10:50AM +0100, Olivier Grisel wrote: > > Interesting topic indeed. I think I have been hit with similar > problems on > > toy experimental scripts. 
So far the solution was always adhoc FS > caches > > of numpy arrays with manual filename management. Maybe the first step > for > > designing a generic solution would be to list some representative yet > > simple enough use cases with real sample python code so as to focus on > > concrete matters and avoid over engineering a general solution for > > philosophical problems. > > Yes, that's clearly a first ste: list the usecases, and the way we would > like it solved: think about the API. > > My internet connection is quite random currently, and I'll probably loose > it for a week any time soon. Do you want to start such a page on the > wiki. Mark it as a sratch page, and we'll delete it later. > > I should point out that joblib (on PyPI and launchpad) was a first > attempt to solve this problem, so you could have a look at it. I have > already identified things that are wrong with joblib (more on the API > side than actual bugs), so I know it is not a final solution. Figuring > out what was wrong only came from using it heavily in my work. I thing > the only way forward it to start something, use it, figure out what's > wrong, and start again... > > Looking forward to your input, > > Ga?l > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sat Dec 27 12:33:55 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 27 Dec 2008 18:33:55 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: References: <1229973108.5867.0.camel@linux-8ej9.site> <20081223003954.GF13171@phare.normalesup.org> <20081224132116.GB24856@phare.normalesup.org> Message-ID: <20081227173355.GA15538@phare.normalesup.org> On Sat, Dec 27, 2008 at 04:59:25PM +0100, Bradford Cross wrote: > I prototyped an approach last year that worked out well. I don't really > know what to call it - maybe something like "property based persistence." > It is kind of strange and I am not sure how broadly applicable it is - I > have only used it for financial time series data. Yeay, that's exactly what I had in mind for my second try. I though I would call this special object some kind of execution context. > I still have the python code for this properties based time series > database. It is a very small and simple peice of code, but I am happy to > give it a quick polish and open source it if anyone is interested in > taking a look. I am very interested in both your code, and anything you can to tell us about what worked well, and what you would do different. > I am also about to try this model using F# and db4o for a .Net project. Functionally language are clearly a very interesting alley to go down for these problems. I am right now in Python, and staying there for a while, but I believe I can learn a lot from functionnal languages. Thanks for your feedback, Ga?l From len-l at telus.net Sat Dec 27 15:05:52 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 12:05:52 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows Message-ID: <49568AA0.8050108@telus.net> Hi everyone, I build the Pygame dependencies for Windows. With the next Pygame release, 1.9.0, we would like to include Python 2.6 support. As you already know, Pygame has NumPy bindings. Though NumPy is not required, it is a useful addition. 
I understand NumPy is built with MinGW on Windows, which I use to with Pygame and its dependencies. I know the problems linking against msvcr90.dll. I am willing to offer what advice I can to get NumPy up and running for Python 2.6. You are welcome to use the Pygame build tools if they will help. I also have a Python 2.6 build of NumPy 1.2.1 for Pygame testing. http://www3.telus.net/len_l/pygame/numpy-1.2.1.win32-py2.6.msi md5sum: b791f5c4b620da21f779b53252b5932e *numpy-1.2.1.win32-py2.6.msi Lenard -- Lenard Lindstrom From robert.kern at gmail.com Sat Dec 27 16:47:29 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 27 Dec 2008 16:47:29 -0500 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <49568AA0.8050108@telus.net> References: <49568AA0.8050108@telus.net> Message-ID: <3d375d730812271347i1670b201nf536d16da277636f@mail.gmail.com> On Sat, Dec 27, 2008 at 15:05, Lenard Lindstrom wrote: > Hi everyone, > > I build the Pygame dependencies for Windows. With the next Pygame > release, 1.9.0, we would like to include Python 2.6 support. As you > already know, Pygame has NumPy bindings. Though NumPy is not required, > it is a useful addition. I understand NumPy is built with MinGW on > Windows, which I use to with Pygame and its dependencies. I know the > problems linking against msvcr90.dll. I am willing to offer what advice > I can to get NumPy up and running for Python 2.6. Yes, please. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Sat Dec 27 21:10:52 2008 From: cournape at gmail.com (David Cournapeau) Date: Sun, 28 Dec 2008 11:10:52 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <49568AA0.8050108@telus.net> References: <49568AA0.8050108@telus.net> Message-ID: <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> Hi Lenard, On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: > Hi everyone, > > I build the Pygame dependencies for Windows. With the next Pygame > release, 1.9.0, we would like to include Python 2.6 support. As you > already know, Pygame has NumPy bindings. Though NumPy is not required, > it is a useful addition. I understand NumPy is built with MinGW on > Windows, which I use to with Pygame and its dependencies. I know the > problems linking against msvcr90.dll. I am willing to offer what advice > I can to get NumPy up and running for Python 2.6. Thanks. I think I have covered most problems concerning python 2.6 and windows in the trunk (upcoming 1.3): - linking against msvcr90.dll - generating manifest for running code snippets (with mingw) - fix some bugs with python 2.6 msvc support (in particular http://bugs.python.org/issue4702) You are welcome to test the trunk to see if that fixes everything. I don't think everything can be fixed for 1.2.2, because the changes are not all trivial (much revamp C99 math support, in particular). Unfortunately, I have been working on some formatting issues which were more difficult than previously thought, and it was time to go to sleep before I actually fixed the problem, so the trunk may be broken ATM. 
I will fix this now, cheers, David From len-l at telus.net Sat Dec 27 21:55:15 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 18:55:15 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> Message-ID: <4956EA93.2000804@telus.net> David Cournapeau wrote: > Hi Lenard, > > > On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: > >> Hi everyone, >> >> I build the Pygame dependencies for Windows. With the next Pygame >> release, 1.9.0, we would like to include Python 2.6 support. As you >> already know, Pygame has NumPy bindings. Though NumPy is not required, >> it is a useful addition. I understand NumPy is built with MinGW on >> Windows, which I use to with Pygame and its dependencies. I know the >> problems linking against msvcr90.dll. I am willing to offer what advice >> I can to get NumPy up and running for Python 2.6. >> > > Thanks. I think I have covered most problems concerning python 2.6 and > windows in the trunk (upcoming 1.3): > > - linking against msvcr90.dll > - generating manifest for running code snippets (with mingw) > - fix some bugs with python 2.6 msvc support (in particular > http://bugs.python.org/issue4702) > > You are welcome to test the trunk to see if that fixes everything. I > don't think everything can be fixed for 1.2.2, because the changes are > not all trivial (much revamp C99 math support, in particular). > > Unfortunately, I have been working on some formatting issues which > were more difficult than previously thought, and it was time to go to > sleep before I actually fixed the problem, so the trunk may be broken > ATM. I will fix this now, > > It looks like you have a handle on the problem. How did you get around the problems with the incomplete libmsvcr90.a import library? I have custom import libraries which you can use if needed. Lenard -- Lenard Lindstrom From david at ar.media.kyoto-u.ac.jp Sat Dec 27 21:58:55 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 11:58:55 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956EA93.2000804@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> Message-ID: <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> Lenard Lindstrom wrote: > David Cournapeau wrote: >> Hi Lenard, >> >> >> On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: >> >>> Hi everyone, >>> >>> I build the Pygame dependencies for Windows. With the next Pygame >>> release, 1.9.0, we would like to include Python 2.6 support. As you >>> already know, Pygame has NumPy bindings. Though NumPy is not required, >>> it is a useful addition. I understand NumPy is built with MinGW on >>> Windows, which I use to with Pygame and its dependencies. I know the >>> problems linking against msvcr90.dll. I am willing to offer what advice >>> I can to get NumPy up and running for Python 2.6. >>> >> Thanks. I think I have covered most problems concerning python 2.6 and >> windows in the trunk (upcoming 1.3): >> >> - linking against msvcr90.dll >> - generating manifest for running code snippets (with mingw) >> - fix some bugs with python 2.6 msvc support (in particular >> http://bugs.python.org/issue4702) >> >> You are welcome to test the trunk to see if that fixes everything. 
I >> don't think everything can be fixed for 1.2.2, because the changes are >> not all trivial (much revamp C99 math support, in particular). >> >> Unfortunately, I have been working on some formatting issues which >> were more difficult than previously thought, and it was time to go to >> sleep before I actually fixed the problem, so the trunk may be broken >> ATM. I will fix this now, >> >> > It looks like you have a handle on the problem. How did you get around > the problems with the incomplete libmsvcr90.a import library? I have > custom import libraries which you can use if needed. Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to submit patchs to the mingw-w64 project - the whole libmsvcr90.a is missing, actually. For 32 bits, I simply got around it by changing the missing functions in numpy itself - if we are talking about the same thing, that is missing time functions for random. You can look at revisions r6080, r6076, r6074, r6073, r6072,r6070,r6069,r6029,r6028, the final patch is as follow: diff --git a/numpy/random/mtrand/randomkit.c b/numpy/random/mtrand/randomkit.c index 56f52c0..0fbc40d 100644 --- a/numpy/random/mtrand/randomkit.c +++ b/numpy/random/mtrand/randomkit.c @@ -64,18 +64,33 @@ /* static char const rcsid[] = "@(#) $Jeannot: randomkit.c,v 1.28 2005/07/21 22:14:09 js Exp $"; */ - #include #include #include #include -#include #include #include #ifdef _WIN32 /* Windows */ +/* XXX: we have to use this ugly defined(__GNUC__) because it is not easy to + * detect the compiler used in distutils itself */ +#if (defined(__GNUC__) && defined(NPY_NEEDS_MINGW_TIME_WORKAROUND)) +/* FIXME: ideally, we should set this to the real version of MSVCRT. We need + * something higher than 0x601 to enable _ftime64 and co */ +#define __MSVCRT_VERSION__ 0x0700 +#include #include +/* mingw msvcr lib import wrongly export _ftime, which does not exist in the + * actual msvc runtime for version >= 8; we make it an alias to _ftime64, which + * is available in those versions of the runtime + */ +#define _FTIME(x) _ftime64((x)) +#else +#include +#include +#define _FTIME(x) _ftime((x)) +#endif #ifndef RK_NO_WINCRYPT /* Windows crypto */ #ifndef _WIN32_WINNT @@ -86,6 +101,7 @@ #endif #else /* Unix */ +#include #include #include #endif @@ -167,7 +183,7 @@ rk_error rk_randomseed(rk_state *state) rk_seed(rk_hash(getpid()) ^ rk_hash(tv.tv_sec) ^ rk_hash(tv.tv_usec) ^ rk_hash(clock()), state); #else - _ftime(&tv); + _FTIME(&tv); rk_seed(rk_hash(tv.time) ^ rk_hash(tv.millitm) ^ rk_hash(clock()), state); #endif diff --git a/numpy/random/setup.py b/numpy/random/setup.py index e7955db..dde3119 100644 --- a/numpy/random/setup.py +++ b/numpy/random/setup.py @@ -1,13 +1,19 @@ -from os.path import join, split +from os.path import join, split, dirname +import os import sys +from distutils.dep_util import newer +from distutils.msvccompiler import get_build_version as get_msvc_build_version -def msvc_version(): - """Return the msvc version used to build the running python, None if not - built with MSVC.""" - msc_pos = sys.version.find('MSC v.') - if msc_pos != -1: - return sys.version[msc_pos+6:msc_pos+10] - return None +def needs_mingw_ftime_workaround(): + # We need the mingw workaround for _ftime if the msvc runtime version is + # 7.1 or above and we build with mingw ... + # ... 
but we can't easily detect compiler version outside distutils command + # context, so we will need to detect in randomkit whether we build with gcc + msver = get_msvc_build_version() + if msver and msver >= 8: + return True + + return False def configuration(parent_package='',top_path=None): from numpy.distutils.misc_util import Configuration, get_mathlibs @@ -22,6 +28,10 @@ def configuration(parent_package='',top_path=None): ext.libraries.extend(libs) return None + defs = [] + if needs_mingw_ftime_workaround(): + defs.append(("NPY_NEEDS_MINGW_TIME_WORKAROUND", None)) + libs = [] # Configure mtrand config.add_extension('mtrand', @@ -32,7 +42,8 @@ def configuration(parent_package='',top_path=None): depends = [join('mtrand','*.h'), join('mtrand','*.pyx'), join('mtrand','*.pxi'), - ] + ], + define_macros = defs, ) config.add_data_files(('.', join('mtrand', 'randomkit.h'))) David From david at ar.media.kyoto-u.ac.jp Sat Dec 27 22:04:23 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 12:04:23 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> Message-ID: <4956ECB7.5050300@ar.media.kyoto-u.ac.jp> David Cournapeau wrote: > Hi Lenard, > > > On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: > >> Hi everyone, >> >> I build the Pygame dependencies for Windows. With the next Pygame >> release, 1.9.0, we would like to include Python 2.6 support. As you >> already know, Pygame has NumPy bindings. Though NumPy is not required, >> it is a useful addition. I understand NumPy is built with MinGW on >> Windows, which I use to with Pygame and its dependencies. I know the >> problems linking against msvcr90.dll. I am willing to offer what advice >> I can to get NumPy up and running for Python 2.6. >> > > Thanks. I think I have covered most problems concerning python 2.6 and > windows in the trunk (upcoming 1.3): > > - linking against msvcr90.dll > - generating manifest for running code snippets (with mingw) > - fix some bugs with python 2.6 msvc support (in particular > http://bugs.python.org/issue4702) > > You are welcome to test the trunk to see if that fixes everything. I > don't think everything can be fixed for 1.2.2, because the changes are > not all trivial (much revamp C99 math support, in particular). > > Unfortunately, I have been working on some formatting issues which > were more difficult than previously thought, and it was time to go to > sleep before I actually fixed the problem, so the trunk may be broken > ATM. I will fix this now, I have reverted the buggy changes, so the trunk should be usable again, and contain all the fixes so far for mingw + python 2.6 support (including mingw-w64). 
cheers, David From len-l at telus.net Sat Dec 27 22:50:45 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 19:50:45 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> Message-ID: <4956F795.30309@telus.net> David Cournapeau wrote: > Lenard Lindstrom wrote: > >> David Cournapeau wrote: >> >>> Hi Lenard, >>> >>> >>> On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: >>> >>> >>>> Hi everyone, >>>> >>>> [snip] >>>> I am willing to offer what advice >>>> I can to get NumPy up and running for Python 2.6. >>>> >>>> >>> Thanks. I think I have covered most problems concerning python 2.6 and >>> windows in the trunk (upcoming 1.3)[.] >>> [snip] >>> >>> >>> >>> >> It looks like you have a handle on the problem. How did you get around >> the problems with the incomplete libmsvcr90.a import library? I have >> custom import libraries which you can use if needed. >> > > Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to > submit patchs to the mingw-w64 project - the whole libmsvcr90.a is > missing, actually. For 32 bits, I simply got around it by changing the > missing functions in numpy itself - if we are talking about the same > thing, that is missing time functions for random. Yes, the _ftime function, which is an inlined function in VC 2008 that calls _ftime64. I have to build a lot of dependencies for Pygame so I want to avoid patching code when possible. Instead I have a custom libmsvcr90.a that has stub functions for the various time functions. It lets me create static libraries that link to both msvcr71.dll and msvcr90.dll. No manifest files required. And no patches to MinGW. -- Lenard Lindstrom From david at ar.media.kyoto-u.ac.jp Sat Dec 27 22:48:07 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 12:48:07 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956F795.30309@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> Message-ID: <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> Lenard Lindstrom wrote: > David Cournapeau wrote: > >> Lenard Lindstrom wrote: >> >> >>> David Cournapeau wrote: >>> >>> >>>> Hi Lenard, >>>> >>>> >>>> On Sun, Dec 28, 2008 at 5:05 AM, Lenard Lindstrom wrote: >>>> >>>> >>>> >>>>> Hi everyone, >>>>> >>>>> >>>>> > [snip] > >>>>> I am willing to offer what advice >>>>> I can to get NumPy up and running for Python 2.6. >>>>> >>>>> >>>>> >>>> Thanks. I think I have covered most problems concerning python 2.6 and >>>> windows in the trunk (upcoming 1.3)[.] >>>> >>>> > [snip] > >>>> >>>> >>>> >>>> >>>> >>> It looks like you have a handle on the problem. How did you get around >>> the problems with the incomplete libmsvcr90.a import library? I have >>> custom import libraries which you can use if needed. >>> >>> >> Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to >> submit patchs to the mingw-w64 project - the whole libmsvcr90.a is >> missing, actually. For 32 bits, I simply got around it by changing the >> missing functions in numpy itself - if we are talking about the same >> thing, that is missing time functions for random. 
>> > Yes, the _ftime function, which is an inlined function in VC 2008 that > calls _ftime64. I have to build a lot of dependencies for Pygame so I > want to avoid patching code when possible. I understand you don't want to patch the sources. The above fix is in the trunk, though - and I don't feel like backporting those fixes in the 1.2.x branch, because it would be a lot of work. > Instead I have a custom > libmsvcr90.a that has stub functions for the various time functions. It > lets me create static libraries that link to both msvcr71.dll and > msvcr90.dll. No manifest files required. And no patches to MinGW. > Manifests are needed for any executable linking against msvcr90.dll, whether you build with mingw or VS: this is required by windows itself to be able to load msvcr90.dll at all (the dreadful Side by Side assembly stuff). This is a totally independent issue of the _ftime thing, and AFAIK, there is no way around it - except installing msvcrt90.dll in system32 yourself, which is obviously a very bad idea. Patching mingw is necessary for 64 bits support, since their headers are missing some math functions - no patch is needed for 32 bits. cheers, David From len-l at telus.net Sat Dec 27 23:06:57 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 20:06:57 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956F795.30309@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> Message-ID: <4956FB61.7010707@telus.net> Lenard Lindstrom wrote: > David Cournapeau wrote: > >> Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to >> submit patchs to the mingw-w64 project - the whole libmsvcr90.a is >> missing, actually. For 32 bits, I simply got around it by changing the >> missing functions in numpy itself - if we are talking about the same >> thing, that is missing time functions for random. >> > Yes, the _ftime function, which is an inlined function in VC 2008 that > calls _ftime64. I have to build a lot of dependencies for Pygame so I > want to avoid patching code when possible. Instead I have a custom > libmsvcr90.a that has stub functions for the various time functions. It > lets me create static libraries that link to both msvcr71.dll and > msvcr90.dll. No manifest files required. And no patches to MinGW. > > > It just occurred to me: -D_ftime=_ftime64. I will have to see if this works with gmtime in the png library. Thanks for the advice. Lenard -- Lenard Lindstrom From len-l at telus.net Sat Dec 27 23:34:11 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 20:34:11 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> Message-ID: <495701C3.8080804@telus.net> David Cournapeau wrote: > Lenard Lindstrom wrote: > >> David Cournapeau wrote: >> >> >>> >>> Do you mean on xp 32 bits or 64 bits ? For the later, I have yet to >>> submit patchs to the mingw-w64 project - the whole libmsvcr90.a is >>> missing, actually. 
For 32 bits, I simply got around it by changing the >>> missing functions in numpy itself - if we are talking about the same >>> thing, that is missing time functions for random. >>> >>> >> Yes, the _ftime function, which is an inlined function in VC 2008 that >> calls _ftime64. I have to build a lot of dependencies for Pygame so I >> want to avoid patching code when possible. >> > > I understand you don't want to patch the sources. The above fix is in > the trunk, though - and I don't feel like backporting those fixes in the > 1.2.x branch, because it would be a lot of work. > > Sorry for the confusion. I meant that I don't like patching SDL and such. I build these in msys using the "configure; make install" incantation, so can't easily use magic like manifest files. Instead I link the DLLs against msvcr71.dll, making sure there are also static libraries, then create the msvcr90.dll linked DLL's from the static libraries. This trick can also be used with the NumPy dependencies Blas and fftw. Actually it is easier, since they are statically linked into NumPy. >> Instead I have a custom >> libmsvcr90.a that has stub functions for the various time functions. It >> lets me create static libraries that link to both msvcr71.dll and >> msvcr90.dll. No manifest files required. And no patches to MinGW. >> >> > > Manifests are needed for any executable linking against msvcr90.dll, > whether you build with mingw or VS: this is required by windows itself > to be able to load msvcr90.dll at all (the dreadful Side by Side > assembly stuff). This is a totally independent issue of the _ftime > thing, and AFAIK, there is no way around it - except installing > msvcrt90.dll in system32 yourself, which is obviously a very bad idea. > > Patching mingw is necessary for 64 bits support, since their headers are > missing some math functions - no patch is needed for 32 bits. > > Yes, I've had my run in with manifest files. I avoid them by linking test programs against msvcrt or msvcr71 instead. A manifest is not needed for a DLL, luckily, as it uses the DLL libraries loaded by its host program. And I've had no luck using msvcr90.dll outside an SxS assembly. It needs a manifest file wherever it is, and I have had yet to writing a working manifest for a private assembly. So the Python developers' solution of copying msvcr90.dll into the Python directory is of no help. Lenard -- Lenard Lindstrom From david at ar.media.kyoto-u.ac.jp Sat Dec 27 23:45:34 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 13:45:34 +0900 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <495701C3.8080804@telus.net> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> <495701C3.8080804@telus.net> Message-ID: <4957046E.7000109@ar.media.kyoto-u.ac.jp> Lenard Lindstrom wrote: > Sorry for the confusion. I meant that I don't like patching SDL and > such. I build these in msys using the "configure; make install" > incantation, so can't easily use magic like manifest files. I don't know about SDL :) numpy needs manifests at the configuration stage, because it build code snippet which it needs to run, and this requires manifest AFAIK, if you link against msvcr90.dll. I did not want to link with another dll at the configuration stage, because this could lead to subtle issues. 
> Instead I > link the DLLs against msvcr71.dll, making sure there are also static > libraries, then create the msvcr90.dll linked DLL's from the static > libraries. This trick can also be used with the NumPy dependencies Blas > and fftw. Actually it is easier, since they are statically linked into > NumPy. Note that numpy does not depend on fftw. My solution to this was to generate the manifest file, which was relatively easy to do since we control our build process in python (look at numpy/distutils/mingw32compiler.py). To run a program 'locally' (one which is not installed), having the manifest in the same directory as the .exe is enough, so we only need to generate it. The main difficulty is to make sure you are using the same version of the dll as python: this feature has been integrated in python 2.6.1. For 2.6.0, I just assume it is the same as the official python binary. > Yes, I've had my run in with manifest files. I avoid them by linking > test programs against msvcrt or msvcr71 instead. A manifest is not > needed for a DLL, luckily, as it uses the DLL libraries loaded by its > host program. Yes, that's my understanding too. Since this is of course undocumented, we can only guess. > And I've had no luck using msvcr90.dll outside an SxS > assembly. It needs a manifest file wherever it is, and I have had yet to > writing a working manifest for a private assembly. So the Python > developers' solution of copying msvcr90.dll into the Python directory is > of no help. This may be of some interest for you: http://cournape.wordpress.com/2008/09/02/how-to-embed-a-manifest-into-a-dll-with-mingw-tools-only/ that's a summary of my own findings about manifest using only open source tools. All the necessary code, including the manifest template is in the mingw32compiler.py file: http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/distutils/mingw32ccompiler.py cheers, David From david at ar.media.kyoto-u.ac.jp Sun Dec 28 00:27:07 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 14:27:07 +0900 Subject: [Numpy-discussion] formatting issues, locale and co Message-ID: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> Hi, While looking at the last failures of numpy trunk on windows for python 2.5 and 2.6, I got into floating point number formatting issues; I got deeper and deeper, and now I am lost. We have several problems: - we are not consistent between platforms, nor are we consistent with python - str(np.float32(a)) is locale dependent, but python str method is not (locale.str is) - formatting of long double does not work on windows because of the broken long double support in mingw. 1 consistency problem: ---------------------- python -c "a = 1e20; print a" -> 1e+020 python26 -c "a = 1e20; print a" -> 1e+20 In numpy, we use PyOS_snprintf for formatting, but python itself uses PyOS_ascii_formatd - which has different behavior on different versions of python. 
The above behavior can be simply reproduced in C: #include int main() { double x = 1e20; char c[200]; PyOS_ascii_format(c, sizeof(c), "%.12g", x); printf("%s\n", c); printf("%g\n", x); return 0; } On 2.5, this will print: 1e+020 1e+020 But on 2.6, this will print: 1e+20 1e+020 2 locale dependency: -------------------- Another issue is that our own formatting is local dependent, whereas python isn't: import numpy as np import locale locale.setlocale(locale.LC_NUMERIC, 'fr_FR') a = 1.2 print "str(a)", str(a) print "locale.str(a)", locale.str(a) print "str(np.float32(a))", str(np.float32(a)) print "locale.str(np.float32(a))", locale.str(np.float32(a)) Returns: str(a) 1.2 locale.str(a) 1,2 str(np.float32(a)) 1,2 locale.str(np.float32(a)) 1,20000004768 I thought about copying the way python does the formatting in the trunk (where discrepancies between platforms have been fixed), but this is not so easy, because it uses a lot of code from different places - and the code needs to be adapted to float and long double. The other solution would be to do our own formatting, but this does not sound easy: formatting in C is hard. I am not sure about what we should do, if anyone else has any idea ? cheers, David From len-l at telus.net Sun Dec 28 01:33:02 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Sat, 27 Dec 2008 22:33:02 -0800 Subject: [Numpy-discussion] NumPy and Python 2.6 on Windows In-Reply-To: <4957046E.7000109@ar.media.kyoto-u.ac.jp> References: <49568AA0.8050108@telus.net> <5b8d13220812271810h61364649wd70abb38794d0f5c@mail.gmail.com> <4956EA93.2000804@telus.net> <4956EB6F.1070908@ar.media.kyoto-u.ac.jp> <4956F795.30309@telus.net> <4956F6F7.5030605@ar.media.kyoto-u.ac.jp> <495701C3.8080804@telus.net> <4957046E.7000109@ar.media.kyoto-u.ac.jp> Message-ID: <49571D9E.3000807@telus.net> David Cournapeau wrote: > Lenard Lindstrom wrote: > >> Sorry for the confusion. I meant that I don't like patching SDL and >> such. I build these in msys using the "configure; make install" >> incantation, so can't easily use magic like manifest files. >> > > I don't know about SDL :) numpy needs manifests at the configuration > stage, because it build code snippet which it needs to run, and this > requires manifest AFAIK, if you link against msvcr90.dll. I did not want > to link with another dll at the configuration stage, because this could > lead to subtle issues. > > Yes, it is best to have configuration programs link to msvcr90.dll when possible. Pygame configuration is relatively straight forward. No test programs are used. But all the dependencies, such as SDL, are built with Msys, and the Unix configuration shell scripts are used when possible. These scripts do create small test programs. It would be possible provide manifest files in this case, but only after determining the names and locations of all the test programs generated. For now it is simpler to just use a less fussy C runtime during configuration, then link in msvcr90 later. >> Instead I >> link the DLLs against msvcr71.dll, making sure there are also static >> libraries, then create the msvcr90.dll linked DLL's from the static >> libraries. This trick can also be used with the NumPy dependencies Blas >> and fftw. Actually it is easier, since they are statically linked into >> NumPy. >> > > Note that numpy does not depend on fftw. > > My solution to this was to generate the manifest file, which was > relatively easy to do since we control our build process in python (look > at numpy/distutils/mingw32compiler.py). 
To run a program 'locally' (one > which is not installed), having the manifest in the same directory as > the .exe is enough, so we only need to generate it. The main difficulty > is to make sure you are using the same version of the dll as python: > this feature has been integrated in python 2.6.1. For 2.6.0, I just > assume it is the same as the official python binary. > > A manifest file will work as long as the public key doesn't change. Or is the key provided by the developer rather than Microsoft's build tools? >> Yes, I've had my run in with manifest files. I avoid them by linking >> test programs against msvcrt or msvcr71 instead. A manifest is not >> needed for a DLL, luckily, as it uses the DLL libraries loaded by its >> host program. >> > > Yes, that's my understanding too. Since this is of course undocumented, > we can only guess. > > Well, this is what I've found for Python and Pygame extension modules anyway. And it makes a certain amount of sense if manifests exist to prevent library conflicts. A dynamic library and main program using different C runtime instances would be a problem. >> And I've had no luck using msvcr90.dll outside an SxS >> assembly. It needs a manifest file wherever it is, and I have had yet to >> writing a working manifest for a private assembly. So the Python >> developers' solution of copying msvcr90.dll into the Python directory is >> of no help. >> > > This may be of some interest for you: > > http://cournape.wordpress.com/2008/09/02/how-to-embed-a-manifest-into-a-dll-with-mingw-tools-only/ > > that's a summary of my own findings about manifest using only open > source tools. All the necessary code, including the manifest template is > in the mingw32compiler.py file: > > http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/distutils/mingw32ccompiler.py > > Ok, you have given me some more things to consider. Thanks for the discussion. I will keep you suggestions in mind. Just to clear up any confusion pygame.org is not intending to build and distribute NumPy with Pygame. NumPy was required to run the full Pygame test suite under Python 2.6. My custom NumPy build will disappear once scipy.org releases its Python 2.6 build. Lenard -- Lenard Lindstrom From charlesr.harris at gmail.com Sun Dec 28 01:38:29 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 27 Dec 2008 23:38:29 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Hi, > > While looking at the last failures of numpy trunk on windows for > python 2.5 and 2.6, I got into floating point number formatting issues; > I got deeper and deeper, and now I am lost. We have several problems: > - we are not consistent between platforms, nor are we consistent > with python > - str(np.float32(a)) is locale dependent, but python str method is > not (locale.str is) > - formatting of long double does not work on windows because of the > broken long double support in mingw. > > 1 consistency problem: > ---------------------- > > python -c "a = 1e20; print a" -> 1e+020 > python26 -c "a = 1e20; print a" -> 1e+20 > > In numpy, we use PyOS_snprintf for formatting, but python itself uses > PyOS_ascii_formatd - which has different behavior on different versions > of python. 
The above behavior can be simply reproduced in C: > > #include > > int main() > { > double x = 1e20; > char c[200]; > > PyOS_ascii_format(c, sizeof(c), "%.12g", x); > printf("%s\n", c); > printf("%g\n", x); > > return 0; > } > > On 2.5, this will print: > > 1e+020 > 1e+020 > > But on 2.6, this will print: > > 1e+20 > 1e+020 > > 2 locale dependency: > -------------------- > > Another issue is that our own formatting is local dependent, whereas > python isn't: > > import numpy as np > import locale > locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > a = 1.2 > > print "str(a)", str(a) > print "locale.str(a)", locale.str(a) > print "str(np.float32(a))", str(np.float32(a)) > print "locale.str(np.float32(a))", locale.str(np.float32(a)) > > Returns: > > str(a) 1.2 > locale.str(a) 1,2 > str(np.float32(a)) 1,2 > locale.str(np.float32(a)) 1,20000004768 > > I thought about copying the way python does the formatting in the trunk > (where discrepancies between platforms have been fixed), but this is not > so easy, because it uses a lot of code from different places - and the > code needs to be adapted to float and long double. The other solution > would be to do our own formatting, but this does not sound easy: > formatting in C is hard. I am not sure about what we should do, if > anyone else has any idea ? > I think the first thing to do is make a decision on locale. If we chose to support locales I don't see much choice but to depend Python because it's too much work otherwise, and work not directly related to Numpy at that. If we decide not to support locales then we can do our own formatting if we need to using a fixed choice of locale. There is a list of snprintf implementations here . Triolooks like a mature project and has an MIT license, which I think is a license compatible with Numpy. I'm inclined to just fix the locale and ignore the rest until Python gets things sorted out. But I'm lazy... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Dec 28 01:46:06 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 01:46:06 -0500 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> On Sun, Dec 28, 2008 at 01:38, Charles R Harris wrote: > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > wrote: >> >> Hi, >> >> While looking at the last failures of numpy trunk on windows for >> python 2.5 and 2.6, I got into floating point number formatting issues; >> I got deeper and deeper, and now I am lost. We have several problems: >> - we are not consistent between platforms, nor are we consistent >> with python >> - str(np.float32(a)) is locale dependent, but python str method is >> not (locale.str is) >> - formatting of long double does not work on windows because of the >> broken long double support in mingw. >> >> 1 consistency problem: >> ---------------------- >> >> python -c "a = 1e20; print a" -> 1e+020 >> python26 -c "a = 1e20; print a" -> 1e+20 >> >> In numpy, we use PyOS_snprintf for formatting, but python itself uses >> PyOS_ascii_formatd - which has different behavior on different versions >> of python. 
The above behavior can be simply reproduced in C: >> >> #include >> >> int main() >> { >> double x = 1e20; >> char c[200]; >> >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); >> printf("%s\n", c); >> printf("%g\n", x); >> >> return 0; >> } >> >> On 2.5, this will print: >> >> 1e+020 >> 1e+020 >> >> But on 2.6, this will print: >> >> 1e+20 >> 1e+020 >> >> 2 locale dependency: >> -------------------- >> >> Another issue is that our own formatting is local dependent, whereas >> python isn't: >> >> import numpy as np >> import locale >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') >> a = 1.2 >> >> print "str(a)", str(a) >> print "locale.str(a)", locale.str(a) >> print "str(np.float32(a))", str(np.float32(a)) >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) >> >> Returns: >> >> str(a) 1.2 >> locale.str(a) 1,2 >> str(np.float32(a)) 1,2 >> locale.str(np.float32(a)) 1,20000004768 >> >> I thought about copying the way python does the formatting in the trunk >> (where discrepancies between platforms have been fixed), but this is not >> so easy, because it uses a lot of code from different places - and the >> code needs to be adapted to float and long double. The other solution >> would be to do our own formatting, but this does not sound easy: >> formatting in C is hard. I am not sure about what we should do, if >> anyone else has any idea ? > > I think the first thing to do is make a decision on locale. If we chose to > support locales I don't see much choice but to depend Python because it's > too much work otherwise, and work not directly related to Numpy at that. If > we decide not to support locales then we can do our own formatting if we > need to using a fixed choice of locale. There is a list of snprintf > implementations here. Trio looks like a mature project and has an MIT > license, which I think is a license compatible with Numpy. We should not support locales. The string representations of these elements should be Python-parseable. > I'm inclined to just fix the locale and ignore the rest until Python gets > things sorted out. But I'm lazy... What do you think Python doesn't have sorted out? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Sun Dec 28 01:40:47 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 15:40:47 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> Message-ID: <49571F6F.4070102@ar.media.kyoto-u.ac.jp> Robert Kern wrote: > > We should not support locales. The string representations of these > elements should be Python-parseable. > It looks like I was wrong in my analysis of the problem: I thought I was using the most recent implementation of PyOS_* functions in my test codes, but the ones in 2.6 are not the same as the ones in the current trunk. So the problem may be easier to fix that what I first thought: simply providing our own PyOS_ascii_formatd (and similar for float and long double) may be enough, and since we don't care about locale (%Z and %n), the function is simple (and can be pulled out from python sources). 
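As a rough illustration in Python (not the actual C fix, and ascii_format below is just an illustrative name), the work such a helper has to do is: format with the usual %g/%f codes, undo any locale-specific decimal separator, and normalize the exponent width so that the '1e+020' vs '1e+20' discrepancy goes away. In pure Python the %-formatting is already locale-independent, so the decimal-point step mostly matters for the C-level implementation:

import locale
import re

def ascii_format(x, fmt='%.12g', min_exp_digits=2):
    # Illustrative sketch only, not the proposed numpy/C code.
    s = fmt % x
    # Undo a locale-specific decimal separator (e.g. ',' under fr_FR).
    dec = locale.localeconv()['decimal_point']
    if dec != '.':
        s = s.replace(dec, '.')
    # Normalize the exponent field: strip leading zeros, pad to a minimum width.
    def fix_exp(m):
        digits = m.group(2).lstrip('0') or '0'
        return 'e' + m.group(1) + digits.rjust(min_exp_digits, '0')
    return re.sub(r'e([+-])(\d+)', fix_exp, s)

print(ascii_format(1e20))   # '1e+20' regardless of what the platform prints natively
print(ascii_format(1.2))    # '1.2'
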
We would then use PyOS_ascii_format* (locale independant) instead of PyOS_snprintf (locale dependant) in str/repr implementation of scalar arrays. Does that sound acceptable to you ? cheers, David From charlesr.harris at gmail.com Sun Dec 28 01:58:58 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 27 Dec 2008 23:58:58 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> Message-ID: On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern wrote: > On Sun, Dec 28, 2008 at 01:38, Charles R Harris > wrote: > > > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > > wrote: > >> > >> Hi, > >> > >> While looking at the last failures of numpy trunk on windows for > >> python 2.5 and 2.6, I got into floating point number formatting issues; > >> I got deeper and deeper, and now I am lost. We have several problems: > >> - we are not consistent between platforms, nor are we consistent > >> with python > >> - str(np.float32(a)) is locale dependent, but python str method is > >> not (locale.str is) > >> - formatting of long double does not work on windows because of the > >> broken long double support in mingw. > >> > >> 1 consistency problem: > >> ---------------------- > >> > >> python -c "a = 1e20; print a" -> 1e+020 > >> python26 -c "a = 1e20; print a" -> 1e+20 > >> > >> In numpy, we use PyOS_snprintf for formatting, but python itself uses > >> PyOS_ascii_formatd - which has different behavior on different versions > >> of python. The above behavior can be simply reproduced in C: > >> > >> #include > >> > >> int main() > >> { > >> double x = 1e20; > >> char c[200]; > >> > >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); > >> printf("%s\n", c); > >> printf("%g\n", x); > >> > >> return 0; > >> } > >> > >> On 2.5, this will print: > >> > >> 1e+020 > >> 1e+020 > >> > >> But on 2.6, this will print: > >> > >> 1e+20 > >> 1e+020 > >> > >> 2 locale dependency: > >> -------------------- > >> > >> Another issue is that our own formatting is local dependent, whereas > >> python isn't: > >> > >> import numpy as np > >> import locale > >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > >> a = 1.2 > >> > >> print "str(a)", str(a) > >> print "locale.str(a)", locale.str(a) > >> print "str(np.float32(a))", str(np.float32(a)) > >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) > >> > >> Returns: > >> > >> str(a) 1.2 > >> locale.str(a) 1,2 > >> str(np.float32(a)) 1,2 > >> locale.str(np.float32(a)) 1,20000004768 > >> > >> I thought about copying the way python does the formatting in the trunk > >> (where discrepancies between platforms have been fixed), but this is not > >> so easy, because it uses a lot of code from different places - and the > >> code needs to be adapted to float and long double. The other solution > >> would be to do our own formatting, but this does not sound easy: > >> formatting in C is hard. I am not sure about what we should do, if > >> anyone else has any idea ? > > > > I think the first thing to do is make a decision on locale. If we chose > to > > support locales I don't see much choice but to depend Python because it's > > too much work otherwise, and work not directly related to Numpy at that. > If > > we decide not to support locales then we can do our own formatting if we > > need to using a fixed choice of locale. 
There is a list of snprintf > > implementations here. Trio looks like a mature project and has an MIT > > license, which I think is a license compatible with Numpy. > > We should not support locales. The string representations of these > elements should be Python-parseable. > > > I'm inclined to just fix the locale and ignore the rest until Python gets > > things sorted out. But I'm lazy... > > What do you think Python doesn't have sorted out? > Consistency between versions and platforms. David's note with the ticket points to a Python 3.0 bug on this reported about, oh, two years ago. If we wait long enough this problem will eventually get fixed as old python versions disappear and some sort decision is made for the 3.x series. Or we could do our own and be consistent with ourselves. There is also the problem of long doubles on the windows platform, which isn't Python specific since Python doesn't use long doubles. As I understand long doubles on windows, mingw32 supports them, VS doesn't, so there is a compiler inconsistency to deal with also. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Sun Dec 28 01:55:56 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 28 Dec 2008 15:55:56 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> Message-ID: <495722FC.2050109@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern > wrote: > > On Sun, Dec 28, 2008 at 01:38, Charles R Harris > > wrote: > > > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > > > wrote: > >> > >> Hi, > >> > >> While looking at the last failures of numpy trunk on windows for > >> python 2.5 and 2.6, I got into floating point number formatting > issues; > >> I got deeper and deeper, and now I am lost. We have several > problems: > >> - we are not consistent between platforms, nor are we consistent > >> with python > >> - str(np.float32(a)) is locale dependent, but python str > method is > >> not (locale.str is) > >> - formatting of long double does not work on windows because > of the > >> broken long double support in mingw. > >> > >> 1 consistency problem: > >> ---------------------- > >> > >> python -c "a = 1e20; print a" -> 1e+020 > >> python26 -c "a = 1e20; print a" -> 1e+20 > >> > >> In numpy, we use PyOS_snprintf for formatting, but python > itself uses > >> PyOS_ascii_formatd - which has different behavior on different > versions > >> of python. 
The above behavior can be simply reproduced in C: > >> > >> #include > >> > >> int main() > >> { > >> double x = 1e20; > >> char c[200]; > >> > >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); > >> printf("%s\n", c); > >> printf("%g\n", x); > >> > >> return 0; > >> } > >> > >> On 2.5, this will print: > >> > >> 1e+020 > >> 1e+020 > >> > >> But on 2.6, this will print: > >> > >> 1e+20 > >> 1e+020 > >> > >> 2 locale dependency: > >> -------------------- > >> > >> Another issue is that our own formatting is local dependent, > whereas > >> python isn't: > >> > >> import numpy as np > >> import locale > >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > >> a = 1.2 > >> > >> print "str(a)", str(a) > >> print "locale.str(a)", locale.str(a) > >> print "str(np.float32(a))", str(np.float32(a)) > >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) > >> > >> Returns: > >> > >> str(a) 1.2 > >> locale.str(a) 1,2 > >> str(np.float32(a)) 1,2 > >> locale.str(np.float32(a)) 1,20000004768 > >> > >> I thought about copying the way python does the formatting in > the trunk > >> (where discrepancies between platforms have been fixed), but > this is not > >> so easy, because it uses a lot of code from different places - > and the > >> code needs to be adapted to float and long double. The other > solution > >> would be to do our own formatting, but this does not sound easy: > >> formatting in C is hard. I am not sure about what we should do, if > >> anyone else has any idea ? > > > > I think the first thing to do is make a decision on locale. If > we chose to > > support locales I don't see much choice but to depend Python > because it's > > too much work otherwise, and work not directly related to Numpy > at that. If > > we decide not to support locales then we can do our own > formatting if we > > need to using a fixed choice of locale. There is a list of snprintf > > implementations here. Trio looks like a mature project and has > an MIT > > license, which I think is a license compatible with Numpy. > > We should not support locales. The string representations of these > elements should be Python-parseable. > > > I'm inclined to just fix the locale and ignore the rest until > Python gets > > things sorted out. But I'm lazy... > > What do you think Python doesn't have sorted out? > > > Consistency between versions and platforms. David's note with the > ticket points to a Python 3.0 bug on this reported about, oh, two > years ago. As an example: in python 2.6, they solved some issues like inf/nan by interpreting the strings in python before outputting them, but we do not use their fix. So we have: python -c "import numpy as np; print np.log(0)" -> -inf (python 2.6) / -1.#INF (2.5, which is the format from the MS runtime). But: python -c "import numpy as np; print np.log(0).astype(np.float32)" -> -1.#INF (both 2.6 and 2.5) Etc... We can't be consistent with ourselves and with python at the same time, I think. I don't know which one is best: numpy being consistent through platforms and python versions, or being consistent with python. > There is also the problem of long doubles on the windows platform, > which isn't Python specific since Python doesn't use long doubles. As > I understand long doubles on windows, mingw32 supports them, VS > doesn't, so there is a compiler inconsistency to deal with also. To be exact, both mingw and VS support long double sensu stricto: the long double type is available. But sizeof(long double) == sizeof(double) with VS toolchain, and sizeof(long double) is 12 with mingw. 
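For reference, which long double flavour a given numpy build ended up with is easy to check from Python; the sizes in the comments are typical values for these toolchains, not guarantees:

import numpy as np

print(np.dtype(np.longdouble).itemsize)   # e.g. 8 on MSVC builds, 12 on 32-bit mingw/gcc, 16 on many 64-bit ones
print(np.finfo(np.longdouble).nmant)      # mantissa bits actually usable, independent of padded storage
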
The later is a pain, because mingw use both MS runtime (printf) and its own function (some math funcs), so we can't easily be consistent (either 8 or 12 bytes long double) with mingw. One solution would be to use the mingwex printf (a printf reimplementation available on recent mingwrt) instead of MSVC runtime - I would hope that this one is fixed wrt long double. This problem is even worse on 64 bits (long double are 16 bytes by default there with mingw). cheers, David From charlesr.harris at gmail.com Sun Dec 28 02:12:19 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 28 Dec 2008 00:12:19 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <49571F6F.4070102@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Dec 27, 2008 at 11:40 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Robert Kern wrote: > > > > We should not support locales. The string representations of these > > elements should be Python-parseable. > > > > It looks like I was wrong in my analysis of the problem: I thought I was > using the most recent implementation of PyOS_* functions in my test > codes, but the ones in 2.6 are not the same as the ones in the current > trunk. So the problem may be easier to fix that what I first thought: > simply providing our own PyOS_ascii_formatd (and similar for float and > long double) may be enough, and since we don't care about locale (%Z and > %n), the function is simple (and can be pulled out from python sources). > > We would then use PyOS_ascii_format* (locale independant) instead of > PyOS_snprintf (locale dependant) in str/repr implementation of scalar > arrays. Does that sound acceptable to you ? > As long as we rename it ;) Trio might be worth a look anyway as it has some extensions that might be useful, binary formats, for instance. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Dec 28 02:31:22 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 28 Dec 2008 00:31:22 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <495722FC.2050109@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <495722FC.2050109@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Dec 27, 2008 at 11:55 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > > > On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern > > wrote: > > > > On Sun, Dec 28, 2008 at 01:38, Charles R Harris > > > > wrote: > > > > > > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau > > > > > wrote: > > >> > > >> Hi, > > >> > > >> While looking at the last failures of numpy trunk on windows > for > > >> python 2.5 and 2.6, I got into floating point number formatting > > issues; > > >> I got deeper and deeper, and now I am lost. We have several > > problems: > > >> - we are not consistent between platforms, nor are we > consistent > > >> with python > > >> - str(np.float32(a)) is locale dependent, but python str > > method is > > >> not (locale.str is) > > >> - formatting of long double does not work on windows because > > of the > > >> broken long double support in mingw. 
> > >> > > >> 1 consistency problem: > > >> ---------------------- > > >> > > >> python -c "a = 1e20; print a" -> 1e+020 > > >> python26 -c "a = 1e20; print a" -> 1e+20 > > >> > > >> In numpy, we use PyOS_snprintf for formatting, but python > > itself uses > > >> PyOS_ascii_formatd - which has different behavior on different > > versions > > >> of python. The above behavior can be simply reproduced in C: > > >> > > >> #include > > >> > > >> int main() > > >> { > > >> double x = 1e20; > > >> char c[200]; > > >> > > >> PyOS_ascii_format(c, sizeof(c), "%.12g", x); > > >> printf("%s\n", c); > > >> printf("%g\n", x); > > >> > > >> return 0; > > >> } > > >> > > >> On 2.5, this will print: > > >> > > >> 1e+020 > > >> 1e+020 > > >> > > >> But on 2.6, this will print: > > >> > > >> 1e+20 > > >> 1e+020 > > >> > > >> 2 locale dependency: > > >> -------------------- > > >> > > >> Another issue is that our own formatting is local dependent, > > whereas > > >> python isn't: > > >> > > >> import numpy as np > > >> import locale > > >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR') > > >> a = 1.2 > > >> > > >> print "str(a)", str(a) > > >> print "locale.str(a)", locale.str(a) > > >> print "str(np.float32(a))", str(np.float32(a)) > > >> print "locale.str(np.float32(a))", locale.str(np.float32(a)) > > >> > > >> Returns: > > >> > > >> str(a) 1.2 > > >> locale.str(a) 1,2 > > >> str(np.float32(a)) 1,2 > > >> locale.str(np.float32(a)) 1,20000004768 > > >> > > >> I thought about copying the way python does the formatting in > > the trunk > > >> (where discrepancies between platforms have been fixed), but > > this is not > > >> so easy, because it uses a lot of code from different places - > > and the > > >> code needs to be adapted to float and long double. The other > > solution > > >> would be to do our own formatting, but this does not sound easy: > > >> formatting in C is hard. I am not sure about what we should do, if > > >> anyone else has any idea ? > > > > > > I think the first thing to do is make a decision on locale. If > > we chose to > > > support locales I don't see much choice but to depend Python > > because it's > > > too much work otherwise, and work not directly related to Numpy > > at that. If > > > we decide not to support locales then we can do our own > > formatting if we > > > need to using a fixed choice of locale. There is a list of snprintf > > > implementations here. Trio looks like a mature project and has > > an MIT > > > license, which I think is a license compatible with Numpy. > > > > We should not support locales. The string representations of these > > elements should be Python-parseable. > > > > > I'm inclined to just fix the locale and ignore the rest until > > Python gets > > > things sorted out. But I'm lazy... > > > > What do you think Python doesn't have sorted out? > > > > > > Consistency between versions and platforms. David's note with the > > ticket points to a Python 3.0 bug on this reported about, oh, two > > years ago. > > As an example: in python 2.6, they solved some issues like inf/nan by > interpreting the strings in python before outputting them, but we do not > use their fix. So we have: > > python -c "import numpy as np; print np.log(0)" -> -inf (python 2.6) / > -1.#INF (2.5, which is the format from the MS runtime). > > But: > > python -c "import numpy as np; print np.log(0).astype(np.float32)" -> > -1.#INF (both 2.6 and 2.5) > > Etc... We can't be consistent with ourselves and with python at the same > time, I think. 
I don't know which one is best: numpy being consistent > through platforms and python versions, or being consistent with python. > > > There is also the problem of long doubles on the windows platform, > > which isn't Python specific since Python doesn't use long doubles. As > > I understand long doubles on windows, mingw32 supports them, VS > > doesn't, so there is a compiler inconsistency to deal with also. > > To be exact, both mingw and VS support long double sensu stricto: the > long double type is available. But sizeof(long double) == sizeof(double) > with VS toolchain, and sizeof(long double) is 12 with mingw. The later > is a pain, because mingw use both MS runtime (printf) and its own > function (some math funcs), so we can't easily be consistent (either 8 > or 12 bytes long double) with mingw. One solution would be to use the > mingwex printf (a printf reimplementation available on recent mingwrt) > instead of MSVC runtime - I would hope that this one is fixed wrt long > double. This problem is even worse on 64 bits (long double are 16 bytes > by default there with mingw). > I think there are also less visible problems with string to number conversions, so that might be a reason to consider third party software. Python doesn't directly support conversion of complex numbers presented as strings, for instance, although that may have been fixed in 3.0. So extending some third party sscanf might be useful. The question comes of how much time you want to spend on this. I know working on a dissertation is a great excuse to do something else; I spent some weeks writing my own latex dissertation class, for instance. But I don't know if that is recommended practice. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at informa.tiker.net Sun Dec 28 19:23:28 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Sun, 28 Dec 2008 19:23:28 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? Message-ID: <200812290123.32287.lists@informa.tiker.net> Hi all, I don't think PyObject pointers should be accessible via the buffer interface. I'd throw an error, but maybe a (silenceable) warning would do. Would have saved me some bug-hunting. >>> import numpy >>> numpy.array([55, (33,)], dtype=object) >>> x = numpy.array([55, (33,)], dtype=object) >>> x array([55, (33,)], dtype=object) >>> buffer(x) >>> str(buffer(x)) '\xb0\x1c\x17\x08l\x89\xd7\xb7' >>> numpy.__version__ '1.1.0' Opinions? Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From robert.kern at gmail.com Sun Dec 28 20:01:38 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 20:01:38 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <200812290123.32287.lists@informa.tiker.net> References: <200812290123.32287.lists@informa.tiker.net> Message-ID: <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner wrote: > Hi all, > > I don't think PyObject pointers should be accessible via the buffer interface. > I'd throw an error, but maybe a (silenceable) warning would do. Would have > saved me some bug-hunting. Can you describe in more detail what problem it caused? 
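For anyone wrapping buffers this way, one cheap guard is to refuse object arrays (and unexpected dtypes generally) before exposing raw memory. A minimal sketch - checked_send and send_raw are made-up names, and send_raw only stands in for whatever actually ships the bytes (e.g. a Boost.MPI isend binding):

import numpy as np

def checked_send(arr, send_raw, expected_dtype=np.float64):
    arr = np.ascontiguousarray(arr)           # one contiguous block (may copy)
    if arr.dtype == np.object_:
        raise TypeError("refusing to send an object array: its buffer holds "
                        "PyObject pointers, not data")
    if arr.dtype != np.dtype(expected_dtype):
        raise TypeError("expected dtype %s, got %s" % (expected_dtype, arr.dtype))
    send_raw(buffer(arr))                     # only now hand out the raw bytes
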
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lists at informa.tiker.net Sun Dec 28 20:38:38 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Sun, 28 Dec 2008 20:38:38 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> References: <200812290123.32287.lists@informa.tiker.net> <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> Message-ID: <200812290238.39702.lists@informa.tiker.net> On Montag 29 Dezember 2008, Robert Kern wrote: > On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner wrote: > > Hi all, > > > > I don't think PyObject pointers should be accessible via the buffer > > interface. I'd throw an error, but maybe a (silenceable) warning would > > do. Would have saved me some bug-hunting. > > Can you describe in more detail what problem it caused? Well, I'm a little bit embarrassed. :) But here goes. I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. They take a numpy array, obtain its buffer, and shove that into Boost.MPI's isend(). My code does some sort of term evaluation, and instead of shoving the evaluated floating point vector into MPI, it instead used the (un-evaluated) symbolic vector, which is represented as an object array. My MPI wrapper happily handed that object array's buffer to MPI. Oddly, instead of the deserved segfault, I just got garbage data on the other end. (Well, some other machine's PyObject pointers, really.) I guess I'm wishing I would've been prevented from falling into that trap, and I ended up wondering if there actually is a legitimate use of the buffer interface for object arrays. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From robert.kern at gmail.com Sun Dec 28 21:15:08 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 21:15:08 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <200812290238.39702.lists@informa.tiker.net> References: <200812290123.32287.lists@informa.tiker.net> <3d375d730812281701p11be80cya6761ab858d25134@mail.gmail.com> <200812290238.39702.lists@informa.tiker.net> Message-ID: <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> On Sun, Dec 28, 2008 at 20:38, Andreas Kl?ckner wrote: > On Montag 29 Dezember 2008, Robert Kern wrote: >> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner > wrote: >> > Hi all, >> > >> > I don't think PyObject pointers should be accessible via the buffer >> > interface. I'd throw an error, but maybe a (silenceable) warning would >> > do. Would have saved me some bug-hunting. >> >> Can you describe in more detail what problem it caused? > > Well, I'm a little bit embarrassed. :) But here goes. > > I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. They > take a numpy array, obtain its buffer, and shove that into Boost.MPI's > isend(). How do you communicate the dtype? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From lists at informa.tiker.net Sun Dec 28 21:52:00 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Sun, 28 Dec 2008 21:52:00 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> References: <200812290123.32287.lists@informa.tiker.net> <200812290238.39702.lists@informa.tiker.net> <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> Message-ID: <200812290352.02721.lists@informa.tiker.net> On Montag 29 Dezember 2008, Robert Kern wrote: > On Sun, Dec 28, 2008 at 20:38, Andreas Kl?ckner wrote: > > On Montag 29 Dezember 2008, Robert Kern wrote: > >> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner > >> > > > > wrote: > >> > Hi all, > >> > > >> > I don't think PyObject pointers should be accessible via the buffer > >> > interface. I'd throw an error, but maybe a (silenceable) warning would > >> > do. Would have saved me some bug-hunting. > >> > >> Can you describe in more detail what problem it caused? > > > > Well, I'm a little bit embarrassed. :) But here goes. > > > > I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. > > They take a numpy array, obtain its buffer, and shove that into > > Boost.MPI's isend(). > > How do you communicate the dtype? I don't. The app is a PDE solver, both ends are working at the same (known) precision. Passing an object array was completely wrong, but since my wrapper functions only deal with the buffer API, they couldn't really do the checking. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From robert.kern at gmail.com Sun Dec 28 22:28:53 2008 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 28 Dec 2008 22:28:53 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <200812290352.02721.lists@informa.tiker.net> References: <200812290123.32287.lists@informa.tiker.net> <200812290238.39702.lists@informa.tiker.net> <3d375d730812281815y6255e78bxd359405f0f0c180d@mail.gmail.com> <200812290352.02721.lists@informa.tiker.net> Message-ID: <3d375d730812281928y614170ccm9c72ee646235ee7c@mail.gmail.com> On Sun, Dec 28, 2008 at 21:52, Andreas Kl?ckner wrote: > On Montag 29 Dezember 2008, Robert Kern wrote: >> On Sun, Dec 28, 2008 at 20:38, Andreas Kl?ckner > wrote: >> > On Montag 29 Dezember 2008, Robert Kern wrote: >> >> On Sun, Dec 28, 2008 at 19:23, Andreas Kl?ckner >> >> >> > >> > wrote: >> >> > Hi all, >> >> > >> >> > I don't think PyObject pointers should be accessible via the buffer >> >> > interface. I'd throw an error, but maybe a (silenceable) warning would >> >> > do. Would have saved me some bug-hunting. >> >> >> >> Can you describe in more detail what problem it caused? >> > >> > Well, I'm a little bit embarrassed. :) But here goes. >> > >> > I have one-line MPI wrappers that build on Boost.MPI and Boost.Python. >> > They take a numpy array, obtain its buffer, and shove that into >> > Boost.MPI's isend(). >> >> How do you communicate the dtype? > > I don't. The app is a PDE solver, both ends are working at the same (known) > precision. Passing an object array was completely wrong, but since my wrapper > functions only deal with the buffer API, they couldn't really do the checking. You could wrap the wrappers in Python and check the dtype. 
You'd have a similar bug if you passed a wrong non-object dtype, too. Checking/communicating the dtype is something you always have to do when using the 2.x buffer protocol. I'm inclined not to make object a special case. When you ask for the raw bytes, you should get the raw bytes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Sun Dec 28 23:38:12 2008 From: cournape at gmail.com (David Cournapeau) Date: Mon, 29 Dec 2008 13:38:12 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> On Sun, Dec 28, 2008 at 4:12 PM, Charles R Harris wrote: > > > On Sat, Dec 27, 2008 at 11:40 PM, David Cournapeau > wrote: >> >> Robert Kern wrote: >> > >> > We should not support locales. The string representations of these >> > elements should be Python-parseable. >> > >> >> It looks like I was wrong in my analysis of the problem: I thought I was >> using the most recent implementation of PyOS_* functions in my test >> codes, but the ones in 2.6 are not the same as the ones in the current >> trunk. So the problem may be easier to fix that what I first thought: >> simply providing our own PyOS_ascii_formatd (and similar for float and >> long double) may be enough, and since we don't care about locale (%Z and >> %n), the function is simple (and can be pulled out from python sources). >> >> We would then use PyOS_ascii_format* (locale independant) instead of >> PyOS_snprintf (locale dependant) in str/repr implementation of scalar >> arrays. Does that sound acceptable to you ? > I put my yesterday work in the fix_float_format branch: - it fixes the locale issue - it fixes the long double issue on windows. - it also fixes some tests (we were not testing single precision formatting but twice double precision instead - the single precision test fails on the trunk BTW). - it handles inf and nan more consistently across platforms (e.g. str(np.log(0)) will be '-inf' on all platforms; on windows, it used to be '-1.#INF' - I was afraid it would broke converting back the string to float, but it is broken anyway before my change, e.g. float('-1.#INF') does not work on windows). - for now, it breaks in windows python 2.5, because float(1e10) used to be 1e+010 on python 2.5 and is 1e+10 on python 2.6 (to be more consistent with C99). But I could simply forces a backward compatibility with python 2.5/2.4, since I can control the number of digits in the exponent in the formatting code. There are still some problems related for double which I am not sure how to solve: import numpy as np a = 1e10 print np.float32(a) # -> call format_float print np.float64(a) # -> do not call format_double print np.float96(a) # -> call format_longdouble I guess the different with float64 comes from its multi-inheritence (that is, it derives from the builtin float, and the rules for print are different that for the other). Is this behavior the expected one ? 
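For reference, the multiple inheritance is easy to check from Python - float64 is the only one of the three scalar types that derives from the builtin float, which is consistent with the guess above:

import numpy as np

print(isinstance(np.float64(1e10), float))   # True: the builtin float machinery applies
print(isinstance(np.float32(1e10), float))   # False: numpy's own formatting is used
print(np.float64.__mro__)                    # the builtin float shows up among the bases
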
cheers, David From charlesr.harris at gmail.com Mon Dec 29 00:36:40 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 28 Dec 2008 22:36:40 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: On Sun, Dec 28, 2008 at 9:38 PM, David Cournapeau wrote: > On Sun, Dec 28, 2008 at 4:12 PM, Charles R Harris > wrote: > > > > > > On Sat, Dec 27, 2008 at 11:40 PM, David Cournapeau > > wrote: > >> > >> Robert Kern wrote: > >> > > >> > We should not support locales. The string representations of these > >> > elements should be Python-parseable. > >> > > >> > >> It looks like I was wrong in my analysis of the problem: I thought I was > >> using the most recent implementation of PyOS_* functions in my test > >> codes, but the ones in 2.6 are not the same as the ones in the current > >> trunk. So the problem may be easier to fix that what I first thought: > >> simply providing our own PyOS_ascii_formatd (and similar for float and > >> long double) may be enough, and since we don't care about locale (%Z and > >> %n), the function is simple (and can be pulled out from python sources). > >> > >> We would then use PyOS_ascii_format* (locale independant) instead of > >> PyOS_snprintf (locale dependant) in str/repr implementation of scalar > >> arrays. Does that sound acceptable to you ? > > > > I put my yesterday work in the fix_float_format branch: > - it fixes the locale issue > - it fixes the long double issue on windows. > - it also fixes some tests (we were not testing single precision > formatting but twice double precision instead - the single precision > test fails on the trunk BTW). Curious, I don't see any test failures here. Were the tests actually being run or is something else different in your test setup? Or do you mean the fixed up test fails. > > - it handles inf and nan more consistently across platforms (e.g. > str(np.log(0)) will be '-inf' on all platforms; on windows, it used to > be '-1.#INF' - I was afraid it would broke converting back the string > to float, but it is broken anyway before my change, e.g. > float('-1.#INF') does not work on windows). > - for now, it breaks in windows python 2.5, because float(1e10) used > to be 1e+010 on python 2.5 and is 1e+10 on python 2.6 (to be more > consistent with C99). But I could simply forces a backward > compatibility with python 2.5/2.4, since I can control the number of > digits in the exponent in the formatting code. > > There are still some problems related for double which I am not sure > how to solve: > > import numpy as np > a = 1e10 > print np.float32(a) # -> call format_float > print np.float64(a) # -> do not call format_double > print np.float96(a) # -> call format_longdouble > > I guess the different with float64 comes from its multi-inheritence > (that is, it derives from the builtin float, and the rules for print > are different that for the other). Is this behavior the expected one ? > Expected, but I would like to see it change because it is kind of frustrating. Fixing it probably involves setting a function pointer in the type definition but I am not sure about that. We might also want to do something about integers, as in Python 3.0 they will all be Python long integers. 
I don't know if that actually breaks anything in numpy, or how Python 3.0 implements integers, but it might be a good idea not to derive from Python integers. How that will affect indexing speed I don't know. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon Dec 29 00:35:50 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 29 Dec 2008 14:35:50 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: <495861B6.8050707@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > > I put my yesterday work in the fix_float_format branch: > - it fixes the locale issue > - it fixes the long double issue on windows. > - it also fixes some tests (we were not testing single precision > formatting but twice double precision instead - the single precision > test fails on the trunk BTW). > > > Curious, I don't see any test failures here. Were the tests actually > being run or is something else different in your test setup? Or do you > mean the fixed up test fails. The later: if you look at numpy/core/tests/test_print, you will see that the types tested are np.float, np.double and np.longdouble, but at least on linux, np.float == np.double, and np.float32 is what we want to test I suppose here instead. > > Expected, but I would like to see it change because it is kind of > frustrating. Fixing it probably involves setting a function pointer in > the type definition but I am not sure about that. Hm, it took me a while to get this, but print np.float32(value) can be controlled through tp_print. Still, it does not work in all cases: print np.float32(a) -> call the tp_print print '%f' % np.float32(a) -> does not call the tp_print (nor tp_str/tp_repr). I have no idea what going on there. > We might also want to do something about integers, as in Python 3.0 > they will all be Python long integers. I will only care about floating point numbers for now, since they have problem today in numpy, with currently used python interpreters :) David From charlesr.harris at gmail.com Mon Dec 29 02:36:29 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 29 Dec 2008 00:36:29 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <495861B6.8050707@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, Dec 28, 2008 at 10:35 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > > > > > I put my yesterday work in the fix_float_format branch: > > - it fixes the locale issue > > - it fixes the long double issue on windows. > > - it also fixes some tests (we were not testing single precision > > formatting but twice double precision instead - the single precision > > test fails on the trunk BTW). > > > > > > Curious, I don't see any test failures here. Were the tests actually > > being run or is something else different in your test setup? Or do you > > mean the fixed up test fails. 
> > The later: if you look at numpy/core/tests/test_print, you will see that > the types tested are np.float, np.double and np.longdouble, but at least > on linux, np.float == np.double, and np.float32 is what we want to test > I suppose here instead. > > > > > Expected, but I would like to see it change because it is kind of > > frustrating. Fixing it probably involves setting a function pointer in > > the type definition but I am not sure about that. > > Hm, it took me a while to get this, but print np.float32(value) can be > controlled through tp_print. Still, it does not work in all cases: > > print np.float32(a) -> call the tp_print > print '%f' % np.float32(a) -> does not call the tp_print (nor > tp_str/tp_repr). I have no idea what going on there. > I'll bet it's calling a conversion to python float, i.e., double, because of the %f. In [1]: '%s' % np.float32(1) Out[1]: '1.0' In [2]: '%f' % np.float32(1) Out[2]: '1.000000' I don't see any way to work around that without changing the way the python formatting works. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at informa.tiker.net Mon Dec 29 04:35:01 2008 From: lists at informa.tiker.net (Andreas =?utf-8?q?Kl=C3=B6ckner?=) Date: Mon, 29 Dec 2008 04:35:01 -0500 Subject: [Numpy-discussion] Should object arrays have a buffer interface? In-Reply-To: <3d375d730812281928y614170ccm9c72ee646235ee7c@mail.gmail.com> References: <200812290123.32287.lists@informa.tiker.net> <200812290352.02721.lists@informa.tiker.net> <3d375d730812281928y614170ccm9c72ee646235ee7c@mail.gmail.com> Message-ID: <200812291035.02910.lists@informa.tiker.net> On Montag 29 Dezember 2008, Robert Kern wrote: > You could wrap the wrappers in Python and check the dtype. You'd have > a similar bug if you passed a wrong non-object dtype, too. > Checking/communicating the dtype is something you always have to do > when using the 2.x buffer protocol. I'm inclined not to make object a > special case. When you ask for the raw bytes, you should get the raw > bytes. Ok, fair enough. Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From boogaloojb at yahoo.fr Mon Dec 29 10:58:09 2008 From: boogaloojb at yahoo.fr (Jean-Baptiste Rudant) Date: Mon, 29 Dec 2008 15:58:09 +0000 (GMT) Subject: [Numpy-discussion] Alternative to record array Message-ID: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Hello, I like to use record arrays to access fields by their name, and because they are esay to use with pytables. But I think it's not very effiicient for what I have to do. Maybe I'm misunderstanding something. Example : import numpy as np age = np.random.randint(0, 99, 10e6) weight = np.random.randint(0, 200, 10e6) data = np.rec.fromarrays((age, weight), names='age, weight') # the kind of operations I do is : data.age += data.age + 1 # but it's far less efficient than doing : age += 1 # because I think the record array stores [(age_0, weight_0) ...(age_n, weight_n)] # and not [age0 ... age_n] then [weight_0 ... weight_n]. So I think I don't use record arrays for the right purpose. I only need something which would make me esasy to manipulate data by accessing fields by their name. Am I wrong ? Is their something in numpy for my purpose ? 
Do I have to implement my own class, with something like : class FieldArray: def __init__(self, array_dict): self.array_list = array_dict def __getitem__(self, field): return self.array_list[field] def __setitem__(self, field, value): self.array_list[field] = value my_arrays = {'age': age, 'weight' : weight} data = FieldArray(my_arrays) data['age'] += 1 Thank you for the help, Jean-Baptiste Rudant -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Vickroy at noaa.gov Mon Dec 29 12:37:21 2008 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Mon, 29 Dec 2008 10:37:21 -0700 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: <49590AD1.2050701@noaa.gov> Jean-Baptiste Rudant wrote: > Hello, > > I like to use record arrays to access fields by their name, and > because they are esay to use with pytables. But I think it's not very > effiicient for what I have to do. Maybe I'm misunderstanding something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) > ...(age_n, weight_n)] > # and not [age0 ... age_n] then [weight_0 ... weight_n]. Sorry I am not able to answer your question; I am really a new user of numpy also. It does seem the addition operation is more than 4 times slower, when using record arrays, based on the following: >>> import numpy, sys, timeit >>> sys.version '2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]' >>> numpy.__version__ '1.2.1' >>> count = 10e6 >>> ages = numpy.random.randint(0,100,count) >>> weights = numpy.random.randint(1,200,count) >>> data = numpy.rec.fromarrays((ages,weights),names='ages,weights') >>> >>> timer = timeit.Timer('data.ages += 1','from __main__ import data') >>> timer.timeit(number=100) 30.110649537860262 >>> >>> timer = timeit.Timer('ages += 1','from __main__ import ages') >>> timer.timeit(number=100) 6.9850710076280507 >>> > > So I think I don't use record arrays for the right purpose. I only > need something which would make me esasy to manipulate data by > accessing fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have to > implement my own class, with something like : > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 > > Thank you for the help, > > Jean-Baptiste Rudant > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rmay31 at gmail.com Mon Dec 29 12:38:56 2008 From: rmay31 at gmail.com (Ryan May) Date: Mon, 29 Dec 2008 11:38:56 -0600 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: <49590B30.7030906@gmail.com> Jean-Baptiste Rudant wrote: > Hello, > > I like to use record arrays to access fields by their name, and because > they are esay to use with pytables. But I think it's not very effiicient > for what I have to do. Maybe I'm misunderstanding something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) ...(age_n, > weight_n)] > # and not [age0 ... age_n] then [weight_0 ... weight_n]. > > So I think I don't use record arrays for the right purpose. I only need > something which would make me esasy to manipulate data by accessing > fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have to > implement my own class, with something like : > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 You can accomplish what your FieldArray class does using numpy dtypes: import numpy as np dt = np.dtype([('age', np.int32), ('weight', np.int32)]) N = int(10e6) data = np.empty(N, dtype=dt) data['age'] = np.random.randint(0, 99, 10e6) data['weight'] = np.random.randint(0, 200, 10e6) data['age'] += 1 Timing for recarrays (your code): In [10]: timeit data.age += 1 10 loops, best of 3: 221 ms per loop Timing for my example: In [2]: timeit data['age']+=1 10 loops, best of 3: 150 ms per loop Hope this helps. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From pgmdevlist at gmail.com Mon Dec 29 12:41:47 2008 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 29 Dec 2008 12:41:47 -0500 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: Jean-Baptiste, As you stated, everything depends on what you want to do. If you need to keep the correspondence age<>weight for each entry, then yes, record arrays, or at least flexible-type arrays, are the best. (The difference between a recarray and a flexible-type array is that fields can be accessed by attributes (data.age) or items (data['age']) with recarrays, but only with items with felxible-type arrays). Using your example, you could very well do: data['age'] += 1 and still keep the correspondence age<>weight. Your FieldArray class returns an object that is not a ndarray, which may have some undesired side-effects. As Ryan noted, flexible-type arrays are usually faster, because they lack the overhead brought by the possibiity of accessing data by attributes. So, if you don't mind using the 'access-by-fields' syntax, you're good to go. 
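To make the two access styles concrete, both can be used on the same memory; a short side-by-side using only a structured dtype:

import numpy as np

dt = np.dtype([('age', np.int32), ('weight', np.int32)])
data = np.zeros(10, dtype=dt)     # flexible-type (structured) array
data['age'] += 1                  # item access: always available

rec = data.view(np.recarray)      # same buffer, recarray interface on top
rec.age += 1                      # attribute access, with some lookup overhead
print(data['age'][0])             # 2 -- both views share the same data
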
On Dec 29, 2008, at 10:58 AM, Jean-Baptiste Rudant wrote: > Hello, > > I like to use record arrays to access fields by their name, and > because they are esay to use with pytables. But I think it's not > very effiicient for what I have to do. Maybe I'm misunderstanding > something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) ... > (age_n, weight_n)] > # and not [age0 ... age_n] then [weight_0 ... weight_n]. > > So I think I don't use record arrays for the right purpose. I only > need something which would make me esasy to manipulate data by > accessing fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have > to implement my own class, with something like : > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 > > Thank you for the help, > > Jean-Baptiste Rudant > > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From lpc at cmu.edu Mon Dec 29 14:51:48 2008 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 29 Dec 2008 14:51:48 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code Message-ID: <200812291451.49050.lpc@cmu.edu> Hello, I coincidently started my own implementation of a system to manage intermediate results last week, which I called jug. I wasn't planning to make such an alpha version public just now, but it seems to be on topic. The main idea is to use hashes to map function arguments to paths on the filesystem, which store the result (nothing extraordinary here). I also added the capability of having tasks (the basic unit) take the results of other tasks and defining an implicit dependency DAG. A simple locking mechanism enables light-weight task-level parellization (this was the second of my goals: help me make my stuff parallel). A trick that helps is that I don't really use the argument values to hash (which would be unwieldy for big arrays). I use the computation path (e.g., this is the value obtained from f(g('something'),2)). Since, at least in my problems, things tend to always map back into simple file-system paths, the hash computation doesn't even need to load the intermediate results. I will make the git repository publicly available once I figure out how to do that. I append the tutorial I wrote, which explains the system. HTH, Lu?s Pedro Coelho PhD Student in Computational Biology Carnegie Mellon University ============ Jug Tutorial ============ What is jug? ------------ Jug is a simple way to write easily parallelisable programs in Python. It also handles intermediate results for you. Example ------- This is a simple worked-through example which illustrates what jug does. Problem ~~~~~~~ Assume that I want to do the following to a collection of images: (1) for each image, compute some features (2) cluster these features using k-means. 
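Returning to the timings earlier in the thread: part of the slowdown relative to incrementing a plain per-field array comes from memory layout. A structured array stores its records interleaved, so a field such as data['age'] is a strided, non-contiguous view rather than one solid block. A quick check:

import numpy as np

dt = np.dtype([('age', np.int32), ('weight', np.int32)])
data = np.zeros(5, dtype=dt)

print(data.dtype.itemsize)                 # 8: each record holds one age and one weight
print(data['age'].strides)                 # (8,): each step skips over the weight field
print(data['age'].flags['C_CONTIGUOUS'])   # False: a view into interleaved storage
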
In order to find out the number of clusters, I try several values and pick the best result. For each value of k, because of the random initialisation, I run the clustering 10 times. I could write the following simple code: :: imgs = glob('*.png') features = [computefeatures(img,parameter=2) for img in imgs] clusters = [] bics = [] for k in xrange(2,200): for repeat in xrange(10): clusters.append(kmeans(features,k=k,random_seed=repeat)) bics.append(compute_bic(clusters[-1])) Nr_clusters = argmin(bics) // 10 Very simple and solves the problem. However, if I want to take advantage of the obvious parallelisation of the problem, then I need to write much more complicated code. My traditional approach is to break this down into smaller scripts. I'd have one to compute features for some images, I'd have another to merge all the results together and do some of the clustering, and, finally, one to merge all the results of the different clusterings. These would need to be called with different parameters to explore different areas of the parameter space, so I'd have a couple of scripts just for calling the main computation scripts. Intermediate results would be saved and loaded by the different processes. This has several problems. The biggest are (1) The need to manage intermediate files. These are normally files with long names like *features_for_img_0_with_parameter_P.pp*. (2) The code gets much more complex. There are minor issues with having to issue several jobs (and having the cluster be idle in the meanwhile), or deciding on how to partition the jobs so that they take roughly the same amount of time, but the two above are the main ones. Jug solves all these problems! Tasks ~~~~~ The main unit of jug is a Task. Any function can be used to generate a Task. A Task can depend on the results of other Tasks. The original idea for jug was a Makefile-like environment for declaring Tasks. I have moved beyond that, but it might help you think about what Tasks are. You create a Task by giving it a function which performs the work and its arguments. The arguments can be either literal values or other tasks (in which case, the function will be called with the *result* of those tasks!). Jug also understands lists of tasks (all standard Python containers will be supported in a later version). For example, the following code declares the necessary tasks for our problem: :: imgs = glob('*.png') feature_tasks = [Task(computefeatures,img,parameter=2) for img in imgs] cluster_tasks = [] bic_tasks = [] for k in xrange(2,200): for repeat in xrange(10): cluster_tasks.append(Task(kmeans,feature_tasks,k=k,random_seed=repeat)) bic_tasks.append(Task(compute_bic,cluster_tasks[-1])) Nr_clusters = Task(argmin,bic_tasks) Task Generators ~~~~~~~~~~~~~~~ In the code above, there is a lot of code of the form *Task(function,args)*, so maybe it should read *function(args)*. 
A simple helper function aids this process: :: from jug.task import Task def TaskGenerator(function): def gen(*args,**kwargs): return Task(function,*args,**kwargs) return gen computefeatures = TaskGenerator(computefeatures) kmeans = TaskGenerator(kmeans) compute_bic = TaskGenerator(compute_bic) @TaskGenerator def Nr_Clusters(bics): return argmin(bics) // 10 imgs = glob('*.png') features = [computefeatures(img,parameter=2) for img in imgs] clusters = [] bics = [] for k in xrange(2,200): for repeat in xrange(10): clusters.append(kmeans(features,k=k,random_seed=repeat)) bics.append(compute_bic(clusters[-1])) Nr_clusters(bics) You can see that this code is almost identical to our original sequential code, except for the declarations at the top and the fact that *Nr_clusters* is now a function (actually a TaskGenerator, look at the use of a declarator). This file is called the jugfile (you should name it *jugfile.py* on the filesystem) and specifies your problem. Of course, *TaskManager* is already a part of jug and those first few lines could have read :: from jug.task import TaskGenerator Jug ~~~ So far, we have achieved seemingly little. We have turned a simple piece of sequential code into something that generates Task objects, but does not actually perform any work. The final piece is jug. Jug takes these Task objects and runs them. It's main loop is basically :: while len(tasks) > 0: for t in tasks: if can_run(t): # ensures that all dependencies have been run if need_to_run(t) and not is_running(t): t.run() tasks.remove(t) If you run jug on the script above, you will simply have reproduced the original code with the added benefit of having all the intermediate results saved. The interesting is what happens when you run several instances of jug at the same time. They will start running Tasks, but each instance will run its own tasks. This allows you to take advantage of multiple processors in a way that keeps the processors all occupied as long as there is work to be done, handles the implicit dependencies, and passes functions the right values. Note also that, unlike more traditional parallel processing frameworks (like MPI), jug has no problems with the number of participating processors varying throughout the job. Behind the scenes, jug is using the filesystem to both save intermediate results (which get passed around) and to lock running tasks so that each task is only run once (the actual main loop is thus a bit more complex than shown above). Intermediate and Final Results ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can obtain the final results of your computation by setting up a task that saves them to disk and loading them from there. If the results of your computation are simple enough, this might be the simplest way. Another way, which is also the way to access the intermediate results if you want them, is to run the jug script and then call the *load()* method on Tasks. For example, :: img = glob('*.png') features = [computefeatures(img,parameter=2) for img in imgs] ... feature_values = [feat.load() for feat in features] If the values are not accessible, this raises an exception. Advantages ---------- jug is an attempt to get something that works in the setting that I have found myself in: code that is *embarissingly parallel* with a couple of points where all the results of previous processing are merged, often in a simple way. 
It is also a way for me to manage either the explosion of temporary files that plagued my code and the brittleness of making sure that all results from separate processors are merged correctly in my *ad hoc* scripts. Limitations ----------- This is not an attempt to replace MPI in any way. For code that has more merge points, this won't do. It also won't do if the individual tasks are so small that the over-head of managing them swamps out the performance gains of parallelisation. In my code, most of the times, each task takes 20 seconds to a few minutes. Just enough to make the managing time irrelevant, but fast enough that the main job can be broken into thousands of tiny pieces. The system makes it too easy to save all intermediate results and run out of disk space. This is still Python, not a true parallel programming language. The abstraction will sometimes leak through, for example, if you try to pass a Task to a function which expects a real value. Recall how we had to re-write the line *Nr_clusters = argmin(bics) // 10* above. Planned Capabilities -------------------- Here are a couple of simple improvements I plan to make at some point: * jug.py cleanup: removes left-over locks, temporary files, and unsused results. * Stop & re-start. Currently, jug processes will exit if they can't make any progress for a while. In the future, I'd like them to be unblockable by other jug processes. * No result tasks. Task-like objects that don't save intermediate results. * Have tasks be passed inside *sets* and *dictionaries*. Maybe even *numpy* arrays! This will make jug even more like a real parallel programming language. * If the original arguments are files on disk, then jug should check their modification date and invalidate subsequent results. From lpc at cmu.edu Mon Dec 29 16:41:35 2008 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 29 Dec 2008 16:41:35 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <200812291451.49050.lpc@cmu.edu> References: <200812291451.49050.lpc@cmu.edu> Message-ID: <200812291641.35576.lpc@cmu.edu> On Monday 29 December 2008 14:51:48 Luis Pedro Coelho wrote: > I will make the git repository publicly available once I figure out how to > do that. You can get my code with: git clone http://coupland.cbi.cmu.edu/jug As I said, I consider this alpha code and am only making it publicly available at this stage because it came up. The license is LGPL. bye, Luis From zachary.pincus at yale.edu Mon Dec 29 16:49:32 2008 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 29 Dec 2008 16:49:32 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <200812291641.35576.lpc@cmu.edu> References: <200812291451.49050.lpc@cmu.edu> <200812291641.35576.lpc@cmu.edu> Message-ID: <42595048-C877-46C4-AA07-B2CB18B81B24@yale.edu> This looks really cool -- thanks Luis. Definitely keep us posted as this progresses, too. Zach On Dec 29, 2008, at 4:41 PM, Luis Pedro Coelho wrote: > On Monday 29 December 2008 14:51:48 Luis Pedro Coelho wrote: >> I will make the git repository publicly available once I figure out >> how to >> do that. > > You can get my code with: > > git clone http://coupland.cbi.cmu.edu/jug > > As I said, I consider this alpha code and am only making it publicly > available > at this stage because it came up. The license is LGPL. 
> > bye, > Luis > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion From gael.varoquaux at normalesup.org Mon Dec 29 17:40:07 2008 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 29 Dec 2008 23:40:07 +0100 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <200812291451.49050.lpc@cmu.edu> References: <200812291451.49050.lpc@cmu.edu> Message-ID: <20081229224007.GA12811@phare.normalesup.org> Hi Luis, On Mon, Dec 29, 2008 at 02:51:48PM -0500, Luis Pedro Coelho wrote: > I coincidently started my own implementation of a system to manage > intermediate results last week, which I called jug. I wasn't planning > to make such an alpha version public just now, but it seems to be on > topic. Thanks for your input. This comforts me in my hunch that these problems where universal. It is interesting to see that you take a slightly different approach than the others already discussed. This probably stems from the fact that you are mostly interested by parallelism, whereas there are other adjacent problems that can be solved by similar abstractions. In particular, I have the impression that you do not deal with what I call "lazy-revaluation". In other words, I am not sure if you track results enough to know whether a intermediate result should be re-run, or if you run a 'clean' between each run to avoid this problem. I must admit I went away from using hash to store objects to the disk because I am very much interested in traceability, and I wanted my objects to have meaningful names, and to be stored in convenient formats (pickle, numpy .npy, hdf5, or domain-specific). I have now realized that explicit naming is convenient, but it should be optional. Your task-based approach, and the API you have built around it, reminds my a bit of twisted deferred. Have you studied this API? > A trick that helps is that I don't really use the argument values to hash > (which would be unwieldy for big arrays). I use the computation path (e.g., > this is the value obtained from f(g('something'),2)). Since, at least in my > problems, things tend to always map back into simple file-system paths, the > hash computation doesn't even need to load the intermediate results. I did notice too that using the argument value to hash was bound to failure in all but the simplest case. This is the immediate limitation to the famous memoize pattern when applied to scientific code. If I understand well, what you do is that you track the 'history' of the object and use it as a hash to the object, right? I had come to the conclusion that the history of objects should be tracked, but I hadn't realized that using it as a hash was also a good way to solve the scoping problem. Thanks for the trick. Would you consider making the code BSD? Because I want to be able to reuse my code in non open-source project, and because I do not want to lock out contributors, or to ask for copyright assignment, I like to keep all my code BSD, as all the mainstream scientific Python projects. I'll start writing up a wiki page with the all the different learning and usecases that come from all this interesting feedback. 
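As a rough illustration of that trick (a minimal sketch only, not jug's actual code; the helper name and hashing details are invented here), a task's hash can be built from the function name plus the hashes or literals of its inputs, so the values of large intermediate results never have to be loaded:

import hashlib

def path_hash(func_name, args):
    # args are literals or the hashes of upstream tasks,
    # never the (possibly huge) computed values themselves
    h = hashlib.sha1(func_name.encode())
    for a in args:
        h.update(repr(a).encode())
    return h.hexdigest()

# the hash of f(g('something'), 2) depends only on names and literals
g_hash = path_hash('g', ['something'])
f_hash = path_hash('f', [g_hash, 2])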
Cheers, Ga?l From faltet at pytables.org Mon Dec 29 18:00:09 2008 From: faltet at pytables.org (Francesc Alted) Date: Tue, 30 Dec 2008 00:00:09 +0100 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <602202.94406.qm@web28502.mail.ukl.yahoo.com> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> Message-ID: <200812300000.10365.faltet@pytables.org> A Monday 29 December 2008, Jean-Baptiste Rudant escrigu?: > Hello, > > I like to use record arrays to access fields by their name, and > because they are esay to use with pytables. But I think it's not very > effiicient for what I have to do. Maybe I'm misunderstanding > something. > > Example : > > import numpy as np > age = np.random.randint(0, 99, 10e6) > weight = np.random.randint(0, 200, 10e6) > data = np.rec.fromarrays((age, weight), names='age, weight') > # the kind of operations I do is : > data.age += data.age + 1 > # but it's far less efficient than doing : > age += 1 > # because I think the record array stores [(age_0, weight_0) > ...(age_n, weight_n)] # and not [age0 ... age_n] then [weight_0 ... > weight_n]. > > So I think I don't use record arrays for the right purpose. I only > need something which would make me esasy to manipulate data by > accessing fields by their name. > > Am I wrong ? Is their something in numpy for my purpose ? Do I have > to implement my own class, with something like : > > > > class FieldArray: > def __init__(self, array_dict): > self.array_list = array_dict > > def __getitem__(self, field): > return self.array_list[field] > > def __setitem__(self, field, value): > self.array_list[field] = value > > my_arrays = {'age': age, 'weight' : weight} > data = FieldArray(my_arrays) > > data['age'] += 1 That's a very good question. What you are observing are the effects of arranging a dataset by fields (row-wise) or by columns (column-wise). A record array in numpy arranges data by field, so that in your 'data' array the data is placed in memory as follows: data['age'][0] --> data['weight'][0] --> data['age'][1] --> data['weight'][1] --> ... while in your 'FieldArray' class, data is arranged by column and is placed in memory as: data['age'][0] --> data['age'][1] --> ... --> data['weight'][0] --> data['weight'][1] --> ... The difference for both approaches is that the row-wise arrangement is more efficient when data is iterated by field, while the column-wise one is more efficient when data is iterated by column. This is why you are seeing the increase of 4x in performance --incidentally, by looking at both data arrangements, I'd expect an increase of just 2x (the stride count is 2 in this case), but I suspect that there are hidden copies during the increment operation for the record array case. So you are perfectly right. In some situations you may want to use a row-wise arrangement (record array) and in other situations a column-wise one. So, it would be handy to have some code to convert back and forth between both data arrangements. Here it goes a couple of classes for doing this (they are a quick-and-dirty generalization of your code): class ColArray: def __init__(self, recarray): dictarray = {} if isinstance(recarray, np.ndarray): fields = recarray.dtype.fields elif isinstance(recarray, RecArray): fields = recarray.fields else: raise TypeError, "Unrecognized input type!" for colname in fields: # For optimum performance you should 'copy' the column! 
dictarray[colname] = recarray[colname].copy() self.dictarray = dictarray def __getitem__(self, field): return self.dictarray[field] def __setitem__(self, field, value): self.dictarray[field] = value def iteritems(self): return self.dictarray.iteritems() class RecArray: def __init__(self, dictarray): ldtype = [] fields = [] for colname, column in dictarray.iteritems(): ldtype.append((colname, column.dtype)) fields.append(colname) collen = len(column) dt = np.dtype(ldtype) recarray = np.empty(collen, dtype=dt) for colname, column in dictarray.iteritems(): recarray[colname] = column self.recarray = recarray self.fields = fields def __getitem__(self, field): return self.recarray[field] def __setitem__(self, field, value): self.recarray[field] = value So, ColArray takes as parameter a record array or RecArray class that have a row-wise arrangement and returns an object that is column-wise. RecArray does the inverse trip on the ColArray that takes as parameter. A small example of use: N = 10e6 age = np.random.randint(0, 99, N) weight = np.random.randint(0, 200, N) # Get an initial record array dt = np.dtype([('age', np.int_), ('weight', np.int_)]) data = np.empty(N, dtype=dt) data['age'] = age data['weight'] = weight t1 = time() data['age'] += 1 print "time for initial recarray:", round(time()-t1, 3) data = ColArray(data) t1 = time() data['age'] += 1 print "time for ColArray:", round(time()-t1, 3) data = RecArray(data) t1 = time() data['age'] += 1 print "time for reconstructed RecArray:", round(time()-t1, 3) data = ColArray(data) t1 = time() data['age'] += 1 print "time for reconstructed ColArray:", round(time()-t1, 3) and the output is: time for initial recarray: 0.298 time for ColArray: 0.076 time for reconstructed RecArray: 0.3 time for reconstructed ColArray: 0.076 So, these classes offers a quick way to go back and forth between both data arrangements, and can be used whenever a representation is found to be more useful. Indeed, you must be aware that the conversion takes time, and that it is generally a bad idea to do it just to do an operation. But when you must to operate a lot, a conversion makes a lot of sense. In fact, my hunch is that the column-wise arrangement is far more useful in general for accelerating operations in heterogeneous arrays, because what people normally do is operating column-wise and not row-wise. If this is actually the case, it would be a good idea to introduce a first-class type in numpy implementing a column-wise heterogeneous array. If this is found to be too cumbersome, perhaps integrating some utilities to do the conversion (similar in spirit to the classes above), would fit the bill. Cheers, -- Francesc Alted From lpc at cmu.edu Mon Dec 29 18:25:05 2008 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 29 Dec 2008 18:25:05 -0500 Subject: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code In-Reply-To: <20081229224007.GA12811@phare.normalesup.org> References: <200812291451.49050.lpc@cmu.edu> <20081229224007.GA12811@phare.normalesup.org> Message-ID: <200812291825.07230.lpc@cmu.edu> Hello all, On Monday 29 December 2008 17:40:07 Gael Varoquaux wrote: > It is interesting to see that you take a slightly different approach than > the others already discussed. This probably stems from the fact that you > are mostly interested by parallelism, whereas there are other adjacent > problems that can be solved by similar abstractions. In particular, I > have the impression that you do not deal with what I call > "lazy-revaluation". 
In other words, I am not sure if you track results > enough to know whether a intermediate result should be re-run, or if you > run a 'clean' between each run to avoid this problem. I do. As long as the hash (the arguments to the function) is the same, the code loads objects from disk instead of computing results. I don't track the actual source code, though, only whether parameters have changed (but this could be a later addition). > I must admit I went away from using hash to store objects to the disk > because I am very much interested in traceability, and I wanted my > objects to have meaningful names, and to be stored in convenient formats > (pickle, numpy .npy, hdf5, or domain-specific). I have now realized that > explicit naming is convenient, but it should be optional. But using a hash is not so impenetrable as long as you can easily get to the files you want. If I want to load the results of a partial computation, all I have to do is to generate the same Task objects as the initial computation and load those: I can run the jugfile.py inside ipython and call the appropriate load() methods. ipython jugfile.py : interesting = [t for t in tasks if t.name == 'something.other'] : intermediate = interesting[0].load() > I did notice too that using the argument value to hash was bound to > failure in all but the simplest case. This is the immediate limitation to > the famous memoize pattern when applied to scientific code. If I > understand well, what you do is that you track the 'history' of the > object and use it as a hash to the object, right? I had come to the > conclusion that the history of objects should be tracked, but I hadn't > realized that using it as a hash was also a good way to solve the scoping > problem. Thanks for the trick. Yes, let's say I have the following: feats = [Task(features,img) for img in glob('*.png')] cluster = Task(kmeans,feats,k=10) then the hash for cluster is computed from its arguments: * kmeans : the function name * feats: this is a list of tasks, therefore I use its hash, which is defined by its argument, which is a simple string. * k=10: this is a literal. I don't need to use the value computed by feats to compute the hash for cluster. > Your task-based approach, and the API you have built around it, reminds > my a bit of twisted deferred. Have you studied this API? No. I will look into it. Thanks. bye, Luis From cournape at gmail.com Mon Dec 29 22:12:32 2008 From: cournape at gmail.com (David Cournapeau) Date: Tue, 30 Dec 2008 12:12:32 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> On Mon, Dec 29, 2008 at 4:36 PM, Charles R Harris wrote: > > > On Sun, Dec 28, 2008 at 10:35 PM, David Cournapeau > wrote: >> >> Charles R Harris wrote: >> > >> > >> > >> > I put my yesterday work in the fix_float_format branch: >> > - it fixes the locale issue >> > - it fixes the long double issue on windows. >> > - it also fixes some tests (we were not testing single precision >> > formatting but twice double precision instead - the single precision >> > test fails on the trunk BTW). >> > >> > >> > Curious, I don't see any test failures here. 
Were the tests actually >> > being run or is something else different in your test setup? Or do you >> > mean the fixed up test fails. >> >> The later: if you look at numpy/core/tests/test_print, you will see that >> the types tested are np.float, np.double and np.longdouble, but at least >> on linux, np.float == np.double, and np.float32 is what we want to test >> I suppose here instead. >> >> > >> > Expected, but I would like to see it change because it is kind of >> > frustrating. Fixing it probably involves setting a function pointer in >> > the type definition but I am not sure about that. >> >> Hm, it took me a while to get this, but print np.float32(value) can be >> controlled through tp_print. Still, it does not work in all cases: >> >> print np.float32(a) -> call the tp_print >> print '%f' % np.float32(a) -> does not call the tp_print (nor >> tp_str/tp_repr). I have no idea what going on there. > > I'll bet it's calling a conversion to python float, i.e., double, because of > the %f. Yes, I meant that I did not understand the code path in that case. I realize that I don't know how to get the (C) call graph between two code points in python, that would be useful. Where are you dtrace on linux when I need you :) > > In [1]: '%s' % np.float32(1) > Out[1]: '1.0' > > In [2]: '%f' % np.float32(1) > Out[2]: '1.000000' > > I don't see any way to work around that without changing the way the python > formatting works. Yes, I think you're right. Specially since python itself is not consistent. On python 2.6, windows: a = complex('inf') print a # -> print inf print '%s' % a # -> print inf print '%f' % a # -> print 1.#INF Which suggests that in that case, it gets directly to stdio without much formatting work from python. Maybe it is an oversight ? Anyway, I think it would be useful to override the tp_print member ( to avoid 'print a' printing 1.#INF). cheers, David From charlesr.harris at gmail.com Mon Dec 29 23:26:52 2008 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 29 Dec 2008 21:26:52 -0700 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> Message-ID: On Mon, Dec 29, 2008 at 8:12 PM, David Cournapeau wrote: > On Mon, Dec 29, 2008 at 4:36 PM, Charles R Harris > wrote: > > > > > > On Sun, Dec 28, 2008 at 10:35 PM, David Cournapeau > > wrote: > >> > >> Charles R Harris wrote: > >> > > >> > > >> > > >> > I put my yesterday work in the fix_float_format branch: > >> > - it fixes the locale issue > >> > - it fixes the long double issue on windows. > >> > - it also fixes some tests (we were not testing single precision > >> > formatting but twice double precision instead - the single > precision > >> > test fails on the trunk BTW). > >> > > >> > > >> > Curious, I don't see any test failures here. Were the tests actually > >> > being run or is something else different in your test setup? Or do you > >> > mean the fixed up test fails. 
> >> > >> The later: if you look at numpy/core/tests/test_print, you will see that > >> the types tested are np.float, np.double and np.longdouble, but at least > >> on linux, np.float == np.double, and np.float32 is what we want to test > >> I suppose here instead. > >> > >> > > >> > Expected, but I would like to see it change because it is kind of > >> > frustrating. Fixing it probably involves setting a function pointer in > >> > the type definition but I am not sure about that. > >> > >> Hm, it took me a while to get this, but print np.float32(value) can be > >> controlled through tp_print. Still, it does not work in all cases: > >> > >> print np.float32(a) -> call the tp_print > >> print '%f' % np.float32(a) -> does not call the tp_print (nor > >> tp_str/tp_repr). I have no idea what going on there. > > > > I'll bet it's calling a conversion to python float, i.e., double, because > of > > the %f. > > Yes, I meant that I did not understand the code path in that case. I > realize that I don't know how to get the (C) call graph between two > code points in python, that would be useful. Where are you dtrace on > linux when I need you :) > I'm not sure we are quite on the same page here. The float32 object has a "convert to python float" method, (which I don't recall at the moment and I don't have the source to hand). So when %f appears in the format string that method is called and the resulting python float is formatted in the python way. Same with %s, only __str__ is called instead. > > > > > In [1]: '%s' % np.float32(1) > > Out[1]: '1.0' > > > > In [2]: '%f' % np.float32(1) > > Out[2]: '1.000000' > > > > I don't see any way to work around that without changing the way the > python > > formatting works. > > Yes, I think you're right. Specially since python itself is not > consistent. On python 2.6, windows: > > a = complex('inf') > print a # -> print inf > print '%s' % a # -> print inf > print '%f' % a # -> print 1.#INF > How does a python inf display on windows? > > Which suggests that in that case, it gets directly to stdio without > much formatting work from python. Maybe it is an oversight ? Anyway, I > think it would be useful to override the tp_print member ( to avoid > 'print a' printing 1.#INF). > Sounds like the sort of thing the python folks would want to clean up, just as you have for numpy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon Dec 29 23:46:30 2008 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 30 Dec 2008 13:46:30 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> Message-ID: <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > > Yes, I meant that I did not understand the code path in that case. I > realize that I don't know how to get the (C) call graph between two > code points in python, that would be useful. Where are you dtrace on > linux when I need you :) > > > I'm not sure we are quite on the same page here. Yep, indeed. I think my bogus example did not help :) The right test script use float('inf'), not complex('inf'). 
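For reference, the corrected snippet (a sketch of the check being discussed, in the Python 2 syntax used throughout this thread; the output, and on some platforms even the float('inf') construction itself, depends on the Python version, which is exactly the issue at hand) is simply:

a = float('inf')
print a          # goes through tp_print
print '%s' % a   # goes through tp_str
print '%f' % a   # formatted by the %-operator: the code path being tracked down here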
> The float32 object has a "convert to python float" method, (which I > don't recall at the moment and I don't have the source to hand). So > when %f appears in the format string that method is called and the > resulting python float is formatted in the python way. I think that's not the case for '%f', because the 'python' way is to print 'inf', not '1.#INF' (at least on 2.6 - on 2.5, it is always '1.#INF' on windows). If you use a pure C program on windows, you will get '1.#INF', etc... instead of 'inf'. repr, str, print all call the C format_float function, which takes care of fomatting 'inf' and co the 'python' way. So getting '1.#INF' from python suggests me that python does not format it in the '%f' case - and I don't know the code path at that point. For '%s', it goes through tp_str, for print a, it goes through tp_print, but for '%f' ? > > > a = complex('inf') > print a # -> print inf > print '%s' % a # -> print inf > print '%f' % a # -> print 1.#INF > > > How does a python inf display on windows? As stated: it depends. 'inf' or '1.#INF', the later being the same as the formatting done within the MS runtime. > > > > Which suggests that in that case, it gets directly to stdio without > much formatting work from python. Maybe it is an oversight ? Anyway, I > think it would be useful to override the tp_print member ( to avoid > 'print a' printing 1.#INF). > > > Sounds like the sort of thing the python folks would want to clean up, > just as you have for numpy. The thing is since I don't understand what happens in the print '%f' case, I don't know how to clean it up, if it is at all possible. But in anyway, it means that with my changes, we are not worse than python itself, and I think we are better than before, cheers, David From faltet at pytables.org Tue Dec 30 10:34:27 2008 From: faltet at pytables.org (Francesc Alted) Date: Tue, 30 Dec 2008 16:34:27 +0100 Subject: [Numpy-discussion] Alternative to record array In-Reply-To: <200812300000.10365.faltet@pytables.org> References: <602202.94406.qm@web28502.mail.ukl.yahoo.com> <200812300000.10365.faltet@pytables.org> Message-ID: <200812301634.28218.faltet@pytables.org> A Tuesday 30 December 2008, Francesc Alted escrigu?: > A Monday 29 December 2008, Jean-Baptiste Rudant escrigu?: [snip] > > The difference for both approaches is that the row-wise arrangement > is more efficient when data is iterated by field, while the > column-wise one is more efficient when data is iterated by column. > This is why you are seeing the increase of 4x in performance > --incidentally, by looking at both data arrangements, I'd expect an > increase of just 2x (the stride count is 2 in this case), but I > suspect that there are hidden copies during the increment operation > for the record array case. As I was mystified about this difference in speed, I kept investigating and I think I have an answer for the difference in the expected speed-up in the unary increment operator over a recarray field. After looking at the numpy code, it turns out that the next statement: data.ages += 1 is more or less equivalent to: a = data.ages a[:] = a + 1 i.e. a temporary is created (for keeping the result of 'a + 1') and then assigned to the 'ages' column. As it happens that, in this sort of operations, the memory copies are the bottleneck, the creation of the first temporary introduced a slowdown of 2x (due to the strided column) and the assignment represents the additional 2x (4x in total). 
However, the next idiom: a = data.ages a += 1 effectively removes the need for the temporary copy and is 2x faster than the original "data.ages += 1". This can be seen in the next simple benchmark: --------------------------- import numpy, timeit count = 10e6 ages = numpy.random.randint(0,100,count) weights = numpy.random.randint(1,200,count) data = numpy.rec.fromarrays((ages,weights),names='ages,weights') timer = timeit.Timer('data.ages += 1','from __main__ import data') print "v0-->", timer.timeit(number=10) timer = timeit.Timer('a=data.ages; a[:] = a + 1','from __main__ import data') print "v1-->", timer.timeit(number=10) timer = timeit.Timer('a=data.ages; a += 1','from __main__ import data') print "v2-->", timer.timeit(number=10) timer = timeit.Timer('ages += 1','from __main__ import ages') print "v3-->", timer.timeit(number=10) --------------------------- which produces the next output on my laptop: v0--> 2.98340201378 v1--> 3.22748112679 v2--> 1.5474319458 v3--> 0.809724807739 As a final comment, I suppose that unary operators (+=, -=...) can be optimized in the context of recarray columns in numpy, but I don't think it is worth the effort: when really high performance is needed for operating with columns in the context of recarrays, a column-wise approach is best. Cheers, -- Francesc Alted From nick at matsakis.net Tue Dec 30 12:44:47 2008 From: nick at matsakis.net (Nicholas Matsakis) Date: Tue, 30 Dec 2008 12:44:47 -0500 (EST) Subject: [Numpy-discussion] numpy.test() failures (1.2.1) on Mac OS X Message-ID: I just installed what I believe to be a completely vanilla installation of numpy on an Intel Mac OS X 10.5.6. Python 2.5 pkg from Python.org, numpy 1.2.1 pkg from scipy.org, nose installed through setup tools. Running "import numpy; numpy.test()" results in the following errors and failures: ERROR: Failure: TypeError (can't multiply sequence by non-int of type 'float') ERROR: test_definition (test_helper.TestFFTShift) ERROR: test_inverse (test_helper.TestFFTShift) ERROR: Test of inplace division FAIL: test_division_int (test_umath.TestDivision) FAIL: test_basic (test_index_tricks.TestUnravelIndex) FAIL: Test of inplace division FAIL: test_inplace_division_misc (test_core.TestMaskedArrayInPlaceArithmetics) FAIL: Test of inplace operations and rich comparisons Is this expected? Should I file a ticket? A complete dump of the test run can be found at: http://nick.matsakis.net/tmp/numpy-1.2.1-tests-12-30-2008.txt Nick Matsakis From lists.20.chth at xoxy.net Tue Dec 30 13:10:33 2008 From: lists.20.chth at xoxy.net (ctw) Date: Tue, 30 Dec 2008 13:10:33 -0500 Subject: [Numpy-discussion] combining recarrays Message-ID: Hi! I'm a bit stumped by the following: suppose I have several recarrays with identical dtypes (identical field names, etc.) and would like to combine them into one rec array, what would be the best way to do that? I tried using np.rec.fromrecords, but that doesn't produce the desired result. As a minimal example consider the following code: desc = np.dtype({'names':['a','b'],'formats':[np.float,np.int]}) rec1 = np.zeros(3,desc) rec2 = np.zeros(3,desc) Now I have two recarrays of shape (3,) that both look like this: array([(0.0, 0), (0.0, 0), (0.0, 0)], dtype=[('a', ' References: Message-ID: On Tue, Dec 30, 2008 at 10:10 AM, ctw wrote: > Hi! > > I'm a bit stumped by the following: suppose I have several recarrays > with identical dtypes (identical field names, etc.) and would like to > combine them into one rec array, what would be the best way to do > that? 
I tried using np.rec.fromrecords, but that doesn't produce the > desired result. As a minimal example consider the following code: > > desc = np.dtype({'names':['a','b'],'formats':[np.float,np.int]}) > rec1 = np.zeros(3,desc) > rec2 = np.zeros(3,desc) > > Now I have two recarrays of shape (3,) that both look like this: > array([(0.0, 0), (0.0, 0), (0.0, 0)], > dtype=[('a', ' > I would like to turn them into one new recarray of shape (6,) that > looks like this: > array([(0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0)], > dtype=[('a', ' > Any ideas? I'm not familiar with rec arrays, but this should work for any array: >> np.r_[rec1, rec2] array([(0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0)], dtype=[('a', '> np.concatenate((rec1, rec2), 0) array([(0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0)], dtype=[('a', ' References: Message-ID: On Tue, 30 Dec 2008, Nicholas Matsakis wrote: > I just installed what I believe to be a completely vanilla installation of > numpy on an Intel Mac OS X 10.5.6. Python 2.5 pkg from Python.org, numpy > 1.2.1 pkg from scipy.org, nose installed through setup tools. Running > "import numpy; numpy.test()" results in the following errors and > failures... I've determined these failures were a result of running the python interpreter with the "new" division semantics. Without those all the tests pass save one known failure. Nick Matsakis From len-l at telus.net Tue Dec 30 13:41:17 2008 From: len-l at telus.net (Lenard Lindstrom) Date: Tue, 30 Dec 2008 10:41:17 -0800 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> Message-ID: <495A6B4D.70009@telus.net> David Cournapeau wrote: > The thing is since I don't understand what happens in the print '%f' > case, I don't know how to clean it up, if it is at all possible. But in > anyway, it means that with my changes, we are not worse than python > itself, and I think we are better than before, > > Just a quick look in SVN, trunk/Objects/stringobject.c, shows that the call path for a "%f" format is string_mod -> PyString_Format -> formatfloat -> PyOS_ascii_formatd. -- Lenard Lindstrom From Chris.Barker at noaa.gov Tue Dec 30 14:59:43 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 30 Dec 2008 11:59:43 -0800 Subject: [Numpy-discussion] "False" and "True" not singletons? Message-ID: <495A7DAF.9080504@noaa.gov> Hi all, I've just discovered that "False" is not a singleton: >>> import numpy as N >>> f = N.all((1,2,0)) >>> print f False >>> id(f) 17316364 >>> f is False False >>> id(False) 3294768 Should it be? This arose for me in some tests I'm using that check if a result is False: self.failUnless ( (self.B1 == self.B3) is False ) I'm doing it this way, because I want to make sure that the __eq__ method really is returning "True" or "False", rather than a value that happens to evaluate to true or false, like 1, or an empty list, or whatever. This is interesting to me because back when Python first introduced Booleans, I had thought they should be kept pure and not be subclasses of integers, and, in fact, "if" should only except boolean values. 
However this opinion was really a matter of my sense of purity, and this is the fist time I've run into a case that matters. I suppose I'm not being pythonic -- I should really only care if the result evaluates true or false, but I don't feel like I'm testing right if values can slip though that shouldn't. It does reinforce my opinion though -- whether zero, or an empty sequence or string should evaluate false really is a matter of specific application, not universal. Anyway, should I just give up? Or should numpy return the same "True" and "False" (like None), or is there another solution? x == False is close, but zero still slips through, even though an empty list does not: >>> 0 == False True >>> [] == False False -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Tue Dec 30 16:22:26 2008 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Dec 2008 16:22:26 -0500 Subject: [Numpy-discussion] "False" and "True" not singletons? In-Reply-To: <495A7DAF.9080504@noaa.gov> References: <495A7DAF.9080504@noaa.gov> Message-ID: <1cd32cbb0812301322n2595a5ecv2b4b27283154ba18@mail.gmail.com> On Tue, Dec 30, 2008 at 2:59 PM, Christopher Barker wrote: > Hi all, > > I've just discovered that "False" is not a singleton: > > >>> import numpy as N > > >>> f = N.all((1,2,0)) > >>> print f > False > >>> id(f) > 17316364 > >>> f is False > False > >>> id(False) > 3294768 > > > Should it be? > > > This arose for me in some tests I'm using that check if a result is False: > > self.failUnless ( (self.B1 == self.B3) is False ) > > I'm doing it this way, because I want to make sure that the __eq__ > method really is returning "True" or "False", rather than a value that > happens to evaluate to true or false, like 1, or an empty list, or whatever. > > This is interesting to me because back when Python first introduced > Booleans, I had thought they should be kept pure and not be subclasses > of integers, and, in fact, "if" should only except boolean values. > However this opinion was really a matter of my sense of purity, and this > is the fist time I've run into a case that matters. > > I suppose I'm not being pythonic -- I should really only care if the > result evaluates true or false, but I don't feel like I'm testing right > if values can slip though that shouldn't. > > It does reinforce my opinion though -- whether zero, or an empty > sequence or string should evaluate false really is a matter of specific > application, not universal. > > Anyway, should I just give up? Or should numpy return the same "True" > and "False" (like None), or is there another solution? > > x == False > > is close, but zero still slips through, even though an empty list does not: > > >>> 0 == False > True > >>> [] == False > False > > > -Chris > > > -- > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > If you really insist on getting a boolean type, then you can check with isinstance: >>> if not np.any(np.ones(3)==0) and isinstance(np.any(np.ones(3)==0),np.bool_):print 'ok' ok >>> if not (1==0) and isinstance((1==0),bool):print 'ok' ok >>> f = np.all((1,2,0)) >>> if not f and isinstance(f,np.bool_):print 'ok' ok zero is not an instance of boolean: >>> f=0 >>> if not f and isinstance(f,np.bool_):print 'ok' >>> if not f and isinstance(f,bool):print 'ok' >>> Josef From robert.kern at gmail.com Tue Dec 30 16:33:52 2008 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Dec 2008 16:33:52 -0500 Subject: [Numpy-discussion] "False" and "True" not singletons? In-Reply-To: <495A7DAF.9080504@noaa.gov> References: <495A7DAF.9080504@noaa.gov> Message-ID: <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> On Tue, Dec 30, 2008 at 14:59, Christopher Barker wrote: > Hi all, > > I've just discovered that "False" is not a singleton: > > >>> import numpy as N > > >>> f = N.all((1,2,0)) > >>> print f > False > >>> id(f) > 17316364 > >>> f is False > False > >>> id(False) > 3294768 > > > Should it be? Well, True and False are singletons, but numpy.any() and numpy.all() don't return bools. They return numpy.bool_s. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Tue Dec 30 17:17:30 2008 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 30 Dec 2008 14:17:30 -0800 Subject: [Numpy-discussion] "False" and "True" not singletons? In-Reply-To: <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> References: <495A7DAF.9080504@noaa.gov> <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> Message-ID: <495A9DFA.8000308@noaa.gov> Robert Kern wrote: > Well, True and False are singletons, I thought so. > but numpy.any() and numpy.all() > don't return bools. They return numpy.bool_s. Is that a numpy scalar type? This also begs the question: why don't they return regular old True and False? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pav at iki.fi Tue Dec 30 17:27:58 2008 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 30 Dec 2008 22:27:58 +0000 (UTC) Subject: [Numpy-discussion] "False" and "True" not singletons? References: <495A7DAF.9080504@noaa.gov> <3d375d730812301333m5a39eca4o7ab8a7d270d01eba@mail.gmail.com> <495A9DFA.8000308@noaa.gov> Message-ID: Tue, 30 Dec 2008 14:17:30 -0800, Christopher Barker wrote: > Robert Kern wrote: >> Well, True and False are singletons, > > I thought so. > >> but numpy.any() and numpy.all() >> don't return bools. They return numpy.bool_s. > > Is that a numpy scalar type? > > This also begs the question: why don't they return regular old True and > False? Genericity. np.all and np.any take an axis parameter and usually return ndarrays; the default value, axis=None, is a corner case. 
Returning array scalars (which mostly quack like ndarrays) allows one to avoid the need for special-casing any subsequent code for axis=None. -- Pauli Virtanen From pav at iki.fi Tue Dec 30 21:28:29 2008 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 31 Dec 2008 02:28:29 +0000 (UTC) Subject: [Numpy-discussion] formatting issues, locale and co References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: Mon, 29 Dec 2008 13:38:12 +0900, David Cournapeau wrote: [clip] > I put my yesterday work in the fix_float_format branch: > - it fixes the locale issue > - it fixes the long double issue on windows. > - it also fixes some tests (we were not testing single precision > formatting but twice double precision instead > - the single precision test fails on the trunk BTW). > - it handles inf and nan more consistently across platforms (e.g. > str(np.log(0)) will be '-inf' on all platforms; on windows, it used > to be '-1.#INF' > - I was afraid it would broke converting back the > string to float, but it is broken anyway before my change, e.g. > float('-1.#INF') does not work on windows). [clip] I did some work on the fix_float_format branch from the opposite direction, making fromfile and fromstring properly locale-independent. (cf. #884) Works currently on POSIX systems, but some tests fail on Windows because float('inf') does not work [neither does float('-1.#INF')...]. (cf. #510) A bit more work must be done on NumPyOS_ascii_strtod to make inf/nan work as intended. Also, roundtrip tests for repr would be nice to add, if they aren't there yet, and possibly for str <-> fromstring roundtrip, too. I'll be almost offline for 1.5 weeks starting now, so if you want to finish this, go ahead. -- Pauli Virtanen From cournape at gmail.com Tue Dec 30 23:06:59 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 31 Dec 2008 13:06:59 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: <495A6B4D.70009@telus.net> References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> <495861B6.8050707@ar.media.kyoto-u.ac.jp> <5b8d13220812291912o4dd75e73ka151cfbb82ed58@mail.gmail.com> <4959A7A6.3020805@ar.media.kyoto-u.ac.jp> <495A6B4D.70009@telus.net> Message-ID: <5b8d13220812302006qb0d6c96m64a2b9ea9cbad4e7@mail.gmail.com> On Wed, Dec 31, 2008 at 3:41 AM, Lenard Lindstrom wrote: > David Cournapeau wrote: >> The thing is since I don't understand what happens in the print '%f' >> case, I don't know how to clean it up, if it is at all possible. But in >> anyway, it means that with my changes, we are not worse than python >> itself, and I think we are better than before, >> >> > Just a quick look in SVN, trunk/Objects/stringobject.c, shows that the > call path for a "%f" format is string_mod -> PyString_Format -> > formatfloat -> PyOS_ascii_formatd. Thanks, I did not think about looking into stringobject. 
I now have to understand why it does print differently, as going through format_float should avoid the inconsistencies cheers, David > -- > Lenard Lindstrom > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Tue Dec 30 23:11:02 2008 From: cournape at gmail.com (David Cournapeau) Date: Wed, 31 Dec 2008 13:11:02 +0900 Subject: [Numpy-discussion] formatting issues, locale and co In-Reply-To: References: <49570E2B.2000608@ar.media.kyoto-u.ac.jp> <3d375d730812272246l78ac1e45u207370fd9d0ac765@mail.gmail.com> <49571F6F.4070102@ar.media.kyoto-u.ac.jp> <5b8d13220812282038n626c44aep89756d7f0fbe7930@mail.gmail.com> Message-ID: <5b8d13220812302011i5cfb76acma1527bdcbf7a83ee@mail.gmail.com> On Wed, Dec 31, 2008 at 11:28 AM, Pauli Virtanen wrote: > Mon, 29 Dec 2008 13:38:12 +0900, David Cournapeau wrote: > [clip] >> I put my yesterday work in the fix_float_format branch: >> - it fixes the locale issue >> - it fixes the long double issue on windows. >> - it also fixes some tests (we were not testing single precision >> formatting but twice double precision instead >> - the single precision test fails on the trunk BTW). >> - it handles inf and nan more consistently across platforms (e.g. >> str(np.log(0)) will be '-inf' on all platforms; on windows, it used >> to be '-1.#INF' >> - I was afraid it would broke converting back the >> string to float, but it is broken anyway before my change, e.g. >> float('-1.#INF') does not work on windows). > [clip] > > I did some work on the fix_float_format branch from the opposite > direction, making fromfile and fromstring properly locale-independent. > (cf. #884) > > Works currently on POSIX systems, but some tests fail on Windows because > float('inf') does not work [neither does float('-1.#INF')...]. (cf. #510) > A bit more work must be done on NumPyOS_ascii_strtod to make inf/nan work > as intended. Also, roundtrip tests for repr would be nice to add, if they > aren't there yet, and possibly for str <-> fromstring roundtrip, too. > I'll be almost offline for 1.5 weeks starting now, so if you want to > finish this, go ahead. Thank you for working on this, Pauli. The problem on windows may not be specific to windows: the difference really is whether the formatting is done by python or the C runtime. It just happens that on Linux and Mac OS X, the strings are the same - but it could be different on other OS. I have not looked into C99, whether this is standardized or not (the size of exponent is, but I don't know about nan and inf). We should also change pretty print of arrays, I think - although it is a change and may break things. Since that's how python represents the numbers, I guess we will have to change at some point. 
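A sketch of the roundtrip tests Pauli suggested (hypothetical test code, not what is in the branch; nan is left out because nan != nan would need a separate check):

import numpy as np
from numpy.testing import assert_equal

def test_repr_roundtrip():
    # repr of a float scalar should parse back to the same value
    for value in [1.0, 1e10, 1e-5, np.inf, -np.inf]:
        x = np.float64(value)
        assert_equal(np.float64(repr(x)), x)

def test_str_fromstring_roundtrip():
    # str -> fromstring should recover the original array
    a = np.array([1.5, -2.25, 1e10])
    s = ' '.join([str(v) for v in a])
    assert_equal(np.fromstring(s, sep=' '), a)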
David From alan.mcintyre at gmail.com Wed Dec 31 05:53:27 2008 From: alan.mcintyre at gmail.com (Alan McIntyre) Date: Wed, 31 Dec 2008 02:53:27 -0800 Subject: [Numpy-discussion] Removal of deprecated test framework stuff Message-ID: <1d36917a0812310253t4d03b365u874a9634eb5d2e30@mail.gmail.com> Hi all, Unless somebody objects, I'd like to remove from NumPy 1.3 the following numpy.testing items that were deprecated in NumPy 1.2 (since the warnings promise we'll do so ;): - ParametricTestCase (also removing the entire file numpy/testing/parametric.py) - The following arguments from numpy.testing.Tester.test() (which is used for module test functions): level, verbosity, all, sys_argv, testcase_pattern - Path manipulation functions: set_package_path, set_local_path, restore_path - NumpyTestCase, NumpyTest Thanks, Alan From millman at berkeley.edu Wed Dec 31 07:49:31 2008 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 31 Dec 2008 04:49:31 -0800 Subject: [Numpy-discussion] Removal of deprecated test framework stuff In-Reply-To: <1d36917a0812310253t4d03b365u874a9634eb5d2e30@mail.gmail.com> References: <1d36917a0812310253t4d03b365u874a9634eb5d2e30@mail.gmail.com> Message-ID: On Wed, Dec 31, 2008 at 2:53 AM, Alan McIntyre wrote: > Unless somebody objects, I'd like to remove from NumPy 1.3 the > following numpy.testing items that were deprecated in NumPy 1.2 (since > the warnings promise we'll do so ;): > > - ParametricTestCase (also removing the entire file numpy/testing/parametric.py) > - The following arguments from numpy.testing.Tester.test() (which is > used for module test functions): level, verbosity, all, sys_argv, > testcase_pattern > - Path manipulation functions: set_package_path, set_local_path, restore_path > - NumpyTestCase, NumpyTest Thanks for taking care of this. Jarrod From nwagner at iam.uni-stuttgart.de Wed Dec 31 10:06:24 2008 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 31 Dec 2008 16:06:24 +0100 Subject: [Numpy-discussion] numpy.test() failures Message-ID: >>> numpy.__version__ '1.3.0.dev6283' ====================================================================== FAIL: Check formatting. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 74, in check_complex_type err_msg='Failed str formatting for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed str formatting for type ACTUAL: '(1e+10+0j)' DESIRED: '1e+10' ====================================================================== FAIL: Check formatting when using print ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 108, in check_float_type_print _test_redirected_print(float(x), tp) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 104, in _test_redirected_print err_msg='print failed for type%s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: print failed for type ACTUAL: '10000000000.0\n' DESIRED: '1e+10\n' ====================================================================== FAIL: Check formatting when using print ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 115, in check_complex_type_print _test_redirected_print(complex(x), tp) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 104, in _test_redirected_print err_msg='print failed for type%s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: print failed for type ACTUAL: '(10000000000+0j)\n' DESIRED: '(1e+10+0j)\n' ====================================================================== FAIL: test_print.test_locale_single ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/decorators.py", line 82, in skipper return f(*args, **kwargs) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 164, in test_locale_single return _test_locale_independance(np.float32) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 157, in _test_locale_independance err_msg='Failed locale test for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed locale test for type ACTUAL: '1,2' DESIRED: '1.2' ====================================================================== FAIL: test_print.test_locale_double 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/decorators.py", line 82, in skipper return f(*args, **kwargs) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 169, in test_locale_double return _test_locale_independance(np.double) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 157, in _test_locale_independance err_msg='Failed locale test for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed locale test for type ACTUAL: '1,2' DESIRED: '1.2' ====================================================================== FAIL: test_print.test_locale_longdouble ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib64/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/decorators.py", line 82, in skipper return f(*args, **kwargs) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 174, in test_locale_longdouble return _test_locale_independance(np.longdouble) File "/usr/local/lib64/python2.5/site-packages/numpy/core/tests/test_print.py", line 157, in _test_locale_independance err_msg='Failed locale test for type %s' % tp) File "/usr/local/lib64/python2.5/site-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: Failed locale test for type ACTUAL: '1,2' DESIRED: '1.2' ---------------------------------------------------------------------- Ran 1797 tests in 16.303s FAILED (KNOWNFAIL=1, failures=6)