Hi. I have been a numpy user myself for some time now (though it's my first message on this list). I am trying to do an informal survey for a university-related project. I am a grad student, btw, working in comp sci. It would be awesome if you guys could respond to some of the following questions:
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes, how did you utilize them?
If you feel it's not relevant to the list, feel free to email me personally. I would be very interested in talking about these issues.
thanks, rahul
On 16/11/2007, Rahul Garg <rahulgarg44@gmail.com> wrote:
It would be awesome if you guys could respond to some of the following questions : a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy? b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran? c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
If you feel its not relevant to the list .. feel free to email me personally. I would be very interested in talking about these issues.
I think it would be interesting and on-topic to hear a few words from people to see what they do with numpy.

a) I use python/numpy/scipy to work with astronomical observations of pulsars. This covers a range of tasks: simple scripting to manage jobs on our computation cluster; minor calculations (like a better scientific calculator, though frink is sometimes better because it keeps track of units); gathering and plotting results; prototyping search algorithms and evaluating their statistical properties; providing tools for manipulating photon data; and various other tasks. We also use the software package PRESTO for much of the heavy lifting; much of it is written in python.

b) I have projects for which python is too slow, yes. Pulsar surveys are extremely compute-intensive (I estimated one that I'm involved with at two or three mega-core-hours), so any software that is going in the pipeline should have its programming-time/runtime tradeoff carefully examined. I wrote a search code in C, with bits in embedded assembler. All the driver code to shuffle files around and supply correct parameters is in python, though. PRESTO has a similar pattern, with more C code because it does more of the hard work. In most cases the communication between the heavy-lifting code and the python code is through the UNIX environment (the heavy-lifting code gets run as a separate executable), but PRESTO makes many functions available in python modules. On the other hand, I often write quick Monte Carlo simulations that run for a day or more, but since writing them takes about as long as running them, it's not worth writing them in a language that would run faster.

c) Our problems tend to be embarrassingly parallel, so we tend not to use clever parallelism toolkits. For the survey I am working on, I wrote one (python) script to process a single beam on a single node (which takes about thirty hours), and another to keep the batch queue filled with an appropriate number of such jobs.
I have thought about writing a more generic tool for managing this kind of job queue, but haven't invested the time yet. Anne
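A rough sketch of the "keep the queue filled" idea from point c) is given here. It is not Anne's script: it runs local child processes with the standard library instead of submitting to a cluster batch system, and the names `run_throttled` and `beam_jobs` are made up for illustration.

```python
import subprocess
import sys
import time


def run_throttled(commands, max_running=2, poll_interval=0.05):
    """Run each command as a separate process, keeping at most
    `max_running` alive at once. Returns exit codes in input order."""
    pending = list(enumerate(commands))
    pending.reverse()            # so pop() hands out jobs in input order
    running = {}                 # job index -> Popen handle
    exit_codes = [None] * len(commands)

    while pending or running:
        # Top up the pool of running jobs.
        while pending and len(running) < max_running:
            index, cmd = pending.pop()
            running[index] = subprocess.Popen(cmd)

        # Collect any jobs that have finished.
        for index, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                exit_codes[index] = code
                del running[index]

        if running:
            time.sleep(poll_interval)

    return exit_codes


# Hypothetical stand-in for "process one beam": a tiny Python child process.
beam_jobs = [[sys.executable, "-c", f"print('beam {i} done')"] for i in range(4)]
codes = run_throttled(beam_jobs, max_running=2)
```

On a real cluster the `subprocess.Popen` call would presumably be replaced by whatever submit command the batch system provides, and the polling step by a query of the queue state.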
On Sat, Nov 17, 2007 at 02:07:34AM -0500, Anne Archibald wrote:
On 16/11/2007, Rahul Garg <rahulgarg44@gmail.com> wrote:
It would be awesome if you guys could respond to some of the following questions : a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy? b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran? c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
If you feel its not relevant to the list .. feel free to email me personally. I would be very interested in talking about these issues.
I think it would be interesting and on-topic to hear a few words from people to see what they do with numpy.
a) I use python/numpy/scipy to work with astronomical observations of pulsars. This includes a range of tasks including: simple scripting to manage jobs on our computation cluster; minor calculations (like a better scientific calculator, though frink is sometimes better because it keeps track of units);
So does 'ipython -p physics':

In [1]: x = 3 m/s^2
In [2]: y = 15 s
In [3]: x*y
Out[3]: 45 m/s

Regards
Stéfan
Hi Rahul,
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
I'm a grad student "doing" computational biology. I primarily use the NumPy/SciPy/matplotlib triumvirate as a post processing tool to analyze what the heck happened after we run some learning algorithms we develop (or canned ones, like libsvm (for example)) to look for some sense in the results. I've been working w/ analyzing interaction networks/graphs, so I also use NetworkX[1] quite a bit as well (it's also a nice package w/ responsive authors). Many of the folks (in my lab, and collaborators) like to use MATLAB, so I've found scipy's io.loadmat invaluable for making this a bit more seamless. So, in general, for me (so far) numpy/scipy are generally used to integrate various datasets together and see if things "look kosher" (before runs and after runs).
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
Yes, for things like boosting, svm, graph mining, etc ... but that's no real surprise since they're iterative and need to run on large datasets. You should also note that there are python interfaces to these things out there as well, but I (thus far) haven't taken much advantage of those and usually pipe out data into the expected text input formats and pull them back in when the algo is done.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
I'd really like to (not just for Python), but I haven't.
-steve
[1] NetworkX: https://networkx.lanl.gov/wiki
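For readers who have not used the MATLAB bridge Steve mentions, here is a minimal round-trip sketch with `scipy.io.savemat` and `scipy.io.loadmat`. The variable names and values are invented, and an in-memory buffer stands in for a `.mat` file that a collaborator would normally send.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical results a MATLAB-using collaborator might hand over.
buffer = io.BytesIO()
savemat(buffer, {"scores": np.array([0.1, 0.7, 0.2]),
                 "adjacency": np.eye(3)})

# Read the same bytes back as if they were a .mat file on disk.
buffer.seek(0)
data = loadmat(buffer)

scores = data["scores"]          # 1-D vectors come back as 2-D arrays
adjacency = data["adjacency"]
best = int(np.argmax(scores))    # index of the highest score
```

Note that `loadmat` returns a plain dictionary keyed by MATLAB variable name, so the usual numpy indexing and reductions apply directly to the loaded arrays.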
hi. thanks for ur responses .. so it looks like python/numpy is used more for gluing things together or doing things like postprocessing. is anyone using it for core calculations .. as in long running python calculations? i used numpy myself for some nonlinear dynamics and chaos related calculations but they were usually very short running only for a few seconds at a time.
thanks, rahul

On Nov 17, 2007 8:28 AM, Steve Lianoglou <lists.steve@arachnedesign.net> wrote:
Hi Rahul,
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
I'm a grad student "doing" computational biology. I primarily use the NumPy/SciPy/matplotlib triumvirate as a post processing tool to analyze what the heck happened after we run some learning algorithms we develop (or canned ones, like libsvm (for example)) to look for some sense in the results.
I've been working w/ analyzing interaction networks/graphs, so I also use NetworkX[1] quite a bit as well (it's also a nice package w/ responsive authors).
Many of the folks (in my lab, and collaborators) like to use MATLAB, so I've found scipy's io.loadmat invaluable for making this a bit more seamless.
So, in general, for me (so far) numpy/scipy are generally used to integrate various datasets together and see if things "look kosher" (before runs and after runs).
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
Yes, for things like boosting, svm, graph mining, etc ... but that's no real surprise since they're iterative and need to run on large datasets.
You should also note that there are python interfaces to these things out there as well, but I (thus far) haven't taken much advantage of those and usually pipe out data into the expected text input formats and pull them back in when the algo is done.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
I'd really like to (not just for Python), but I haven't.
-steve
[1] NetworkX: https://networkx.lanl.gov/wiki
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
On Nov 19, 2007 4:05 PM, Rahul Garg <rahulgarg44@gmail.com> wrote:
hi.
thanks for ur responses .. so it looks like python/numpy is used more for gluing things together or doing things like postprocessing. is anyone using it for core calculations .. as in long running python calculations? i used numpy myself for some nonlinear dynamics and chaos related calculations but they were usually very short running only for a few seconds at a time.
We use it for the core of long-running computations:
http://dx.doi.org/10.1016/j.acha.2007.08.001

The innermost loops use numpy arrays, with some hand-coded C to do optimized dot-like operations without any error or type checking. The entire code is an amalgam of numpy, scipy, matplotlib, mayavi and pyx, with pyrex, some in-house Fortran wrapped with f2py, hand-coded C, and a bit of auto-generated C++ loaded at runtime with weave.inline(). While the times listed above are fairly short (small electrostatics examples), we are using this inside long-running quantum mechanics codes (papers in review now).

Cheers,
f
On Nov 19, 2007 5:19 PM, Fernando Perez <fperez.net@gmail.com> wrote:
We use it for the core of long-running computations:
Heh, *On subgroups of the Monster containing A5's* turns up as a related paper. You folks doing some fancy math there? Chuck
On Nov 20, 2007 1:48 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Nov 19, 2007 5:19 PM, Fernando Perez <fperez.net@gmail.com> wrote:
We use it for the core of long-running computations:
Heh, On subgroups of the Monster containing A5's turns up as a related paper. You folks doing some fancy math there?
Mmh, I'm not really sure why the Monster group paper turns up as 'related'. That kind of work sits squarely in the abstract algebra/group theory world, while what we do is far more applied. Think of it as a wavelet-inspired way of decomposing long-range potentials into near and far-field components to achieve better efficiency, along with tricks inspired by trapezoid rule integration to reduce the cost in higher dimensions. The first paper in the 'related' list, "Multiresolution separated representations of singular and weakly singular operators", is actually one that is really related, and has much of the theoretical background behind our work. It's actually nothing fancy at all from a theoretical standpoint, though rather complicated to implement (especially because speed comes from very aggressive truncations, so error analysis is difficult, critical and very easy to get wrong: you have to walk a fine line between speed and wrong results :).

Cheers,
f
On Nov 19, 2007 4:05 PM, Rahul Garg <rahulgarg44@gmail.com> wrote:
hi.
thanks for ur responses .. so it looks like python/numpy is used more for gluing things together or doing things like postprocessing. is anyone using it for core calculations .. as in long running python calculations? i used numpy myself for some nonlinear dynamics and chaos related calculations but they were usually very short running only for a few seconds at a time.
It depends on what you mean by long running. A couple of hours seems long to me, others might think in terms of days and weeks. The longest running things I have done are Monte Carlo simulations and a genetic algorithm to find the best integer coefficients for a complex equal ripple filter starting with the floating point coefficients produced by a complex Remez algorithm. I have also used python and mixed numpy/C++ to program an image server to produce photometrically correct, blurred, widefield streaked star images using an indexed Tycho catalog. I mostly use python for quick and dirty programs, prototyping, and as an interface to C++ when speed is essential. Chuck
--- Rahul Garg <rahulgarg44@gmail.com> wrote:
hi.
thanks for ur responses .. so it looks like python/numpy is used more for gluing things together or doing things like postprocessing. is anyone using it for core calculations .. as in long running python calculations? i used numpy myself for some nonlinear dynamics and chaos related calculations but they were usually very short running only for a few seconds at a time. thanks, rahul
I've used Python a little to solve ODEs for chaotic systems. More for time series analysis (attractor reconstruction and associated data analysis problems). These ran rather fast, on the order of seconds or minutes.

Lately, I've been coding up a package to solve Schrödinger's equation for 2D arbitrarily shaped, infinite wall potentials. I've settled on a Boundary Element Approach to get the eigenfunctions in these systems. The goal is to study phenomena associated with quantum chaos in the semiclassical regime. These calculations tend to run on the order of 10s of minutes to an hour. I eventually will be writing C extensions for the slower running functions (bottlenecks). I've done that before and it's a big help for speed.

-- Lou Pecora, my views are my own.
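As an illustration of the attractor reconstruction mentioned above, here is a minimal time-delay embedding sketch in numpy. It is only one common way to do it and is not taken from Lou's code; the function name `delay_embed`, the test signal, and the chosen dimension and delay are all assumptions.

```python
import numpy as np


def delay_embed(series, dimension, delay):
    """Return the delay vectors
    [x(t), x(t + delay), ..., x(t + (dimension - 1) * delay)]
    as the rows of a 2-D array."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (dimension - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and delay")
    # Each column is the same series shifted by a multiple of the delay.
    columns = [series[i * delay : i * delay + n_vectors]
               for i in range(dimension)]
    return np.column_stack(columns)


# Illustrative signal: a sampled sine wave rather than a real chaotic series.
t = np.linspace(0.0, 20.0, 500)
embedded = delay_embed(np.sin(t), dimension=3, delay=10)
```

Because the columns are plain slices of the input, the whole embedding is built without a Python-level loop over time steps, which is part of why such analyses can finish in seconds.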
On Nov 20, 2007 7:33 AM, Lou Pecora <lou_boog2000@yahoo.com> wrote:
Lately, I've been coding up a package to solve Schrödinger's equation for 2D arbitrarily shaped, infinite wall potentials. I've settled on a Boundary Element Approach to get the eigenfunctions in these systems. The goal is to study phenomena associated with quantum chaos in the semiclassical regime. These calculations tend to run on the order of 10s of minutes to an hour. I eventually will be writing C extensions for the slower running functions (bottlenecks). I've done that before and it's a big help for speed.
Very nice. Any plans to release that code? Brian
Brian Granger wrote:
Very nice. Any plans to release that code?
Brian
No, not now. I'm just trying to get it up and running. Getting it ready for any release is way down the road. But if/when that happens, I would consider releasing the code. -- Cheers, Lou Pecora Code 6362 Naval Research Lab Washington, DC 20375, USA Ph: +202-767-6002 email: pecora@anvil.nrl.navy.mil
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
Manifold learning, and thus unconstrained optimization.
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
Not if it is done correctly (I just kept some pieces of my old C++ code because I didn't want to rewrite them).
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
Not for the moment, but I'm thinking about how to use it easily (free lunch).
Matthieu
--
French PhD student
Website : http://miles.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
Electromagnetic problems: eigenvalue finding, linear systems, optimizations...
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
I come from that world, and python gives me the programming speed I needed with a more-than-reasonable execution speed. I mainly use python/numpy/scipy as a better Matlab, which is platform independent and free, i.e. an interface to numerical libraries like blas, lapack, fftw, umfpack, arpack and so on (which are mainly written in Fortran/C).
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
I use MPI (mpi4py) on a shared memory multiprocessor SGI Altix.
Lorenzo.
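To make the "interface to numerical libraries" point concrete, here is a small sketch of an eigenvalue problem and a linear solve through `numpy.linalg`, which hands the work to LAPACK routines. The 3x3 matrix is an arbitrary illustrative example, not an electromagnetic model.

```python
import numpy as np

# Small symmetric matrix standing in for a discretised operator
# (illustrative values only).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])

# Symmetric eigenproblem: eigenvalues are returned in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Dense linear solve of A x = b.
x = np.linalg.solve(A, b)
```

The Python layer here is only a few lines of glue; the numerical work happens in compiled library code, which is consistent with the "more-than-reasonable" speed described above.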
Rahul Garg wrote:
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
Mainly processing time series of remote sensing data ('satellite images'). No really fancy math, but huge (sometimes multiple gigabytes) multidimensional datasets (date, bands, y, x; order of magnitude: [hundreds, a few, tens of thousands or more, idem]). Some tasks are long-running, not because of the complex math involved, but because of the amount of data.
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
Yes, but usually the overhead of programming the whole thing in C is not worth the speedup of processing, especially with modern fast computers and easy ways to parallelize stuff (ipython and/or parallelpython).
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
See above. I just started to use parallelpython here; it is not yet used for production work, but is in the testing/prototyping phase. I'm using SMP, multicore and multiple machines ('cluster'); PP doesn't distinguish between them. How? Just cut the data into suitable pieces and throw them as a pp job to the cluster :-) It's really that simple nowadays. And most of our processing is very parallel in nature.
Cheers,
Vincent Schut.
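The "cut the data into pieces and submit each piece as a job" approach can be sketched as follows. This is not Parallel Python: a standard-library thread pool is swapped in as a stand-in so the example runs anywhere, and the function names and the synthetic (date, y, x) cube are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def temporal_mean(block):
    """Per-pixel mean over the date axis of a (date, y, x) block."""
    return block.mean(axis=0)


def process_in_pieces(cube, n_pieces, executor):
    """Split a (date, y, x) cube into row bands, process each band as
    its own job, and stitch the results back together."""
    bands = np.array_split(cube, n_pieces, axis=1)
    results = list(executor.map(temporal_mean, bands))
    return np.concatenate(results, axis=0)


# Small synthetic cube; real data would be far larger and read from disk.
rng = np.random.default_rng(0)
cube = rng.random((12, 64, 48))

with ThreadPoolExecutor(max_workers=4) as pool:
    mean_image = process_in_pieces(cube, n_pieces=4, executor=pool)
```

Since `process_in_pieces` only needs an object with a `map` method, a process pool or a cluster-backed executor could presumably be passed in instead without changing the splitting logic.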
Rahul Garg wrote:
It would be awesome if you guys could respond to some of the following questions : a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
I am using both numpy and scipy to solve PDEs in the context of the finite element method (elasticity, porous media, ...).
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
I use bits of C code wrapped by SWIG either to address real bottle-necks (FE assembling, element matrix computations) or due to the fact that I have lots of "legacy" C code that I reuse from my previous projects.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
If I ever get myself to start, I will use petsc4py.
If you feel its not relevant to the list .. feel free to email me personally. I would be very interested in talking about these issues.
It is really interesting to see what others do. r.
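For readers unfamiliar with the assembly step named above as a bottleneck, here is a toy sketch: a 1D Poisson problem with linear elements, assembled with an explicit Python loop. It is far simpler than the elasticity or porous media problems Robert describes, and the function name and problem setup are assumptions chosen only to show the pattern.

```python
import numpy as np


def solve_poisson_1d(n_elements):
    """Solve -u'' = 1 on [0, 1] with u(0) = u(1) = 0, linear elements."""
    n_nodes = n_elements + 1
    x = np.linspace(0.0, 1.0, n_nodes)
    h = x[1] - x[0]

    # Element stiffness matrix and load vector for a uniform mesh.
    k_elem = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    f_elem = np.array([0.5, 0.5]) * h

    K = np.zeros((n_nodes, n_nodes))
    F = np.zeros(n_nodes)
    for e in range(n_elements):      # assembly loop: the usual hot spot
        nodes = [e, e + 1]
        K[np.ix_(nodes, nodes)] += k_elem
        F[nodes] += f_elem

    # Apply the Dirichlet conditions by solving on interior nodes only.
    u = np.zeros(n_nodes)
    interior = slice(1, -1)
    u[interior] = np.linalg.solve(K[interior, interior], F[interior])
    return x, u


x, u = solve_poisson_1d(8)
```

For this particular problem the nodal values coincide with the exact solution u(x) = x(1 - x)/2, which makes it a convenient check. In a real code the per-element loop is the part that typically moves to C, as described in the reply to question b).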
On Saturday 17 November 2007 03:50, Rahul Garg wrote:
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
Organizing jobs in computational chemistry, and parsing/analyzing the output. To some extent, also some actual calculations and visualization.
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
Almost always when ab initio calculations are concerned.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
Not with Python. Cheers, Karol -- written by Karol Langner Wed Nov 21 10:25:22 CET 2007
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
I'm using python with numpy, scipy, pytables and matplotlib for data analysis in the field of high energy particle physics. Most of the work is histogramming millions of events, fitting functions to the distributions or applying cuts to yield optimized signal/background ratios. I often use the random number and optimization facilities for these purposes. Most of my colleagues use ROOT (root.cern.ch), which also has a python binding; however, I love the simplicity of numpy's ufuncs and indexing capabilities, which make the code much denser and more readable.

Another part is developing toy simulation and reconstruction algorithms in python which later will be translated to C++, since the main software framework in our collaboration is written in C++. This includes log-likelihood track reconstruction algorithms based on the arrival times of photons measured with photomultipliers. Simulation involves particle generators and detector response simulations to test the reconstruction with known inputs.
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
In particular for the simulation yes, depending on the level of detail of course. But only parts, eg. random number generation for certain distributions had to be coded in C/C++. Since the main software for my work is coded in C++, I often end up writing wrappers around parts of this software to extract the data I need for doing the analysis work in python.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
We have a cluster at our lab which I use for my computations. This is not very difficult, since the data can be split into several files and each can be treated in the same way. One just needs to pass a script over and over again to the cluster; this is done in a shell script or with the tools provided by the cluster scheduling system.
Cheers!
Bernhard
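The cut-and-histogram workflow from answer a) can be sketched in a few lines of numpy. The events below are synthetic random numbers, and the variable names and cut values are invented for illustration; nothing here reflects a real detector or analysis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic events: purely illustrative, not real detector data.
n_events = 100_000
energy = rng.exponential(scale=10.0, size=n_events)
quality = rng.random(n_events)

# A "cut" is just a boolean mask built from elementwise comparisons.
selected = (energy > 5.0) & (quality > 0.8)

# Histogram the surviving events in 20 fixed-width bins.
counts, edges = np.histogram(energy[selected], bins=20, range=(5.0, 105.0))
```

The mask `selected` can be reused to index any other per-event array of the same length, which is the indexing convenience praised above.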
On 17.11.2007, at 03:50, Rahul Garg wrote:
It would be awesome if you guys could respond to some of the following questions : a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
For me, NumPy is an important building block in a set of computational Python libraries that form the basis of my daily work in molecular simulations. The main building block is my Molecular Modelling Toolkit (http://dirac.cnrs-orleans.fr/MMTK/), but NumPy provides the basic data structure (arrays) for much of what MMTK does, and even more so for interfacing with the rest of the world. I don't use SciPy at all.
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
There is some C code (and an increasing amount of Pyrex code) in my Python environment, and it is essential for good performance. However, in terms of code quantity, it is negligible.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
All of them. I use threading (multicores and SMPs) in MMTK, and coarse-grained parallelization as implemented in ScientificPython for analyzing large data sets. ScientificPython has two parallel computing modules. The easiest to use implements a master-slave model in which a master process delegates computational tasks to an arbitrary (and possibly varying) number of slave processes:
http://dirac.cnrs-orleans.fr/hg/ScientificPython/main/file/73cc270217fc/Examples/master_slave_demo.py

The other parallelization package is based on the BSP model of parallel computing:
http://dirac.cnrs-orleans.fr/hg/ScientificPython/main/file/73cc270217fc/Examples/BSP/
http://dirac.cnrs-orleans.fr/ScientificPython/ScientificPythonManual/

It probably has a steeper learning curve, but it is suitable for more complex parallel programs because it permits the construction of parallel libraries.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: hinsen@cnrs-orleans.fr
---------------------------------------------------------------------
On Fri, Nov 16, 2007 at 07:50:07PM -0700, Rahul Garg wrote:
It would be awesome if you guys could respond to some of the following questions :
OK, I'll take a bit of time to do this.
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
My day-to-day use is data processing of an atomic physics experiment (Bose Einstein condensation). The data created by the experiment is retrieved from a bunch of instruments (scopes, cameras, and sensors) and stored in an HDF5 file for each run, along with the parameters of the run. This is done in Matlab for historical reasons, and Matlab is an absolute pain for this. I have implemented a similar system in Python for another experiment (I wrote an article summarizing the lessons learnt and the patterns to follow; hell, this was the fourth experiment control system I built, each time using a different platform: www.gael-varoquaux.info/computers/agile_computer_control_of_an_experiment.pdf ).

The data is first stored on a windows box that runs the experiment, then it is downloaded by our Linux server, which keeps a hash table of all the experiments run and some figures of merit. The data processing is done on the Linux box (quite often connected remotely using Nomachine's NX). The data is accessed in a database-like way through a custom Python module (1000 LOC). Recently I have switched to an object-relational-mapping-like loading of the HDF5 files (hierarchical data trees) through a database-like interface. It rocks, IMHO. We do simple fitting, blurring, filtering, averaging, fft, ... on the data, and plot the results.

Another of my uses is for simple toy models. Recently I have been elaborating a statistical model of an experiment, so I was running Monte Carlo simulations, but I have also run dynamical model integrations and linear algebra calculations, always revolving around the physics of cold atomic gases.
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
Never. However, I really like the fact that I can if I need to. When controlling instruments in Python I often have to use their C SDK, in which case I use Pyrex or ctypes.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
No. Most often I am limited by IO. I have the feeling that unlike most people on the list, my main work is not on a computer, but on an experiment. Gaël
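The kind of "blurring, fft" post-processing listed in answer a) might look like the sketch below. The trace is synthetic (a 5 Hz sine plus noise), and the sample rate, kernel width and variable names are assumptions; it is not taken from Gaël's processing code.

```python
import numpy as np

sample_rate = 256                        # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * 5.0 * t) + 0.2 * rng.standard_normal(t.size)

# Blur: moving average with a 5-sample box kernel.
kernel = np.ones(5) / 5
smoothed = np.convolve(signal, kernel, mode="same")

# FFT: locate the dominant non-zero frequency in the smoothed trace.
spectrum = np.abs(np.fft.rfft(smoothed))
freqs = np.fft.rfftfreq(smoothed.size, d=1 / sample_rate)
peak_freq = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
```

With one second of data the frequency bins fall on whole hertz, so the peak lands on the 5 Hz bin of the underlying sine.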
Rahul Garg wrote:
a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy?
Reduction of large N-body simulations of astrophysical gravitational systems (N up to 268 million). See http://aramis.obspm.fr/~revaz/pNbody/.
b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran?
Some specific functions are missing in numpy and a pure python implementation is too slow. In that case, I implement the specific function in C, as a python module.
c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes how did u utilize them?
Yes, my toolbox is now nearly completely parallelized with mpi4py. Some parallelization parts are also implemented in C modules. It works fine on beowulf clusters, multicores or smps.
--
Yves Revaz
Lerma Batiment A, Observatoire de Paris
77 av Denfert-Rochereau, F-75014 Paris, FRANCE
Tel : ++ 33 (0) 1 40 51 20 79
Fax : ++ 33 (0) 1 40 51 20 02
e-mail : yves.revaz@obspm.fr
Web : http://obswww.unige.ch/~revaz/
participants (18)
- Anne Archibald
- bernhard.voigt@gmail.com
- Brian Granger
- Charles R Harris
- Fernando Perez
- Gael Varoquaux
- Karol Langner
- Konrad Hinsen
- lorenzo bolla
- Lou Pecora
- Lou Pecora
- Matthieu Brucher
- Rahul Garg
- Robert Cimrman
- Stefan van der Walt
- Steve Lianoglou
- Vincent Schut
- Yves Revaz