On March 7th Travis Oliphant and Perry Greenfield met Guido and Paul
Dubois to discuss some issues regarding the inclusion of an array
package within core Python.
The following represents thoughts and conclusions regarding our meeting
with Guido. They in no way represent the order of discussion with Guido
and some of the points we raise weren't actually mentioned during the
meeting, but instead were spurred by subsequent discussion after the
meeting with Guido.
1) Including an array package in the Python core. To start, before the
meeting we both agreed that we did not think this was itself a high
priority. Rather, we both felt that the most important issue was making
arrays an acceptable and widely supported interchange format (it may
not be apparent to some that this does not require arrays to be in the
core; more on that later). In discussing with Guido the desirability of
including arrays in the core, we quickly came to the conclusion that
not only was it unimportant, but that in the near term (the next couple
of years, and possibly much longer) it would be a bad thing to do. This
is primarily because it would mean that updates to the array package
would have to wait on Python releases, potentially delaying
important bug fixes, performance enhancements, or new capabilities
greatly. Neither of us envisions any scenario regarding array packages,
whether that be Numeric3 or numarray, where we would consider it to be
something that would not *greatly* benefit from decoupling its release
needs from that of Python (it's also true that it possibly introduces
complications for Python releases if they need to synch with array
schedules, but being inconsiderate louts, we don't care much about
that). And when one considers that the move to multicore and 64-bit
processors will introduce the need for significant changes in the
internals to take advantage of these capabilities, it is unlikely we will
see a quiescent, maintenance-level state for an array package for some
time. In short, this issue is a distraction at the moment and will only
sap energy from what needs to be done to unify the array packages.
So what about supporting arrays as an interchange format? There are a
number of possibilities to consider, none of which require inclusion of
arrays into the core. It is possible for 3rd party extensions to
optionally support arrays as an interchange format through one of the
following mechanisms:
a) So long as the extension package has access to the necessary array
include files, it can be built to use arrays as a format without the
array package actually being installed. The include files alone could
be included in the core (Guido has previously been receptive to doing
this, though at this meeting he seemed less receptive and instead
suggested the next option) or could be packaged with the extension (we
would prefer the former, to reduce the chance of many divergent copies
of the include files). The extension could then be compiled
successfully without the array package present. When asked to use
arrays, the extension would see if it could import the array package;
if not, all use of arrays would raise exceptions. The advantage of this
approach is that it does not require arrays to be installed before the
extension is built in order for arrays to be supported. The extension
could be built, and the array package installed later, with no
rebuilding necessary.
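The runtime side of this option can be sketched at the Python level; here
"arraypkg" is a hypothetical stand-in for whatever array package is
eventually installed:

```python
# Sketch of option a)'s runtime behavior: the extension works without the
# array package installed, and raises only when arrays are actually requested.
# "arraypkg" is a hypothetical stand-in for the real array package's name.
try:
    import arraypkg
    HAVE_ARRAYS = True
except ImportError:
    arraypkg = None
    HAVE_ARRAYS = False

def to_array(data):
    """Convert extension output to an array, if the array package is present."""
    if not HAVE_ARRAYS:
        raise RuntimeError("array support requested, but no array package installed")
    return arraypkg.asarray(data)
```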
b) One could modify the extension build process to check whether the
array package is installed and the include files are available; if so,
the extension is built with array support, otherwise not. The advantage
of this approach is that it doesn't require the include files to be
included with the core or bundled with the extension, thus avoiding any
potential version mismatches. The disadvantage is that adding the array
package later requires the extension to be rebuilt, and it results in a
more complex build process (more things to go wrong).
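The build-time probe could look something like this in a setup script
(again with a hypothetical package name; `find_spec` simply reports whether
the package is importable):

```python
# Sketch of option b): probe for the array package at build time and only
# enable array support if it (and therefore its include files) is present.
# "arraypkg" is a hypothetical package name.
import importlib.util

have_arrays = importlib.util.find_spec("arraypkg") is not None

# Pass a define to the C compiler only when support should be compiled in.
define_macros = [("HAVE_ARRAY_SUPPORT", "1")] if have_arrays else []
```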
c) One could provide the support at the Python level instead, by
relying on the use of buffer objects by the extension at the C level,
thus avoiding any dependence on the array C API. So long as the
extension can return buffer objects containing the putative array data
to the Python level, along with the necessary meta information (in this
case the shape, type, and other info, e.g., byteswapping, necessary to
properly interpret the array), the extension can provide its own
functions or methods to convert these buffer objects into arrays
without copying the data in the buffer object. The extension can try to
import the array package and, if it is present, provide arrays as a
data format using this scheme. In many respects this is the most
attractive approach. It has no dependencies on include files, build
order, etc. This approach led to the suggestion that Python develop a
buffer object that could contain meta information, along with a way of
supporting community conventions (e.g., a name attribute indicating
which convention was being used) to facilitate the interchange of any
sort of binary data, not just arrays. We also concluded that it would
be nice to be able to create buffer objects from Python with malloced
memory (currently one can only create buffer objects from other objects
that already have memory allocated; there is no way, from Python, to
create newly allocated, writable memory within a buffer object; one can
create a buffer object from a string, but it is not writable).
Nevertheless, if an extension is written in C, none of these changes
is necessary to use this mechanism for interchange purposes now. This
is the approach we recommend trying. The obvious candidate is PIL, as a
test case. We should do this ourselves and offer it as a patch to PIL.
Other obvious cases are image interchange for GUIs (e.g., wxPython) and
OpenGL.
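The interchange idea in option c) -- raw memory plus enough metadata to
interpret it -- can be sketched in pure Python; the metadata keys here are
illustrative only, not an agreed convention:

```python
# Sketch of option c): a producer hands over raw bytes plus metadata; a
# consumer reinterprets the bytes without the producer depending on any
# array C API. The metadata keys are illustrative only.
import struct

def export_data(values):
    data = struct.pack("<%dd" % len(values), *values)  # little-endian doubles
    meta = {"shape": (len(values),), "typestr": "<f8"}
    return data, meta

def import_data(data, meta):
    n = meta["shape"][0]
    return list(struct.unpack("<%dd" % n, data))
```

A real array package would build an array around the buffer without
copying; the copy here is only to keep the sketch self-contained.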
2) Scalar support, rank-0 and related. Travis and I agreed (we
certainly seek comments on this conclusion; we may have forgotten key
arguments for one of the different approaches) that the desirability of
using rank-0 arrays as return values from single-element indexing
depends on other factors, most importantly Python's support for scalars
in various respects. This is a multifaceted issue that will need to be
decided by considering all the facets simultaneously. The following
tries to list the pros and cons previously discussed for returning
scalars (two cases previously discussed) or rank-0 arrays (input
welcome).
a) return only existing Python scalar types (cast upwards except for
long long and long double based types)
Pros:
- What users probably expect (except matlab users!)
- No performance hit in subsequent scalar expressions
- faster indexing performance (?)
Cons:
- Doesn't support array attributes, numeric behaviors
- What do you return for long long and long double? No matter what
is done, you will either lose precision or lose consistency. Or do you
create a few new Python scalar types for the unrepresentable types?
But with subclassing in C, the effort to create a few scalar types is
very close to the effort to create many.
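The precision problem is easy to demonstrate: a C double (Python's float)
has only 53 mantissa bits, so casting a 64-bit integer upward to an
existing scalar type silently loses low-order bits:

```python
# Why "cast upwards" fails for long long: a 64-bit integer does not fit
# in a double's 53-bit mantissa, so the round trip is lossy.
big = 2**63 - 1          # largest 64-bit signed value
as_float = float(big)    # what upward casting to an existing scalar would do
assert int(as_float) != big   # low-order bits were lost
```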
b) create new Python scalar types and return those (one for each basic
array type)
Pros:
- Exactly what numeric users expect in representation
- No performance hit in subsequent scalar expressions
- faster indexing performance
- Scalars have the same methods and attributes as arrays
Cons:
- Might require great political energy to eventually get the
arraytype with all of its scalartype-children into the Python core.
This is really an unknown, though, since if the arrayobject is in the
standard module and not in the types module, then people may not care
(a new type is essentially a new-style class and there are many, many
classes in the Python standard library). A good scientific-packaging
solution that decreases the desirability of putting the arrayobject
into the core would help alleviate this problem as well.
- By itself it doesn't address different numeric behaviors for the
"still-present" Python scalars throughout Python.
c) return rank-0 array
Pros:
- supports all array behaviors, particularly with regard to numerical
processing and especially IEEE exception handling (a matter of some
controversy; some would also like it to have len() == 1 and support [0]
indexing, which strictly speaking rank-0 arrays should not support)
Cons:
- Performance hit on all scalar operations (e.g., if one then does
many loops over what appears to be a pure scalar expression, use of
rank-0 will be much slower than Python scalars, since use of arrays
incurs significant overhead).
- Doesn't eliminate the fact that one can still run into different
numerical behavior involving operations between Python scalars.
- Still necessary to write code that must deal with Python scalars
"leaking" into code as inputs to functions.
- Can't currently be used to index sequences (so not completely
usable in place of scalars)
Out of this came two potential needs (the first isn't strictly
necessary if approach a is taken, but could help smooth the use of all
integer types as indices if approach b is taken):
If rank-0 arrays are returned, then Guido was very receptive to
supporting a special method, __index__, which would allow any Python
object to be used as an index to a sequence or mapping object. Calling
this would return a value suitable as an index when the object is not
itself directly suitable. Thus rank-0 arrays would have this method
called to convert their internal integer value into a Python integer.
There are some details about how this would work at the C level that
need to be worked out. This would allow rank-0 integer arrays to be
used as indices. To be useful, it would be necessary to get this into
the core as quickly as possible (if there are C API issues whose
solutions will linger, then a greatly delayed implementation in Python
would make this less than useful).
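The proposed protocol is easy to illustrate with a toy class standing in
for a rank-0 integer array (a sketch of the idea, not of any package's
actual type; the example runs under modern Python, which adopted exactly
this protocol):

```python
# Sketch of the proposed __index__ protocol: an object that is not an int
# can still be used as a sequence index by converting itself on demand.
class Rank0Int:
    """Toy stand-in for a rank-0 integer array."""
    def __init__(self, value):
        self._value = value
    def __index__(self):
        # Python calls this when an integer index is required.
        return self._value

seq = ["a", "b", "c"]
assert seq[Rank0Int(1)] == "b"                      # indexing works
assert seq[Rank0Int(0):Rank0Int(2)] == ["a", "b"]   # so does slicing
```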
We talked at some length about whether it was possible to change
Python's numeric behavior for scalars, namely support for configurable
handling of numeric exceptions in the way numarray does it (and
Numeric3 as well). In short, not much was resolved. Guido didn't much
like the stack approach to the exception handling mode. His argument (a
reasonable one) was that even if the stack allowed pushing and popping
modes, it was fragile for two reasons. First, if one called functions
in other modules that were written without knowledge that the mode
could be changed, those functions would presume the previous behavior
and thus could be broken by a mode change (though we suppose that just
puts the burden on the caller to guard all external calls with restores
to default behavior; even so, many won't do that, leading to spurious
bug reports that may annoy maintainers to no end through no fault of
their own). Second, he felt that some termination conditions may cause
missed pops, leading to incorrect modes. He suggested studying the
decimal module's use of context to see if it could be used as a model.
Overall he seemed to think that setting the mode on a per-module basis
was a better approach. Travis and I wondered about how that could be
implemented (it seems to imply that the exception handling needs to
know what module or namespace is being executed in order to determine
the mode).
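For reference, the decimal context model Guido pointed to looks like this:
a setting applies within a scope and is restored automatically on exit,
which sidesteps the missed-pop problem:

```python
# The decimal module's context model: the mode (here, precision) is scoped
# to a with-block and restored automatically on exit, even on exceptions.
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 4                      # mode change is local to this block
    scoped = Decimal(1) / Decimal(7)  # computed with 4 significant digits

restored = Decimal(1) / Decimal(7)    # default precision (28 digits) again
assert str(scoped) == "0.1429"
assert len(str(restored)) > len(str(scoped))
```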
So some more thought is needed regarding this. The difficulty of
proposing such changes and getting them accepted is likely to be
considerable. But Travis had a brilliant idea (some may see this as
evil, but I think it has great merit). Nothing prevents a C extension
from hijacking the behaviors of the existing Python scalar objects. Once a
reference is obtained to an integer, float or complex value, one can
replace the table of operations on those objects with whatever code one
wishes. In this way an array package could (optionally) change the
behavior of Python scalars. In this way we could test the behavior of
proposed changes quite easily, distribute that behavior quite easily in
the community, and ultimately see if there are really any problems
without expending any political energy to get it accepted. Seeing
whether it really worked (without "forking" Python, either) would place
us in a much stronger position to have the new behaviors incorporated
into the core. Even then, incorporation may never prove necessary if
scalar behavior can be customized this way by the array package. This
holds out the potential of making scalar/array behavior much more
consistent. Doing this may allow option a) as the ultimate solution,
i.e., no changes needed to Python at all (as such), and no rank-0
arrays. This will be studied further. One possible
issue is that adding the necessary machinery to make numeric scalar
processing consistent with that of the array package may introduce
significant performance penalties (what is negligible overhead for
arrays may not be for scalars).
One last comment is that it is unlikely that any choice in this area
prevents the need for added helper functions to the array package to
assist in writing code that works well with scalars and arrays. There
are likely a number of such issues. A common approach is to wrap all
unknown objects with "asarray". This works reasonably well but doesn't
handle the following case: if you wish to write a function that accepts
arrays or scalars, in principle it would be nice to return scalars when
all that was supplied were scalars. So functions to help determine what
the output type should be, based on the inputs, would be helpful; for
example, to distinguish a rank-0 array (or rank-1, length-1 array)
input from an actual scalar when asarray happens to map both to the
same thing, so that the function can properly return a scalar if that
is what was originally input. Other
such tools may help writing code that allows the main body to treat all
objects as arrays without needing checks for scalars.
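A helper of the kind described might look like this (the name and the
scalar test are hypothetical; a real version would also recognize rank-0
arrays):

```python
# Sketch of a hypothetical helper: apply an elementwise function but return
# a bare scalar when the caller supplied only a bare scalar.
def apply_like_input(func, x):
    was_scalar = isinstance(x, (int, float, complex))  # crude scalar test
    values = [x] if was_scalar else list(x)
    results = [func(v) for v in values]
    return results[0] if was_scalar else results

assert apply_like_input(lambda v: v * 2, 3) == 6            # scalar in, scalar out
assert apply_like_input(lambda v: v * 2, [3, 4]) == [6, 8]  # sequence in, sequence out
```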
Other miscellaneous comments.
The old use of where() may be deprecated, and only the "nonzero"
interpretation will be kept. A new function will be defined to replace
the old usage of where (we deem that regular-expression search and
replace should work pretty well to make the changes in almost all old
code).
With the use of buffer objects, tostring methods are likely to be
deprecated.
Python PEPs needed
===================
From the discussions it was clear that at least two Python PEPs need to
be written and implemented, but that these needed to wait until the
unification of the arrayobject takes place.
PEP 1: Insertion into Python of an __index__ special method, and an
as_index slot (perhaps in the as_sequence methods) in the C-level
typeobject.
PEP 2: Improvements to the buffer object and the buffer builtin so
that buffer objects can be Python-tracked wrappers around allocated
memory that extension packages can use and share. Two extensions are
considered so far: 1) buffer objects gain a meta attribute so that
meta information can be passed around in a unified manner, and 2) the
buffer builtin should take an integer giving the size of the writable
buffer object to create.
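For what it's worth, the second extension can be previewed with constructs
Python later grew: bytearray(n) allocates writable memory of a given size
from Python, and memoryview provides a zero-copy view onto it that other
code can share:

```python
# Later Python provides roughly what PEP 2's second extension asks for:
# newly allocated, writable memory created from Python, shareable zero-copy.
buf = bytearray(8)        # 8 bytes of writable, zero-initialized memory
view = memoryview(buf)    # zero-copy view onto the same memory
view[0] = 0xFF            # writing through the view mutates the buffer
assert buf[0] == 0xFF
assert len(buf) == 8
```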