
I wanted to let users of the community know (so they can help if they want, or offer criticism or comments) that over the next several months I will be experimenting with a branch of the main Numerical source tree and endeavoring to "clean up" the code for Numerical Python. I have in mind a few (in my opinion minor) alterations to the current code base which necessitate a branch. Guido has made some good suggestions for improving the code base, and both David Ascher and Paul Dubois have expressed concerns over the current state of the source code and given suggestions as to how to improve it. That said, I should emphasize that my work is not authorized, or endorsed, by any of the people mentioned above. It is simply my little experiment. My intent is not to re-create Numerical Python --- I like most of the current functionality --- but merely to clean up the code, comment it, change the underlying structure just a bit, and add some features I want. One goal I have is to create something that can go into Python 1.7 at some future point, so this incarnation of Numerical Python may not be completely C-source compatible with current Numerical Python (but it will be close). This means C extensions that access the underlying structure of the current arrayobject may need some alterations to use this experimental branch if it ever becomes useful.

I don't know how long this will take me. I'm not promising anything. The purpose of this announcement is just to invite interested parties into the discussion. These are the (somewhat negotiable) directions I will be pursuing:

1) Still written in C, but heavily (in my opinion) commented.

2) Addition of bit-types and unsigned integer types.

3) Facility for memory-mapped dataspace in arrays.

4) Slices become copies, with the addition of methods for the current strict referencing behavior.

5) Handling of slice objects which consist of sequences of indices (so that setting and getting elements of arrays using their index is possible).

6) Rank-0 arrays will not be autoconverted to Python scalars, but will still behave as Python scalars whenever Python allows general scalar-like objects in its operations. Methods will allow user-controlled conversion to Python scalars.

7) Addition of attributes so that different users can configure aspects of the math behavior, to their hearts' content.

If there is anyone interested in helping in this "unofficial branch work", let me know and we'll see about setting up someplace to work. Be warned, however, that I like actual code or code templates more than just great ideas (truly great ideas are never turned away, however ;-) ). If something I do benefits the current NumPy source in a non-invasive, backwards-compatible way, I will try to place it in the current CVS tree, but that won't be a priority, as my time does have limitations and I'm scratching my own itch at this point.

Best regards,

Travis Oliphant

Travis says that I don't necessarily endorse his goals, but in fact I do, strongly. If I understand right, he intends to make a CVS branch for this experiment, and that is fine with me. The only goal I didn't quite understand was:

    Addition of attributes so that different users can configure aspects of
    the math behavior, to their hearts' content.

In a world of reusable components the situation is complicated. I would not like to support a dot-product routine, for example, if the user could turn off double precision behind my back. My needs for precision are local to my algorithm.

3) Facility for memory-mapped dataspace in arrays.
I'd really like to have that...
4) Slices become copies with the addition of methods for current strict referencing behavior.
This will break a lot of code, and in a way that will be difficult to debug. In fact, this is the only point you mention which would be reason enough for me not to use your modified version; going through all of my code to check what effect this might have sounds like a nightmare. I see the point of having a copying version as well, but why not implement the copying behaviour as methods and leave indexing as it is?
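To make the compatibility risk concrete, here is the kind of code that would silently change meaning. This is only a sketch, but it relies on nothing more than the current documented rule that a slice shares data with its parent array:

    import Numeric

    a = Numeric.arange(5)
    b = a[1:3]        # today: a reference into a's data
    b[0] = 99         # today: a[1] becomes 99 as well
    print a           # under copy semantics, a would stay [0 1 2 3 4]

Every program that relies on the third line writing through to a would keep running, but produce different numbers.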
5) Handling of sliceobjects which consist of sequences of indices (so that setting and getting elements of arrays using their index is possible).
Sounds good as well...
I suspect that full behaviour-compatibility with scalars is impossible, but I am willing to be proven wrong. For example, Python scalars are immutable, arrays aren't. This also means that rank-0 arrays can't be used as keys in dictionaries. How do you plan to implement mixed arithmetic with scalars? If the return value is a rank-0 array, then a single library returning a rank-0 array somewhere could mess up a program well enough that debugging becomes a nightmare.
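The dictionary-key problem is the one that any mutable object has; presumably a rank-0 array would fail in the same way this list does (a plain-Python sketch, not array code):

    d = {}
    key = [1, 2]               # mutable, like an array
    try:
        d[key] = "value"
    except TypeError:
        print "mutable objects are unhashable, so they can't be dictionary keys"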
7) Addition of attributes so that different users can configure aspects of the math behavior, to their hearts' content.
You mean global attributes? That could be the end of universally usable library modules, supposing that people actually use them.
If there is anyone interested in helping in this "unofficial branch work" let me know and we'll see about setting up someplace to work. Be
I don't have much time at the moment, but I could still help out with testing etc.

Konrad.
--
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais

3) Facility for memory-mapped dataspace in arrays.
I'd really like to have that...
This is pretty easy to add, but it does require some changes to the underlying structure, so you can expect it.
I know this will be a sticky point. I'm not sure what to do exactly, but the current behavior and implementation make the semantics for slicing an array using a sequence problematic, since I don't see a way to represent a reference to a sequence of indices in the underlying structure of an array. So such slices would have to be copies and not references, which makes for inconsistent code.
I want to agree with you, but I think we may need to change the behavior eventually, so when is it going to happen?
This facility is already embedded in the underlying structure. My plan is to go with the original idea that Jim Hugunin and Chris Chase had for slice objects. The slice object in Python is already general enough for this to work.
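As a plain-Python illustration of that generality (nothing array-specific here): whatever you put inside the brackets is simply handed to __getitem__, so a sequence of indices arrives intact and the array code just has to interpret it:

    class Probe:
        def __getitem__(self, index):
            print "index object:", index

    p = Probe()
    p[1:10:2]       # a slice object with start, stop, and step
    p[[0, 2, 5]]    # a list of indices, passed through unchanged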
Mixed arithmetic in general is another sticky point. I went back and read the discussion of this point which occurred in 1995-1996. It was very interesting reading and a lot of points were made. Now we have several years of experience and we should apply what we've learned (of course we've all learned different things :-) ). Konrad, you had a lot to say on this point 4 years ago. I've had a long discussion with a colleague who is starting to "get in" to Numerical Python, and he has really been annoyed with the current mixed arithmetic rules. They seem to try to outguess the user. The spacesaving concept helps, but it still seems like a hack to me.

I know there are several opinions, so I'll offer mine. We need simple rules that are easy to teach a newcomer. Right now the rule is fairly simple in that coercion always proceeds up. But mixed arithmetic with a float and a double does not really produce a result with double-precision accuracy -- yet casting up is our rule. I think any automatic conversion should go the other way. Konrad, 4 years ago you talked about unexpected losses of precision if this were allowed to happen, but I couldn't understand how. To me, it is unexpected to have double precision arrays which are really only carrying single-precision results.

My idea of the coercion hierarchy is shown below, with conversion always happening down when called for. The Python scalars get mapped to the "largest precision" in their category and then the normal coercion rules take place. The casual user will never use single precision arrays and so will not even notice they are there unless they request them. If they do request them, they don't want them suddenly changing precision. That is my take anyway.

    Boolean
    Character
    Unsigned:  long  int  short
    Signed:    long  int  short
    Real:      /* long double */  double  float
    Complex:   /* __complex__ long double */  __complex__ double  __complex__ float
    Object
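To make the accuracy point concrete, here is a small sketch of what happens under the current upcasting rule (Numeric typecodes 'f' and 'd'; the exact printed digits will vary):

    import Numeric

    a = Numeric.array([1.0/3.0], 'f')    # single precision
    b = Numeric.array([1.0/3.0], 'd')    # double precision
    c = a * b
    print c.typecode()                   # 'd' -- coercion went up
    print c[0]                           # stored as a double, but only about
                                         # 7 digits of it are actually meaningful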
I thought I did, but I've changed my mind after reading the discussion in 1995. I don't like global attributes either, so I'm not going there.
Konrad, you were very instrumental in getting NumPy off the ground in the first place, and I will always appreciate your input.

Before we all rattle on too long about precision, I'd like to point out that selecting a precision actually carries two consequences in the context of computer languages:

1. Expected: The number of digits of accuracy in the representation of a floating point number.

2. Unexpected: The range of numbers that can be represented by this type.

Thus, to a scientist it is perfectly logical that if d is a double and f is a single, d * f has only single precision validity. Unfortunately, in a computer, if you hold this answer in a single, then it may fail if the contents of d include numbers outside the single range, even if f is 1.0. Thus the rules in C and Fortran that coercion is UP had to do as much with range as with precision.
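The range half of the problem is easy to show with the struct module, which packs a Python float (a C double) into IEEE single or double format:

    import struct

    struct.pack('d', 1e300)    # fine: 1e300 is within double range
    struct.pack('f', 1e300)    # raises OverflowError: outside single range,
                               # no matter how few digits of it you care about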

On Tue, 8 Feb 2000, Travis Oliphant wrote:
Remark: If you are consistent, then you are saying here that mixed arithmetic with an int and a float/double produces an int?! Right? (I hope that I am wrong.)
How about `/* long long */'? Is this left out intentionally?
Travis, while you are revising NumPy, could you also estimate the degree of difficulty of introducing column-major order arrays? Pearu

You are right there. But is it really necessary to extend the meaning of slices? Of course everyone wants the functionality of indexing with a sequence, but I'd be perfectly happy to have it implemented as a method. Indexing would remain as it is (by reference), and a new method would provide copying behaviour for element extraction and also permit more generalized sequence indices. In addition to backwards compatibility, there is another argument for keeping indexing behaviour as it is: compatibility with other Python sequence types. If you have a list of lists, which in many ways behaves like a 2D array, and extract the third element (which is thus a list), then this data is shared with the full nested list.
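Numeric's take() function is already most of the way there; what I am suggesting is essentially the following, but spelled as a method and generalized (a rough sketch, assuming take's usual axis-0 behaviour):

    import Numeric

    a = Numeric.arange(10) * 10
    b = Numeric.take(a, [1, 3, 7])    # copies elements 1, 3 and 7 -> [10 30 70]
    b[0] = -1                         # b is a copy, so a is untouched
    print a[1]                        # still 10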
What I meant was not mixed-precision arithmetic, but arithmetic in which one operand is a scalar and the other one a rank-0 array. Which reminds me: rank-0 arrays are also incompatible with the nested-list view of arrays. The elements of a list of numbers are numbers, not number-like sequence objects. But back to precision, which is also a popular subject:
I wouldn't say that the current system tries to outguess the user. It simply gives precision a higher priority than memory space. That might not coincide with what a particular user wants, but it is consistent and easy to understand.
Like in Python (for scalars), C, Fortran, and all other languages that I can think of.
I think this is a confusion of two different meanings of "precision". In numerical algorithms, precision refers to the deviation between an ideal and a real numerical value. In programming languages, it refers to the *maximum* precision that can be stored in a given data type (and is in fact often combined with a difference in range).

The upcasting rule thus ensures that

1) No precision is lost accidentally. If you multiply a float by a double, the float might contain the exact number 2, and thus have infinite precision. The language can't know this, so it acts conservatively and chooses the "bigger" type.

2) No overflow occurs unless it is unavoidable (the range problem).
The casual user will never use single precision arrays and so will not even notice they are there unless they request them. If they do request
There are many ways in which single-precision arrays can creep into a program without a user's attention. Suppose you send me some data in a pickled array, which happens to be single-precision. Or I call a library routine that does some internal calculation on huge data arrays, which it keeps at single precision, and (intentionally or by error) returns a single-precision result.

I think your "active flag" solution is a rather good solution to the casting problem, because it gives access to a different behaviour in a very explicit way. So unless future experience points out problems, I'd propose to keep it.

Konrad.

Good point. So we can't be consistent with all properties of other Python sequence types. Which reminds me of a very different compatibility problem in NumPy that can (and should) be removed: the rules for integer division and remainders for negative arguments are not the same. NumPy inherits the C rules; Python has its own.

Konrad.
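PS: For anyone who hasn't run into it, the difference is this (Python floors the quotient; C, and hence NumPy, truncates toward zero):

    print -7 / 2, -7 % 2    # Python: -4 1  (remainder takes the sign of the divisor)
    # NumPy, following C, gives quotient -3 and remainder -1
    # (remainder takes the sign of the dividend)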

I think I can live with that, as long as it _syntactically_ looks like indexing. This is one case where the syntax is more important than functionality. There are things you want to index with indices, etc., and the composition with parenthesis-like (Dyck language) syntax has proved to be one of the few readable ways to do it.
_Avoiding_ data sharing will eventually be more important than supporting data sharing, since memory continues to get cheaper but memory bandwidth and latency do not improve at the same rate. Locality of reference is hard to control when there is a lot of default data sharing, and performance suffers, yet it becomes important on more and more scales as memory systems become more and more hierarchical.

Ultimately, the _semantics_ we like will be implemented efficiently by emulating references and copies in code which copies and references as it sees fit and keeps track of which copies are "really" references and which references are really "copies". I've thought this through for the "everything gets copied" languages and it isn't too mentally distressing - you simply reference count fake copies. The "everything is a reference" languages are less clean, but the database people have confronted that problem.
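Here's a toy sketch of the "reference count fake copies" idea, in plain Python with made-up names: logical copies are free, and the one real copy happens only when somebody writes to a shared buffer.

    class COWArray:
        # Toy copy-on-write wrapper around a list of numbers.
        def __init__(self, data):
            self._data = data            # shared buffer
            self._owners = [1]           # shared count of logical copies

        def copy(self):                  # logical copy: O(1)
            other = COWArray(self._data)
            other._owners = self._owners
            self._owners[0] = self._owners[0] + 1
            return other

        def __getitem__(self, i):
            return self._data[i]

        def __setitem__(self, i, value): # the real copy happens only here
            if self._owners[0] > 1:
                self._owners[0] = self._owners[0] - 1
                self._data = self._data[:]   # private buffer from now on
                self._owners = [1]
            self._data[i] = value

    a = COWArray([0, 1, 2, 3])
    b = a.copy()          # cheap
    b[0] = 99             # triggers the one real copy
    print a[0], b[0]      # 0 99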
Which reminds me: rank-0 arrays are also incompatible with the nested-list view of arrays.
There are ways out of that trap. Most post-ISO APLs provide examples of how to cope.
And that is not a bad thing. But which way is "up"? (See example below.)
Most people always hate, and only sometimes detect, when that happens. It specifically contravenes the Geneva conventions on programming mental hygiene.
The upcasting rule thus ensures that
1) No precision is lost accidentally.
More or less. More precisely, it depends on what you call an accident. What happens when you add the IEEE single-precision floating point value 1.0 to the 32-bit integer 2^30? A _lot_ of people don't expect to get the IEEE single-precision floating point value 2.0^30, but that is what happens in some languages. Is that an "upcast"? Would the 32-bit integer 2^30 make more sense? Now what about the case where the 32-bit integer is signed and adding one to it will "wrap around" if the value remains an integer? And although these two examples might make double precision or a wider integer (if available) seem the correct answer, suppose it's only one element of a gigantic array? Let's now talk about complex values....

There are plenty of rough edges like this when you mix numerical types. It's guaranteed that everybody's ox will get gored somewhere.
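The first example, spelled out by round-tripping the sum through IEEE single precision with the struct module:

    import struct

    def as_single(x):
        # round a Python float (a double) to the nearest IEEE single
        return struct.unpack('f', struct.pack('f', x))[0]

    big = 2.0 ** 30
    print as_single(big + 1.0) == big    # true: the +1 has vanished in single precision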
Absolutely.
And the worst one is when the accuracy of the result is single precision, but the _type_ of the result is double precision. There is a function in S-plus which does this (without documenting it, of course), and man, was that a pain in the neck to sort out. Just today I found another bug in one of the S-plus functions - it turns out that if you hand a complex triangular matrix and a real right-hand side to the triangular solver (backsubstitution), it doesn't cast the right-hand side to complex, and it uses whatever values are subsequent in memory to the right-hand side as if they were part of the vector. Obviously, when testing the function, they didn't try this mixed-type case.

But interpreters are really convenient for writing code so that you _don't_ have to think about types all the time and do your own casting. Which is why stubbing your head on an unexpected cast is so unlooked for.
Is there a simple way to ensure that no active arrays are ever activated at any time when I use Numerical Python? Later, Andrew Mullhaupt

Konrad Hinsen wrote:
But back to precision, which is also a popular subject:
but one which even numerical programmers don't seem to understand ...
.. which is all wrong. It is NOT safe to convert floating point from a lower to a higher number of bits. ALL such conversions should be removed for this reason: any conversions should have to be explicit. The reason is that whether a conversion to a larger number of bits is safe or not is context dependent (and so it should NEVER be done silently). Consider a function

    k0 = 100
    k = 99
    while k < k0:
        ...
        k0 = k
        k = ...

which refines a calculation until the measure k stops decreasing. This algorithm may terminate when k is a float, but _fail_ when k is a double -- the extra precision may cause the algorithm to perform many useless iterations, in which the precision of the result is in fact _lost_ due to rounding error. What is happening is that the real comparison is probably

    k - k0 < epsilon

where epsilon was 0.0 in floating point, and thus omitted.

My point is that throwing away information is what numerical programming is all about. Numerical programmers need to know how big numbers are, and how much significance they have, and optimise calculations accordingly -- sometimes by _using_ the precision of the working types to advantage. To put this another way, it is generally bad to keep more digits (bits) or precision than you actually have: it can be misleading. So a language should not assume that it is OK to add more precision. It may not be.

--
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd, Glebe NSW 2037, Australia
voice: 61-2-9660-0850
homepage: http://www.maxtal.com.au/~skaller
download: ftp://ftp.cs.usyd.edu/au/jskaller

I'd call this a buggy implementation. Convergence criteria should be explicit and not rely on the internal representation of data types. Neither Python nor C guarantees you any absolute bounds for precision and value range, and even languages that do (such as Fortran 9x) only promise to give you a data type that is *at least* as big as your specification.
If you care at all about portability, you shouldn't even think about this.

Konrad.

Konrad Hinsen wrote:
If you care at all about portability, you shouldn't even think about this.
But sometimes you DON'T care about portability. Sometimes, you want the best result the architecture can support, and so you need to perform a portable computation of an architecture dependent value.

-- John (Max) Skaller, mailto:skaller@maxtal.com.au

Some do, some don't.
It is usually safe. Extremely safe. Safe enough that code in which it is _not_ safe is badly designed.
ALL such conversions should be removed for this reason: any conversions should have to be explicit.
I really hope not. A generic function with six different arguments becomes an interesting object in a language without automatic conversions. Usually, a little table driven piece of code has to cast the arguments into conformance, and then multiple versions of similar code are applied.
This is a classic bad programming practice and _it_ is what should be eliminated. It is a good (and required, if you work for me) practice that:

1. All iterations should have termination conditions which are correct; that is, they prevent extra iterations. This is typically precision sensitive, but that is simply something that has to be taken into account when writing the termination condition.

2. All iterations should be protected against an unexpectedly large number of iterates taking place.

There are examples of iterations which are intrinsically stable in lower precision and not in higher precision (Brun's algorithm), but those are quite rare in practice. (Note that the Ferguson-Forcade algorithm, as implemented by Lenstra, Odlyzko, and others, has completely supplanted any need to use Brun's algorithm anyway.)

When an algorithm converges because of lack of precision, it is because the rounding error regularizes the problem. This is normally referred to in the trade as "idiot regularization". It is, in my experience, invariably better to choose a regularization that is specific to the computation than to rely on rounding effects which might differ from machine to machine. In particular, your type of example is in for serious programmer enjoyment hours on Intel or AMD machines, which have 80-bit-wide registers for all the floating point arithmetic. Supporting needless machine dependency is not something to argue for, either, since Cray-style floating point arithmetic has a bad error model. Even Cray has been beaten into submission on this, finally releasing IEEE-compliant processors, but only just recently.
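Concretely, the shape of loop I insist on looks like this (names and tolerance are just placeholders; the point is the explicit test and the hard cap):

    def refine_until_converged(step, k, tol=1.0e-6, maxiter=100):
        # 'step' is whatever refinement produces the next iterate from the current one
        for i in range(maxiter):                               # 2. cap the iteration count
            k_new = step(k)
            if abs(k_new - k) <= tol * max(abs(k_new), 1.0):   # 1. explicit tolerance
                return k_new
            k = k_new
        raise RuntimeError("no convergence after %d iterations" % maxiter)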
to put this another way, it is generally bad to keep more digits (bits) or precision than you actually have
I couldn't agree less. The exponential function and inner-product accumulation are famous examples of why extra bits are important in intermediate computations. It's almost impossible to have an accurate exponential function without using extra precision - which is one reason why so many machines have extra bits in their FPUs and why there is an IEEE "extended" precision type. The storage-history effects which result from temporarily increased precision are well understood and mild, in that they violate no common error models used in numerical analysis. And for those few cases where testing for equality is needed for debugging purposes, many systems permit you to impose truncation and eliminate storage-history issues. Later, Andrew Mullhaupt

Travis Oliphant writes:
3) Facility for memory-mapped dataspace in arrays.
For the NumPy users who are as ignorant about mmap, msync, and madvise as I am, I've put a couple of documents on my web site:

1) http://www.geog.ubc.ca/~phil/mmap/mmap.pdf
A pdf version of Kevin Sheehan's paper "Why aren't you using mmap yet?" (19-page Frame postscript original, page order back to front). He gives a good discussion of the SVR4 VM model, with some mmap examples in C.

2) http://www.geog.ubc.ca/~phil/mmap/threads.html
An archived email exchange (initially on the F90 mailing list) between Kevin (who is an independent Solaris consultant) and Brian Sumner (SGI) about the pros and cons of using mmap.

Executive summary:

i) mmap on Solaris can be a very big win (see bottom of http://www.geog.ubc.ca/~phil/mmap/msg00003.html) when used in combination with WILLNEED/WONTNEED madvise calls to guide the page prefetching.

ii) IRIX and some other Unices (Linux 2.2 in particular) haven't implemented madvise, and naive use of mmap without madvise can produce lots of page faulting and much slower I/O than, say, asynchronous I/O calls on IRIX. (http://www.geog.ubc.ca/~phil/mmap/msg00009.html)

So I'd love to see mmap in NumPy, but we may need to produce a tutorial outlining the tradeoffs and giving some examples of madvise/msync/mmap used together (with a few benchmarks). Any mmap module would need to include member functions that call madvise/msync for the mmapped array (but these may be no-ops on several popular OSes.)

Regards, Phil
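PS: For the mapping itself (as opposed to the array layered on top of it), the mmap module shipping with newer Pythons already covers the basics on Unix; madvise isn't exposed at that level, which is exactly the gap a tutorial would have to discuss. A rough sketch, with 'data.bin' standing in for whatever file you want to map:

    import mmap, os

    filename = 'data.bin'                                     # placeholder file name
    f = open(filename, 'r+b')
    m = mmap.mmap(f.fileno(), os.path.getsize(filename))      # map the whole file

    header = m[0:16]     # reads come straight from the mapped pages
    m.flush()            # msync() any dirty pages back to the file
    m.close()
    f.close()

An mmap-aware Numeric would build its array data pointer on top of a buffer like this instead of copying it.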

I have Kevin's "Why Aren't You Using mmap() Yet?" on my site. Kevin is working on a new edition (11th anniversary edition? 1xth anniversary edition?). By the way, Uresh Vahalia's book on Unix Internals is a very good read for anyone not yet familiar with modern operating systems, especially Unices. Kevin is extremely knowledgeable on this subject, and several others.
Executive summary:
i) mmap on Solaris can be a very big win
Orders of magnitude.
And with the newer versions of Solaris, madvise() is a good way to go. madvise is _not_ SVR4 (not in SVID3), but it _is_ in the OSF/1 AES, which means it is _not_ vendor specific. The standard part of madvise is that it is a "hint"; everything it actually _does_ when you hint the kernel with madvise is usually specific to particular versions of an operating system. There are tricks to get around madvise not doing everything you want. (WONTNEED didn't work in Solaris for a long time; Kevin found a trick that worked really well instead. Kevin knows people at Sun, since he was one of the very earliest employees there, and the trick Kevin used to suggest has since become the implementation of WONTNEED in Solaris.) And that trick is well worth understanding.

It happens that msync() is a good call to know. It has an undocumented behavior on Solaris: when you msync a memory region with MS_INVALIDATE | MS_ASYNC, the dirty pages are queued for writing and the backing store becomes available immediately, or, if dirty, as soon as it is written out. This means that the pager doesn't have to run at all to scavenge the pages. Linux didn't do this last time I looked. I suggested it to the kernel guys and the idea got some positive response, but I don't know if they did it.
IRIX has an awful implementation of mmap. And SGI people go around badmouthing mmap; not that they don't have cause, but they are usually very surprised to see how big the win is with a good implementation. Of course, the msync() trick doesn't work on IRIX last I looked, which leads the SGI people to believe that mmap() is brain damaged because it runs the pager into the ground. It's a point of view that is bound to come up.

HP/UX was really whacked last time I looked. They had a version (10) which supported the full mmap() on one series of workstations (700, 7000, I forget, let's say 7e+?) and didn't support it except in the non-useful SVR3.2 way on another series of workstations (8e+?). The reason was that the 8e+? workstations were multiprocessor and they hadn't figured out how to get the newer kernel flying on the multiprocessors. I know Konrad had HP systems at one point; maybe he has the scoop on those.
I don't know if you want a separate module; maybe what you want is the normal allocation of memory for all Numerical Python objects to be handled in a way that makes sense for each operating system. The approach I took when I was writing portable code for this sort of thing was to write a wrapper for the memory operation semantics and then implement the operations as a small library that would be OS specific, although not _that_ specific. It was possible to write single source code for SVID3 and OSF/AES1 systems with sparing use of conditional defines. Unfortunately, that code is the intellectual property of another firm, or else I'd donate it as an example for people who want to learn stuff about mmap. As it stands, there was some similar code I was able to produce at some point. I forget who here has a copy, maybe Konrad, maybe David Ascher. Later, Andrew Mullhaupt

With Travis' wise advice, I appear to have succeeded in putting up a binary installation of Numerical-15.2. Due to a bug in distutils, this is an 'install in place' package, instead of a 'run python setup.py install' package. So, unzip the file in your main Python tree, and it should 'work'. Let me (and Paul and Travis) know if it doesn't. Download is available from the main page (http://sourceforge.net/project/?group_id=1369 -- look for [zip]) or from http://download.sourceforge.net/numpy/python-numpy-15.2.zip

--david ascher

participants (10)
- Andrew P. Mullhaupt
- David Ascher
- hinsen@dirac.cnrs-orleans.fr
- Konrad Hinsen
- Paul F. Dubois
- Paul F. Dubois
- Pearu Peterson
- Phil Austin
- skaller
- Travis Oliphant