
Does anyone have a simple method of using one-offset arrays? Specifically, can I define an array "a" so that a[1] refers to the first element?
Inherit from UserArray and redefine slicing to your heart's content.
Without one-offset indexing, it seems to me that Python is minimally useful for numerical computations. Many, perhaps the majority, of numerical algorithms are one-indexed. Matlab for example is one-based for this reason. In fact it seems strange to me that a "high-level" language like Python should use zero-offset lists.
Wow, that is a pretty condescending-sounding statement, though I'm sure you didn't mean it that way. Python is indeed being used for quite serious numerical computations. I have been using Python for numerical work for quite some time and find its zero-based indexing, combined with the leave-last-one-out slicing notation, to be much more convenient.
Oops, what I really need is a wet-ware (i.e. brain) macro which enforces the correct order of the sequence (think,write,send)! The above unconsidered comments arose from the sequence (frustration,write,send,think,regret). ;) Obviously there is a lot of numerical work being done with Python and people are very happy with it. But for me I still think it would be "minimally useful" without 1-indexed arrays. Here's why:
In my view, the most important reason to prefer 1-based indexing versus 0-based indexing is compatibility. For numerical work, some of the languages which I use or have used are Matlab, Mathematica, Maple and Fortran. These are all 1-indexed. (C is by nature 0-indexed because it is so close to machine architecture, but with a little bit of not-entirely-clean pointer manipulation, you can easily make 1-indexed arrays and matrices.) Obviously Python can't be compatible with these languages in a strict sense, but like most people who do some programming work, I've built up a library of my own commonly used routines specific to my work; in general it's a trivial matter to translate numerical routines from one language to another if translation is just a matter of substituting one set of syntactical symbols and function names for another. However it can be damn tricky to convert 1-indexed code to 0-indexed code or vice versa without introducing any errors - believe me! (Yes it's possible to call nearly any language from nearly any other language these days so in theory you don't have to recode, but there are lots of reasons why often recoding is the preferable route.)
The other reason for choosing 1-based indexing is to keep the code as near to the standard notation as possible. This is one of the advantages of using a high level language - it approximates the way you think about things instead of the way the computer organizes them.
Of course, this can go either way depending on the quantity in question: as a simple example a spatial vector (x,y,z) is conventionally labelled 1,2,3 (1-indexed), but a relativistic four-vector with time included (t,x,y,z) is conventionally labelled 0,1,2,3 (0-indexed). So ideally one would be able to choose the indexing-type case-by-case. I'm sure that computer programmers will argue vehemently that code which mixes both 0-indexed and 1-indexed arrays is hard to understand and maintain, but for experts in a particular field who are accustomed to certain ingrained notations, it is the code which breaks the conventional notation which is most error-prone. In my case, I'm dealing at the moment with crystal structures with which are associated certain conventional sets of vectors and tensors - all 1-indexed by convention. I find it a delicate enough task to always get the correct vector or tensor without having to remember that d[2] is actually d3. Defining d1,d2,d3 is not convenient because often the index itself needs to be calculated.
I guess if I understood the reason for 0-indexed lists and tuples in Python I would be happier. In normal, everyday usage, sets, collections and lists are 1-indexed (the first item, the second item, the third item, and so on). Python is otherwise such an elegant and natural language. Why the ugly exception of making the user conform to the underlying mechanism of an array being an address plus an offset?
All this is really neither here nor there, since this debate, at least as far as Python is concerned, was probably settled 10 years ago and I'm sure nobody wants to hear anything more about it at this point. As you point out, I can define my own array type with inheritance. I will also need my own range command and several other functions which haven't occurred to me yet. I was hoping that there would be a standard module to implement this.
By the way, what is leave-last-one-out slicing? Is it a[:-1] or is it a[0,...] or is it something else?
Eric

In my view, the most important reason to prefer 1-based indexing versus 0-based indexing is compatibility. For numerical work, some of the languages which I use or have used are Matlab, Mathematica, Maple and Fortran. These are all 1-indexed. (C is by nature 0-indexed because it is so close to machine architecture, but with a little bit of not-entirely-clean pointer manipulation, you can easily make 1-indexed arrays and matrices.) Obviously Python can't be compatible with these languages in a strict sense, but like most people who do some programming work, I've built up a library of my own commonly used routines specific to my work; in general it's a trivial matter to translate numerical routines from one language to another if translation is just a matter of substituting one set of syntactical symbols and function names for another. However it can be damn tricky to convert 1-indexed code to 0-indexed code or vice versa without introducing any errors - believe me! (Yes it's possible to call nearly any language from nearly any other language these days so in theory you don't have to recode, but there are lots of reasons why often recoding is the preferable route.)
You aren't the first to raise this issue. I wouldn't mind if the user had the option, but then again I tend to prefer the flag-for-every-feature approach, which others with more computing experience than me have said leads to problems due to the presence of many different ways to do things and unforeseen interactions. I could definitely see the coding advantage when implementing algorithms that use notation that is already 1-based. I have come across this myself -- in fact just yesterday when I was coding up the Pade approximation to the matrix exponential using the pseudo-code algorithm given by Golub and Van Loan in their "Matrix Computations" book. It seems to me like it would be a lot of work to add this feature back into the code now (there would be a million places to look for where the code inherently assumes 0-based indexing). It would also, as you mention, be inconsistent with Python.
A general approach would be to inherit from UserArray for your codes and reimplement the __getitem__ and __getslice__ methods. Your objects should still be able to be passed to many of the routines which expect arrays (because under the covers one of the first things the array_from_object C-code does is check to see if the object has an __array__ method and call it). Note that this will not copy data around so there is minimal overhead. But you would have to take care to wrap the returned object back into an array_object. (Maybe something could be done here...Hmmm.)
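A minimal sketch of the "reimplement __getitem__" idea follows. It is not Numeric's UserArray: `OneBasedList` is a hypothetical pure-Python wrapper around a plain list, written only to show the index shift. In current Python there is no `__getslice__`; slices arrive at `__getitem__` as `slice` objects, and this sketch assumes an inclusive a[i:j] (Fortran/Matlab style), a step of 1, and no negative indices.

```python
class OneBasedList:
    """Hypothetical 1-based wrapper around a plain Python list."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        # Iterate over the storage directly; otherwise Python would
        # probe __getitem__(0), which is out of range here.
        return iter(self._data)

    def _shift(self, index):
        # Map a 1-based integer index onto the 0-based storage.
        if not 1 <= index <= len(self._data):
            raise IndexError("index %d not in 1..%d" % (index, len(self._data)))
        return index - 1

    def __getitem__(self, index):
        if isinstance(index, slice):
            if index.step not in (None, 1):
                raise ValueError("this sketch only supports a step of 1")
            # Assumption: a[i:j] means elements i..j inclusive.
            start = 1 if index.start is None else index.start
            stop = len(self._data) if index.stop is None else index.stop
            return OneBasedList(self._data[start - 1:stop])
        return self._data[self._shift(index)]

    def __setitem__(self, index, value):
        self._data[self._shift(index)] = value


d = OneBasedList([10, 20, 30])
first = d[1]          # first element
middle_to_end = d[2:3]  # elements 2 and 3, inclusive
```

Note that the result of a slice is wrapped back into `OneBasedList` by hand, which is the same wrapping burden the post mentions for array-returning routines.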
By the way, what is leave-last-one-out slicing? Is it a[:-1] or is it a[0,...] or is it something else?
I meant the fact that a[3:6] returns elements a[3], a[4], a[5] but NOT a[6]. I'm sorry for using my own poorly-worded term. I can't remember what other Pythonistas call it.
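For illustration only, the behaviour described above can be checked with a plain Python list (the variable names are arbitrary):

```python
a = list(range(10))   # a[i] == i, so the values show which indices were taken

chunk = a[3:6]        # takes a[3], a[4], a[5]; a[6] is left out

# The length of a slice is simply stop - start.
chunk_length = len(chunk)
```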

Eric brought up the point of one-offset arrays, which can be relatively easily created with UserArrays. This led me to some thinking about why UserArrays are not more often used. I think one of the biggest reasons is that most of the functions can take UserArrays but return the basic array type upon completion. So, you end up having to continually construct your UserArrays. Are there other reasons people have thought of?
So, here's a suggestion: Why don't we modify PyArray_Return to return an object of the same class as one of the arguments which was passed if the class defines an __array__ method. Which argument to pick and how this would be implemented without changing old code is an interesting question. Assuming it were possible to cause PyArray_Return to do such a thing, would this be a bad idea?
Sincerely,
Tried-to-use-Matrix-objects-but-always-resort-to-dot(x,x)
Travis Oliphant

This led me to some thinking about why UserArrays are not more often used.
I think one of the biggest reasons is that most of the functions can take UserArrays but return the basic array type upon completion. So, you end up having to continually construct your UserArrays.
Exactly.
Are there other reasons people have thought of?
Performance, in some cases. Or passing objects to C routines that expect plain arrays. But the function issue is certainly the major one.
So, here's a suggestion:
Why don't we modify PyArray_Return to return an object of the same class as one of the arguments which was passed if the class defines an __array__ method.
I don't think it can be done at the level of PyArray_Return, because it cannot know what the classes of the arguments were. We *could* achieve the same goal by making appropriate modifications to all the C functions in Numeric.
Which argument to pick and how this would be implemented without changing old code is an interesting question.
No idea about the first issue, but compatibility can be assured (well, in most cases) by picking some attribute name (__array__) that is not used in the current UserArray class. We'd then have some other class derived from UserArray which sets that attribute.
Konrad.
--
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
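The proposal discussed in this exchange can be sketched in pure Python. Nothing here is Numeric code: `UserArrayLike`, `OneBased`, `pick_result_class` and `add` are hypothetical names, and "use the class of the first argument that defines `__array__`" is just one arbitrary answer to the open question of which argument to pick.

```python
class UserArrayLike:
    """Hypothetical stand-in for a UserArray-style wrapper class."""

    def __init__(self, data):
        self.data = list(data)

    def __array__(self):
        # Expose the underlying plain data, as the post describes.
        return self.data


class OneBased(UserArrayLike):
    """Hypothetical user subclass whose type we want functions to preserve."""


def as_plain(obj):
    # Unwrap objects that define __array__; treat everything else as a sequence.
    return obj.__array__() if hasattr(obj, "__array__") else list(obj)


def pick_result_class(args):
    # Assumption: the first argument defining __array__ decides the result type.
    for arg in args:
        if hasattr(arg, "__array__"):
            return type(arg)
    return list


def add(x, y):
    raw = [p + q for p, q in zip(as_plain(x), as_plain(y))]
    # Wrap the plain result back into the chosen class.
    return pick_result_class((x, y))(raw)


result = add(OneBased([1, 2]), [10, 20])
```

Because each function wraps its own result, this mirrors the point made above that the change would have to be made in every function rather than in one central return routine.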

versus 0-based indexing is compatibility. For numerical work, some of the languages which I use or have used are Matlab, Mathematica, Maple and Fortran. These are all 1-indexed. (C is by nature 0-indexed because it is so close to machine architecture, but with a little bit
I don't think this is a matter of numerics vs. machine orientation, it is just an historical accident. Having used 0-based and 1-based languages extensively, I find both perfectly suitable for numerical work, and in my experience 0-based indexing leads to shorter index expressions in many cases.
symbols and function names for another. However it can be damn tricky to convert 1-indexed code to 0-indexed code or vice versa without introducing any errors - believe me! (Yes it's possible to call nearly
Having done so recently, I agree :-(
The other reason for choosing 1-based indexing is to keep the code as near to the standard notation as possible. This is one of the
I wouldn't call 1-based indexing "standard" - it all depends on your background.
question: as a simple example a spatial vector (x,y,z) is conventionally labelled 1,2,3 (1-indexed), but a relativistic four-vector with time included (t,x,y,z) is conventionally labelled 0,1,2,3 (0-indexed). So ideally one would be able to choose the
So, when you implement classes that represent vectors, you should stick to whatever notation is common in the application domain. I see bare-bones arrays as we have them in NumPy as a low-level building block for high-level classes, i.e. not as a data type one should use directly in non-trivial application code. It's not just indexing, but also operations. For your four-vectors, you would want a method that calculates the norm in the Einstein metric, for example. If you don't have a special four-vector class but use arrays directly, any length calculation will look messy. That's what OO design is good for. In fact, with special four-vector classes and appropriate methods, there will hardly be any explicit indices in the application code at all!
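As a rough sketch of what such a class might look like: `FourVector` below is a hypothetical example, not a class from any library mentioned in this thread. It assumes units with c = 1 and the sign convention (+, -, -, -) for the metric; the opposite convention is equally common.

```python
class FourVector:
    """Hypothetical four-vector (t, x, y, z), metric signature (+, -, -, -)."""

    def __init__(self, t, x, y, z):
        self.t, self.x, self.y, self.z = t, x, y, z

    def dot(self, other):
        # Scalar product in the assumed metric: time part minus space part.
        return (self.t * other.t
                - self.x * other.x
                - self.y * other.y
                - self.z * other.z)

    def norm_squared(self):
        return self.dot(self)


light_like = FourVector(5, 3, 0, 4)   # 25 - 9 - 0 - 16
time_like = FourVector(2, 1, 0, 0)    # 4 - 1
```

Application code calls `norm_squared()` and never touches an index, so the 0-based versus 1-based question does not arise there.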
indexing-type case-by-case. I'm sure that computer programmers will argue vehemently that code which mixes both 0-indexed and 1-indexed arrays is hard to understand and maintain, but for experts in a
Yes, if they are supposed to be the same data type. If different classes use different indexing schemes, everything remains clear, as long as the interactions between the classes are well defined.
error-prone. In my case, I'm dealing at the moment with crystal structures with which are associated certain conventional sets of vectors and tensors - all 1-indexed by convention. I find it a
Have a look at the vector and tensor classes in Scientific Python (http://dirac.cnrs-orleans.fr/programs/scientific.html). Although they are 0-based as well (because I got so used to that!), you might find that the predefined methods cover most of your needs and that explicit indices are hardly needed. But you are also free to modify the __getitem__ and __setitem__ methods to make the classes 1-based.
I guess if I understood the reason for 0-indexed lists and tuples in Python I would be happier. In normal, everyday usage, sets,
Python has a very consistent indexing scheme, which you can best understand in the following way: imagine that the indices are pointers to positions *between* the elements:
elements          a   b   c   d
index           0   1   2   3   4
negative index -4  -3  -2  -1
Then all the slicing and indexing rules make a lot of sense, and fulfill practically useful relations. For example, it is true that x[a:b] + x[b:c] equals x[a:c] for any list/array x, which would not be the case with 1-based indices. Adding indices is also more straightforward with base 0: all those "i+j-1" you see so frequently in Fortran code become just "i+j". Sure, everyday language works differently, but then you don't write algorithms in everyday language, for a good reason.
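The two relations mentioned above can be checked directly on a small list. This is only an illustration; it assumes non-negative indices with a <= b <= c, and "+" here means list concatenation.

```python
x = ["a", "b", "c", "d"]

# Splitting a slice at any intermediate position b loses and repeats nothing.
checked = 0
for a in range(len(x) + 1):
    for b in range(a, len(x) + 1):
        for c in range(b, len(x) + 1):
            assert x[a:b] + x[b:c] == x[a:c]
            checked += 1

# Offsets simply add: element j of the tail starting at i is element i + j.
i, j = 1, 2
same_element = x[i:][j] == x[i + j]
```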
By the way, what is leave-last-one-out slicing? Is it a[:-1]
That one, as is clear from my little picture above. Konrad.

It's mostly been said already, but I can't help but add my $.02 sometimes... Eric Nodwell wrote:
Without one-offset indexing, it seems to me that Python is minimally useful for numerical computations. Many, perhaps the majority, of numerical algorithms are one-indexed. Matlab for example is one-based for this reason. In fact it seems strange to me that a "high-level" language like Python should use zero-offset lists.
I was a heavy user of MATLAB for a long time before I discovered NumPy, and I have to say that I like the 0-indexing scheme MUCH better!
In my view, the most important reason to prefer 1-based indexing versus 0-based indexing is compatibility. For numerical work, some of the languages which I use or have used are Matlab, Mathematica, Maple and Fortran. These are all 1-indexed.
Actually Fortran is indexed however you decide you want it:
      DIMENSION array(0:9)
      DIMENSION array(1:10)   or   DIMENSION array(10)
      DIMENSION array(1900:1999)
are all legal. This is a VERY handy feature, and I would say that I used the 0-indexed version most often. The reason is related to C's pointer arithmetic logic: often the array would represent discrete points on a continuous scale, so I could find the value of X, for instance, by doing:
      Xaxis(i) = MinX + i * DeltaX
With 1-indexing, you have to subtract 1 all the time. I suspect that the higher level nature of NumPy would make it a lot harder to have arbitrary indexing of this fashion: if all you have to do is access elements, it is easy, but if you have a whole collection of array oriented operations, as NumPy does, you would probably have to stick with one standard, and I think the 0-indexing standard is the best.
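Both points can be illustrated with a small Python sketch. The grid values are made up, and `BoundedList` is a hypothetical class that only supports element access with Fortran-style inclusive bounds; it does not attempt the whole-array operations that the post says make arbitrary indexing hard.

```python
min_x, delta_x, n = 2.0, 0.5, 5

# 0-based: the index is the number of steps away from min_x.
x_zero_based = [min_x + i * delta_x for i in range(n)]

# 1-based: the same grid needs (i - 1) in every expression.
x_one_based = [min_x + (i - 1) * delta_x for i in range(1, n + 1)]


class BoundedList:
    """Hypothetical list with inclusive bounds, like DIMENSION a(lower:upper)."""

    def __init__(self, lower, upper, fill=0.0):
        self.lower, self.upper = lower, upper
        self._data = [fill] * (upper - lower + 1)

    def _pos(self, index):
        # Translate a user index into a position in the 0-based storage.
        if not self.lower <= index <= self.upper:
            raise IndexError(index)
        return index - self.lower

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[self._pos(index)]

    def __setitem__(self, index, value):
        self._data[self._pos(index)] = value


by_year = BoundedList(1900, 1999)
by_year[1950] = 3.5
```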
for experts in a particular field who are accustomed to certain ingrained notations, it is the code which breaks the conventional notation which is most error-prone.
This is why being able to set your own indexing notation is the best option, but a difficult one to implement.
Python is otherwise such an elegant and natural language. Why the ugly exception of making the user conform to the underlying mechanism of an array being an address plus an offset?
I gave an example above, and others have too: Python's indexing scheme is elegant and natural for MANY usages. As with many things Python (indentation, anyone!), I found it awkward to make the transition at first, but then found that it, in fact, made things easier in general. For me, this is the very essence of truly usable design: it is designed to make people most productive in the long run, not most comfortable when they first start using it.
All this is really neither here nor there, since this debate, at least as far as Python is concerned, was probably settled 10 years ago and
Well, yes, and having NumPy different from the rest of Python would NOT be a good idea either.
I'm sure nobody wants to hear anything more about it at this point. As you point out, I can define my own array type with inheritance. I will also need my own range command and several other functions which haven't occurred to me yet. I was hoping that there would be a standard module to implement this.
If it were truly generally useful, there probably would be such a package. I imagine most people have found it easier to make the transition than to write a whole package that would allow you not to make the transition. If you really have a lot of code that is 1-indexed that you want to translate, it may be worth the effort for you, and I'm sure there are other folks that would find it useful, but remember that it will always be incompatible with the rest of Python, which may make it harder to use than you imagine.
-Chris
--
Christopher Barker, Ph.D.
ChrisHBarker@home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics
participants (4): Chris Barker, Eric Nodwell, Konrad Hinsen, Travis Oliphant