From verveer at embl-heidelberg.de Fri Apr 1 00:40:06 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Fri Apr 1 00:40:06 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424C8D05.7030006@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> Message-ID: Good idea; for many applications such an extension would be 'good enough'. 1) Python code using such arrays should be 100% compatible with numarray/Numeric/scipy. Should be possible if a sub-set of Numeric/numarray/scipy is used. 2) Extensions written in C should handle such arrays transparently (without unnecessary copying). Should also be possible given a compatible data layout. Peter > To all interested in the future of arrays... > > I'm still very committed to Numeric3 as I want to bring the numarray > and Numeric people together behind a single array object for > scientific computing. > > But, I've been thinking about the array protocol and thinking that it > would be a good thing if this became universal. One of the ways to > make it universal is by having something that follows it in the Python > core. > > > So, what if we proposed for the Python core not something like > Numeric3 (which would still exist in scipy.base and be everybody's > favorite array :-) ), but a very minimal array object (scaled back > even from Numeric) that followed the array protocol and had some C-API > associated with it. > > > This minimal array object would support 5 basic types ('bool', > 'integer', 'float', 'complex', 'Object'). (Maybe a void type could > be defined and a void "scalar" introduced (which would be the bytes > object)). These types correspond to scalars already available in > Python and so the whole 0-dim array Python scalar arguments could be > ignored. > > Math could be done without ufuncs initially (people really needing > speed would use scipy.base anyway). But, more people in the Python > community would be able to use arrays and get used to them. And we > would have a reference array_protocol object so that extension writers > could write to it. > > > I would not try a project like this until after scipy_core is out, but > it's an interesting thing to think about. I mainly wanted feedback on > the basic concept. > > > An alternative would be to "add" multidimensionality to the array > object already part of Python, fix its problem of reallocating with an > exposed buffer, and add the array protocol.
From oliphant at ee.byu.edu Fri Apr 1 01:30:38 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 01:30:38 2005 Subject: [Numpy-discussion] __array_typestr__ Message-ID: <424D14E9.70607@ee.byu.edu> For the most part, it seems the array protocol is easy to agree on. The one difficulty is typestr. For what it's worth, here are my opinions on what has been said regarding the typestr. * Endian-ness should be included in the typestr --- it is how the data is viewed and an intrinsic part of the type as much as int, or float. * I like the fact that struct character codes are documented, but they are hard to remember. The simpler division into basic types and byte-widths that the numarray record module uses is easier to remember. * I'm mixed on whether or not support for describing complex data types should be used or if their description as a record is good enough.
On the one hand we think of complex numbers as additional types, but on the other hand, in terms of machine layout they really are just two floats, so perhaps it is better to look at them that way in a protocol whose purpose is just describing how to interpret a block of memory. Especially since complex numbers could conceivably be built on top of any of the other types. In addition, it is conceivable that a rational array might be supported by some array object in the future, and that would most easily be handled by a record array where the names were now something like ("numer", "denom"). The typestr argument should just help us specify what is in the memory chunk at each array element (how it should be described). * I'm wondering about including multiple types in the typestr. On the one hand we could describe complicated structures by packing all the information into the typestr. On the other hand, it may be better if we just use 'V8' to describe an 8-byte memory buffer with an additional attribute that contains both the names and the typestr: __array_recinfo__ = (('real','f4'),('imag','f4')) or for a "rational type" __array_recinfo__ = (('numer','i4'),('denom','i4')) so that the detail of the typecode for a "record" type is handled by another special method using tuples. On this level, we could add the possibility of specifying a shape for a small array inside (just like the record array of numarray does). -Travis
From faltet at carabos.com Fri Apr 1 02:01:11 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:01:11 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <20050401041204.18335.qmail@web50208.mail.yahoo.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> Message-ID: <200504011146.44549.faltet@carabos.com> I'm very much with the opinions of Scott. Just some remarks. On Friday 01 April 2005 06:12, Scott Gilbert wrote: > > __array_names__ (optional comma-separated names for record fields) > > I really like this idea. Although I agree with David M. Cooke that it > should be a tuple of names. Unless there is a use case I'm not > considering, it would be preferable if the names were restricted to valid > Python identifiers. Ok. I was thinking of easing the life of C extension writers, but I agree that a tuple of names should be relatively easy to deal with in C as well. However, as the __array_typestr__ would be a plain string, an __array_names__ that is a plain string would be consistent with that. Also, it would be worth knowing how to express a record of different shaped fields. I mean, how to represent a record like: [array(Int32,shape=(2,3)), array(Float64,shape=(3,))] The possibilities are: __array_shapes__ = ((2,3),(3,)) __array_typestr__ = ('i','d') Another possibility would be an extension of the current struct format: __array_typestr__ = "(2,3)i(3,)d" more on that later on. > The struct module has a portable set of typecodes. They call it > "standard", but it's the same thing. The struct module lets you specify > either standard or native. For instance, the typecode for "standard long" > ("=l") is always 4 bytes while a "native long" ("@l") is likely to be 4 or > 8 bytes depending on the platform. The __array_typestr__ codes should > require the "standard" sizes. There is a table at the bottom of the > documentation that goes into detail: > > http://docs.python.org/lib/module-struct.html I fully agree with Scott here.
Struct typecodes offer a way to stay close to the Python standard, and this is a good thing for the many developers who know nothing of array packages and their different typecodes. IMO, the portable set of typecodes in the struct module should only be abandoned if they cannot fulfil all the requirements of Numeric3/numarray. But I'm pretty confident that they eventually will. > The only problem with the struct module is that it's missing a few types... > (long double, PyObject, unicode, bit). Well, bit is not used in Numeric/numarray either, and I think few people would complain about this (they can always pack bits into bytes). PyObject and unicode can be reduced to a sequence of bytes, and some other metadata can be added to the array protocol to complement their meaning (say __array_str_encoding__ = "UTF-8" or similar). long double is the only type that should be added to the struct typecodes, but convincing the Python crew to do that should not be difficult, I guess. > > I also think that rather than attach < or > to the start of the > > string it would be easier to have another protocol for endianness. > > Perhaps something like: > > > > __array_endian__ (optional Python integer with the value 1 in it). > > If it is not 1, then a byteswap must be necessary. > > A limitation of this approach is that it can't adequately represent > struct/record arrays where some fields are big endian and others are little > endian. Having a mix of data values with different endianness in the same record would be a bit ill-advised. In fact, numarray does not support this: a recarray should be all little or big endian. I think that '<' and '>' would be more than enough to represent this. > > Bool -- "b%d" % sizeof(bool) > > Signed Integer -- "i%d" % sizeof() > > Unsigned Integer -- "u%d" % sizeof() > > Float -- "f%d" % sizeof() > > Complex -- "c%d" % sizeof() > > Object -- "O%d" % sizeof(PyObject *) > > --- this would only be useful on shared memory > > String -- "S%d" % itemsize > > Unicode -- "U%d" % itemsize > > Void -- "V%d" % itemsize > > The above is a nice start at reinventing the struct module typecodes. If > you and Perry agree to it, that would be great. A few additions though: Again, I think it would be better not to move away from the struct typecodes. But if you end up doing it, well, I would like to propose a couple of additions to the new protocol: 1.- Support shapes for record specification. I'm listing two possibilities: A) __array_typestr__ = "(2,3)i(3,)d" This would be an easy extension of the struct string type definition. B) __array_typestr__ = ("i4","f8") __array_shapes__ = ((2,3),(3,)) This is more 'à la numarray'. 2.- Allow nested datatypes. Although numarray does not support this yet, I think it could be very advantageous to be able to express: [array(Int32,shape=(5,)),[array(Int16,shape=(2,)),array(Float32,shape=(3,4))]] i.e., the first field would be an array of ints with 5 elements, while the second field would actually be another record made of 2 fields: one array of short ints, and another array of single precision floats. I'm not sure how exactly to implement this, but what about: A) __array_typestr__ = "(5,)i[(2,)h(3,4)f]" B) __array_typestr__ = ("i4",("i2","f4")) __array_shapes__ = ((5,),((2,),(3,4))) Because I'm suggesting we adhere to the struct specification, I prefer option A), although I guess option B would be easier to use for developers (even for extension developers); a sketch of how a consumer might walk option B follows.
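For concreteness, a minimal sketch of how a consumer might walk the option-B representation (nested tuples in __array_typestr__ plus a parallel __array_shapes__) to compute the byte size of one record. Both attribute spellings are only the proposal above, not part of any released package, and record_itemsize is an invented name:

    def record_itemsize(typestr, shapes):
        # Recursively compute the size in bytes of one record described
        # by nested (typestr, shapes) tuples (option-B style, hypothetical).
        if isinstance(typestr, tuple):      # a nested record: recurse over fields
            return sum(record_itemsize(t, s) for t, s in zip(typestr, shapes))
        nelems = 1
        for dim in shapes:                  # a leaf field: shapes is a tuple of ints
            nelems *= dim
        return nelems * int(typestr[1:])    # e.g. 'i4' -> 4 bytes per element

    # The nested example from above: Int32 x (5,), then a sub-record
    # of Int16 x (2,) and Float32 x (3,4):
    typestr = ("i4", ("i2", "f4"))
    shapes = ((5,), ((2,), (3, 4)))
    print(record_itemsize(typestr, shapes))   # 5*4 + 2*2 + 12*4 = 72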
> > So, what if we proposed for the Python core not something like > > Numeric3 (which would still exist in scipy.base and be everybody's > > favorite array :-) ), but a very minimal array object (scaled back > > even from Numeric) that followed the array protocol and had some > > C-API associated with it. > > > > This minimal array object would support 5 basic types ('bool', > > 'integer', 'float', 'complex', 'Object'). (Maybe a void type > > could be defined and a void "scalar" introduced (which would be > > the bytes object)). These types correspond to scalars already > > available in Python and so the whole 0-dim array Python scalar > > arguments could be ignored. > > I really like this idea. It could easily be implemented in C or Python > script. Since half its purpose is for documentation, the Python script > implementation might make more sense. Yeah, I fully agree with this also. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data ""
From faltet at carabos.com Fri Apr 1 02:17:36 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:17:36 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <200504011215.52914.faltet@carabos.com> On Friday 01 April 2005 11:31, Travis Oliphant wrote: > * I'm wondering about including multiple types in the typestr. On the > one hand we could describe complicated structures by packing all the > information into the typestr. On the other hand, it may be better if > we just use 'V8' to describe an 8-byte memory buffer with an additional > attribute that contains both the names and the typestr: > > __array_recinfo__ = (('real','f4'),('imag','f4')) > > or for a "rational type" > > __array_recinfo__ = (('numer','i4'),('denom','i4')) > > so that the detail of the typecode for a "record" type is handled by > another special method using tuples. On this level, we could add the > possibility of specifying a shape for a small array inside (just like > the record array of numarray does). Like: __array_recinfo__ = (('numer','i4', (3,4)),('denom','i4', (2,))) ? Also, this can be easily extended to nested types: __array_recinfo__ = (('a','i4',(3,4)),(('b','i4',(2,)),('c','f4',(10,2)))) Well, this looks pretty good to me. It has nothing to do with struct format, but is much more usable, of course. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data ""
From cjw at sympatico.ca Fri Apr 1 04:57:57 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 04:57:57 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com> Message-ID: <424D4504.4030606@sympatico.ca> David M. Cooke wrote: >Francesc Altet writes: > > > >>On Tuesday 29 March 2005 01:59, Travis Oliphant wrote: >> >> >>>My proposal: >>> >>>__array_data__ (optional object that exposes the PyBuffer protocol or a >>>sequence object, if not present, the object itself is used).
>>>__array_shape__ (required tuple of int/longs that gives the shape of the >>>array) >>>__array_strides__ (optional provides how to step through the memory in >>>bytes (or bits if a bit-array), default is C-contiguous) >>>__array_typestr__ (optional struct-like string showing the type --- >>>optional endianness indicator + Numeric3 typechars, default is 'V') >>>__array_itemsize__ (required if above is 'S', 'U', or 'V') >>>__array_offset__ (optional offset to start of buffer, defaults to 0) >>> >>> >>Considering that heterogeneous data is to be supported as well, and >>there is some tradition of assigning names to the different fields, I >>wonder if it would not be good to add something like: >> >>__array_names__ (optional comma-separated names for record fields) >> >> > >A sequence (list or tuple) of strings would be preferable. That >removes all worrying about using commas in the names. > > > As I understand it, record arrays can be heterogeneous. If so, wouldn't it make sense for this to be a sequence of tuples? For example: [('Name', charStringType), ('Age', _nt.Int8), ...] Where _nt is defined by something like: import numarray.numerictypes as _nt Colin W.
From cjw at sympatico.ca Fri Apr 1 05:49:53 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 05:49:53 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <424D5136.8060703@sympatico.ca> Travis Oliphant wrote: > > For the most part, it seems the array protocol is easy to agree on. > The one difficulty is typestr. > > For what it's worth, here are my opinions on what has been said > regarding the typestr. > > * Endian-ness should be included in the typestr --- it is how the data > is viewed and an intrinsic part of the type as much as int, or float. In most cases, endian-ness is associated with the machine being used, rather than the data element. It seems to me that numarray's numeric types provide a good model, which may need enhancing for records, strings etc. numarray has: Numeric type objects: Bool Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64 Float32 Float64 Complex32 Complex64 Numeric type classes: NumericType BooleanType SignedType UnsignedType IntegralType SignedIntegralType UnsignedIntegralType FloatingType ComplexType > > * I like the fact that struct character codes are documented, but they > are hard to remember. This is the problem. numerictypes provides mnemonic names and, if one uses an editor with autocompletion, a prompt from the editor. For those interfacing to existing code, there could be a helper function: def toType(eltType='i'): => an instance of NumericType It should also be possible to derive the typeCode from the eltType; numarray doesn't seem to provide this. Colin W.
From cjw at sympatico.ca Fri Apr 1 06:07:38 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 06:07:38 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424C8D05.7030006@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> Message-ID: <424D5557.5010806@sympatico.ca> Travis Oliphant wrote: > > To all interested in the future of arrays... > > I'm still very committed to Numeric3 as I want to bring the numarray > and Numeric people together behind a single array object for > scientific computing. > Good. > But, I've been thinking about the array protocol and thinking that it > would be a good thing if this became universal.
One of the ways to > make it universal is by having something that follows it in the Python > core. > > > So, what if we proposed for the Python core not something like > Numeric3 (which would still exist in scipy.base and be everybody's > favorite array :-) ), but a very minimal array object (scaled back > even from Numeric) that followed the array protocol and had some C-API > associated with it. > I thought that your original Numeric3 proposal was in this direction - a simple multidimensional array class/type which could eventually replace Python's array module. In addition, and separately, there were to be a collection of ufuncs. Later, discussion seemed to drift from the basic Numeric3 towards SciPy. > > This minimal array object would support 5 basic types ('bool', > 'integer', 'float', 'complex', 'Object'). (Maybe a void type could > be defined and a void "scalar" introduced (which would be the bytes > object)). These types correspond to scalars already available in > Python and so the whole 0-dim array Python scalar arguments could be > ignored. Could this be subclassed so that provision could be made for Int8 (or even Int1)? How would an array of records be handled? > > Math could be done without ufuncs initially (people really needing > speed would use scipy.base anyway). But, more people in the Python > community would be able to use arrays and get used to them. And we > would have a reference array_protocol object so that extension writers > could write to it. It would be good if the user could write his/her own ufuncs in Python. > > > I would not try a project like this until after scipy_core is out, but > it's an interesting thing to think about. I mainly wanted feedback on > the basic concept. > The concept looks good. Regarding timing, it seems better to build the foundation before building the house. Colin W. > > An alternative would be to "add" multidimensionality to the array > object already part of Python, fix its problem of reallocating with an > exposed buffer, and add the array protocol. > > > > -Travis
From oliphant at ee.byu.edu Fri Apr 1 12:10:00 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:10:00 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef050401104875650ddd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> Message-ID: <424DAA16.10007@ee.byu.edu> >>I'm still very committed to Numeric3 as I want to bring the numarray and >>Numeric people together behind a single array object for scientific >>computing. >> >> Notice that regardless of what I said about what goes into standard Python, something like Numeric3 will always exist for use by scientific users. It may just be a useful add-on package like Numeric has always been. There is no way I'm going to abandon use of a more capable Numeric. >Right. I believe that, among all libraries related to numeric arrays, >eventually only one library in the Python core will survive no matter >how many advanced functions are available, because of the strong >compatibility with other packages. > > I don't think this is true. Things will survive based on utility. What we are trying to do with the Python core is define a standard protocol that is flexible enough to handle anybody's concept of an advanced array (in particular the advanced array that will be in scipy.base). >Totally agree. I doubt that Guido will accept a large and complex >library into the standard Python core.
>I think Numeric is already too >complex, and numarray is far more complex to be a standard lib in the >Python core. Numeric3 must shift its focus from better Numeric to >scaled-down Numeric. > > I disagree about "shifting focus." Personally, I'm not going to work on something like that until we have a single array package that fulfills the needs of all Numeric and most numarray users. I'm just pointing out that what goes into the Python core should probably be a scaled-down object with a souped-up "protocol" so that the array object in scipy.base can be used through the array protocol by any other package without worrying about having scipy_core at compile time. >For example, how many Python users care about masked arrays? How many >Python users want the advanced type from the Python core? I think the >advanced array type should be in some extension lib, not in the core array >lib. > Perhaps you do see my point of view. Not all Python users care about an advanced array object, but nearly all technical (scientific and engineering) users will. We just need interoperability. >If we make clear our target: becoming a standard library in the >Python core, we may have no problem in determining what functions >should be in the core array lib and what functions should be in >extension libraries using the core array type. > > >Today, the array type in the Python core is almost useless. >If Numeric3 offers just much faster performance on numeric types, many >Python users will start to use the new array type in their applications. >Once that happens, we can create a bunch of extension libraries for more >advanced operations on the new array type. > > The "bunch of extension libraries" is already happening and in progress. I think we've overshot the mark for the Python core, however. No need to wait 'til something happens. >With all my heart I hope that Numeric3 moves in this direction before > > >we get the tragedy of having Numeric4, Numeric5, and so on. > > I'm coming to see that what is most important for the Python core is "protocols". Then, there can be a "million" different array types that can all share each other's memory without hassle or much overhead. I'm still personally interested in a better Numeric, however, and so won't be abandoning the concept of Numeric3 (notice I now call it scipy.base --- not a change of focus, just a change of name). I just wanted to encourage some discussion on the array protocol. -Travis
From oliphant at ee.byu.edu Fri Apr 1 12:23:19 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:23:19 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424D5557.5010806@sympatico.ca> References: <424C8D05.7030006@ee.byu.edu> <424D5557.5010806@sympatico.ca> Message-ID: <424DAD00.1050203@ee.byu.edu> > I thought that your original Numeric3 proposal was in this direction - > a simple multidimensional array class/type which could > eventually replace Python's array module. In addition, and > separately, there were to be a collection of ufuncs. No, that's a misunderstanding. Original Numeric3 was never about "simplifying." Because we can't "simplify" and still support the uses that Numeric and numarray have enjoyed. I'm more interested in using something like Numeric and will always install it should it exist. I was interested in getting it into the Python core for standardization. I now believe that "universal" standardization should occur around a "protocol" and perhaps a simple implementation.
I'm still interested in a more "local standardization" for numarray and Numeric users (not all Python users), which is the focus of scipy.base (what I used to call Numeric3). In the process we are generating good ideas that can be used for "global standardization" among all Python users. But, I can't do it all. I have to keep focused on what I'm doing with the current Numeric arrayobject (and that has never been about "getting rid of functionality"). > > Later, discussion seemed to drift from the basic Numeric3 towards SciPy. The context of the problem as I see it intimately involves scipy and the collection of packages surrounding numarray. The small community we have built up was diverging in the creation of external packages. This is what troubled me most deeply. So, there is no Numeric3 separate from the larger issue of "a collection of standard scientific packages" that scipy has tried to be. That is why reference to scipy is made. I see no "drifting" occurring. There is a separate issue of a good array module for Python. I now see the solution there as being more of a "good array protocol" for Python with a default very simple implementation that is improved by extension modules. > > Could this be subclassed so that provision could be made for Int8 (or > even Int1)? I suppose, but this is kind of missing the point, because Numeric3 will support those types. If you need a more advanced array, you install scipy.base. > > How would an array of records be handled? By installing a more advanced array. > The concept looks good. Regarding timing, it seems better to build > the foundation before building the house. The problem with your analogy is that the "sprawling mansion in the suburbs" is already built (Numeric has been around for a long time). The question is what kind of housing to build for the city dwellers and what kind of transportation system we establish so people can move back and forth easily. -Travis
From sdhyok at gmail.com Fri Apr 1 12:59:07 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Fri Apr 1 12:59:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <371840ef05040112574b6a86bd@mail.gmail.com> On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: snip > I disagree about "shifting focus." Personally, I'm not going to work on > something like that until we have a single array package that fulfills > the needs of all Numeric and most numarray users. I'm just pointing > out that what goes into the Python core should probably be a scaled > down object with a souped-up "protocol" so that the array object in > scipy.base can be used through the array protocol by any other package > without worrying about having scipy_core at compile time. Would you tell me what exactly you mean by "protocol"? Do you mean a standard definition of a series of "interfaces" for an array type in Python?
-- Daehyok Shin Geography Department University of North Carolina-Chapel Hill USA
From oliphant at ee.byu.edu Fri Apr 1 15:14:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 15:14:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef05040112574b6a86bd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> <371840ef05040112574b6a86bd@mail.gmail.com> Message-ID: <424DD56E.6070801@ee.byu.edu> Daehyok Shin wrote: >On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: > >snip > > > >>I disagree about "shifting focus." Personally, I'm not going to work on >>something like that until we have a single array package that fulfills >>the needs of all Numeric and most numarray users. I'm just pointing >>out that what goes into the Python core should probably be a scaled >>down object with a souped-up "protocol" so that the array object in >>scipy.base can be used through the array protocol by any other package >>without worrying about having scipy_core at compile time. >> >> > >Would you tell me what exactly you mean by "protocol"? >Do you mean a standard definition of a series of "interfaces" for an array >type in Python? > > Yes, pretty much. I would even go so far as to say a set of hooks in the typeobject (like the sequence, mapping, and buffer protocols). -Travis
From steve at shrogers.com Sat Apr 2 06:50:58 2005 From: steve at shrogers.com (Steven H. Rogers) Date: Sat Apr 2 06:50:58 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <424EB08F.90909@shrogers.com> First, thanks for doing this, Travis. Travis Oliphant wrote: > > I'm coming to see that what is most important for the Python core is > "protocols". Then, there can be a "million" different array types that > can all share each other's memory without hassle or much overhead. > I'm still personally interested in a better Numeric, however, and so > won't be abandoning the concept of Numeric3 (notice I now call it > scipy.base --- not a change of focus, just a change of name). I just > wanted to encourage some discussion on the array protocol. > Your array protocol idea sounds good. It should not only make it easier to interoperate with other Python packages, but also with foreign entities like APL/J, Matlab, and LabVIEW. Regards, Steve -- Steven H. Rogers, Ph.D., steve at shrogers.com Weblog: http://shrogers.com/weblog "Reach low orbit and you're half way to anywhere in the Solar System." -- Robert A. Heinlein
From oliphant at ee.byu.edu Sat Apr 2 21:30:03 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 2 21:30:03 2005 Subject: [Numpy-discussion] scipy.base (Numeric3) now has math Message-ID: <424F7F06.4090200@ee.byu.edu> I've updated scipy.base (Numeric3) so math is now supported (it uses the old ufunc apparatus with newly added type support). There is still some work to be done, so this is still very alpha (but at least math operations work): - update the ufunc apparatus to use buffers to avoid copying an entire array just for type casting (and to support unaligned and non-byteswapped arrays) - update the way error handling is done. - update the coercion strategy like numarray does - fix all the bugs.
I've also fixed things so Numeric extension modules should compile --- please report warnings and bugs with this as well. Thanks for all your help, -Travis
From oliphant at ee.byu.edu Sun Apr 3 01:06:16 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:06:16 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <200504011215.52914.faltet@carabos.com> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> Message-ID: <424FB19B.4060800@ee.byu.edu> Hello all, I've updated the numeric web site and given special prominence to the array interface, which I believe should be pushed. Numeric 24.0 will support it, as will scipy.base (Numeric3). I hope that numarray will also support it in an upcoming release. Please read through the interface and feel free to comment. However, unless there is a glaring problem, I'm more interested that you feel free to start using the interface than that we debate it further. Scott has expressed interest in implementing a very basic Python-only implementation of an object exporting the interface. I suggest he and anyone else interested look at numarray for a starting point for a Python implementation, and Numeric for a C implementation. -Travis
From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 01:24:07 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 01:24:07 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB19B.4060800@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> Message-ID: <424FB72F.4020201@ims.u-tokyo.ac.jp> There are two questions that I have about the array interface: 1) To what degree will the new array interface look different to users of the existing Numerical Python? If I were to install the new array interface on the computer of a current Numerical Python user and I didn't tell them, would they notice a difference? 2) To what degree is the new array interface compatible with Numerical Python for the purpose of C extension modules? Do C extension modules need to be modified in order to use the new array interface? --Michiel. Travis Oliphant wrote: > > Hello all, > > I've updated the numeric web site and given special prominence to the > array interface, which I believe should be pushed. Numeric 24.0 will > support it, as will scipy.base (Numeric3). I hope that numarray will > also support it in an upcoming release. > > Please read through the interface and feel free to comment. However, > unless there is a glaring problem, I'm more interested that you feel > free to start using the interface than that we debate it further. > > Scott has expressed interest in implementing a very basic Python-only > implementation of an object exporting the interface. I suggest he and > anyone else interested look at numarray for a starting point for a > Python implementation, and Numeric for a C implementation. > > -Travis
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From oliphant at ee.byu.edu Sun Apr 3 01:41:09 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:41:09 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB72F.4020201@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> Message-ID: <424FB9FA.1090109@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > There are two questions that I have about the array interface: > > 1) To what degree will the new array interface look different to users > of the existing Numerical Python? If I were to install the new array > interface on the computer of a current Numerical Python user and I > didn't tell them, would they notice a difference? Nothing will look different. For now there is nothing to "install", so the array interface is just something to expect from other objects. The only thing that would be different is in Numeric 24.0 (if a user were to call array() on an object that supported the array interface, then Numeric could return an array without copying data). Older versions of Numeric won't benefit from the interface but won't be harmed either. > 2) To what degree is the new array interface compatible with Numerical > Python for the purpose of C extension modules? Do C extension modules > need to be modified in order to use the new array interface? It is completely compatible. C-extensions don't need to be modified at all to make use of the interface (of course they should be re-compiled if using Numeric 24.0). Only two things will be modified in Numeric 24.0. 1) PyArray_FromObject and friends will be expanded so that if an object exposes the array interface the right thing will be done to use its memory. 2) Attributes will be added so that Numeric arrays expose the array interface so other objects can use their memory intelligently. -Travis
From cjw at sympatico.ca Sun Apr 3 05:23:12 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Apr 3 05:23:12 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem Message-ID: <424FE002.6010800@sympatico.ca> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install running install running build running config error: The .NET Framework SDK needs to be installed before building extensions for Python. Is there any chance that a Windows binary could be made available for testing? Colin W.
From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:35:05 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:35:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE002.6010800@sympatico.ca> References: <424FE002.6010800@sympatico.ca> Message-ID: <424FE3D8.7040200@ims.u-tokyo.ac.jp> You can use Cygwin's MinGW compiler by adding --compiler=mingw after the setup command. --Michiel.
Colin J. Williams wrote: > C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install > running install > running build > running config > error: The .NET Framework SDK needs to be installed before building > extensions for Python. > > Is there any chance that a Windows binary could be made available for > testing? > > Colin W. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:46:04 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:46:04 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE3D8.7040200@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE3D8.7040200@ims.u-tokyo.ac.jp> Message-ID: <424FE64F.7030706@ims.u-tokyo.ac.jp> Sorry, that should be --compiler=mingw32. Michiel Jan Laurens de Hoon wrote: > You can use Cygwin's MinGW compiler by adding --compiler=mingw after the > setup command. > > --Michiel. > > Colin J. Williams wrote: > >> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install >> running install >> running build >> running config >> error: The .NET Framework SDK needs to be installed before building >> extensions for Python. >> >> Is there any chance that a Windows binary could be made available for >> testing? >> >> Colin W. >> > -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From gruben at bigpond.net.au Sun Apr 3 06:32:09 2005 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun Apr 3 06:32:09 2005 Subject: [Numpy-discussion] array slicing question Message-ID: <424FF03A.4060107@bigpond.net.au> This may be relevant to Numeric 3, but is possibly just a general question about array slicing which will either reveal a deficiency in specifying slices or in my knowledge of slicing with numpy. A while ago I was trying to reimplement some Matlab image processing code in Numeric, and this revealed a deficiency in the way slices are defined. Suppose I have an n x m array and want to slice off the first and last p rows and columns, where p can range from 0 to some number. Matlab provides a clean way of doing this, but in numpy it's a bit of a mess.
You might think you could do >>> p=1 >>> b = a[p:-p] but if p=0, this fails. My final solution involved getting the array shape and explicitly calculating start and stop columns, but is there a better way? Gary R.
From oliphant at ee.byu.edu Sun Apr 3 08:36:35 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 08:36:35 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> Message-ID: <42500D03.3030809@ee.byu.edu> I don't know if you have followed the array interface discussion. It is defined at http://numeric.scipy.org I have implemented consumer and exporter interfaces for Numeric and an exporter interface for numarray. The consumer interface needs a little help but shouldn't take too long for someone who understands numarray better. Now Numeric arrays can share data with numarray (no data copy). scipy.base arrays will also implement the array interface. I think the array interface is a good direction to go. -Travis
From konrad.hinsen at laposte.net Sun Apr 3 13:03:19 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Sun Apr 3 13:03:19 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <424FF03A.4060107@bigpond.net.au> References: <424FF03A.4060107@bigpond.net.au> Message-ID: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> On 03.04.2005, at 15:31, Gary Ruben wrote: > You might think you could do > >>> p=1 > >>> b = a[p:-p] > > but if p=0, this fails. b = a[p:len(a)-p] works even for p=0. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ -------
From oliphant at ee.byu.edu Sun Apr 3 21:21:15 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:21:15 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050403165914.GC10730@idi.ntnu.no> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> Message-ID: <4250C0A4.9070707@ee.byu.edu> Magnus Lie Hetland wrote: >Travis Oliphant : > > >>I don't know if you have followed the array interface discussion. It >>is defined at http://numeric.scipy.org >> >> > >This is very, very good! The numeric future of Python is looking very >bright, IMO :) > >Some tiny points: > > - Shouldn't the regexp for __array_typestr__ be > '[<>]?[tbiufcOSUV][0-9]+'? > > Probably, since I guess you can only have one of < or >. Thanks. > - What are the semantics when __array_typestr__ isn't V[0-9]+ and > __array_descr__ is set? Is __array_typestr__ ignored? Or... What > would it be used for? > > I would say that the __array_descr__ always gives more information, but not every array implementation will support looking at it. For example, current Numeric (24.0 in CVS) ignores __array_descr__ and just looks at the typestr (and doesn't support 'V'). So, I suspect that another array package that knows this may choose something else besides 'V' if it really wants Numeric to still understand it.
Suppose you have a complex short int array with __array_descr__ = 'V8 > - Does the description of __array_data__ mean that the discussed > bytes type is no longer needed? (If we can use buffers, that > sounds very good to me.) > > Bytes is still needed because the buffer object is not very good and we need a good buffer object in Python for lots of other reasons. It would be very useful, for example, to be able to allocate memory using the Python bytes object. But, it does mean less pressure to get it to work. > - Why the parentheses around "buffer protocol-satisfying object" in > the description of __array_mask__? And why must it be 'b1'? What > if I happen to have mask data from a non-array-protocol source, > which happens to be, say, b8 (not unreasonable, I think)? Wouldn't > it be good to allow any size of these, and just use zero/non-zero > as the criterion? Some of the point of this protocol is to avoid > copying and using the original data, after all...? (Same goes for > the requirement that it be C-contiguous. I guess I'm basically > saying that perhaps __array_mask__ should be an array itself. Or, > at least, that it could be *allowed* to be...) > > I added the mask late last night. It is probably the least thought-out portion. Everything else has been through the wringer a couple more times. My whole thinking is that I just didn't want to explode the protocol with another special name for the mask type. But, saying that the mask object itself can support the array interface doesn't do that, so I think that is a good call. Last night, using the numarray exporter interface and the Numeric consumer interface, I was able to share data between a Numeric array and a numarray array with no copying of the data buffers. It was very nice. -Travis
From oliphant at ee.byu.edu Sun Apr 3 21:29:12 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:29:12 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <4250C276.5090300@ee.byu.edu> >> > Probably, since I guess you can only have one of < or >. Thanks. > >> - What are the semantics when __array_typestr__ isn't V[0-9]+ and >> __array_descr__ is set? Is __array_typestr__ ignored? Or... What >> would it be used for? >> >> > I would say that the __array_descr__ always gives more information, but > not every array implementation will support looking at it. For > example, current Numeric (24.0 in CVS) ignores __array_descr__ and > just looks at the typestr (and doesn't support 'V'). So, I suspect > that another array package that knows this may choose something else > besides 'V' if it really wants Numeric to still understand it. > Suppose you have a complex short int array with > > __array_descr__ = 'V8 Let me finish this example: Suppose you have a complex short int array with __array_descr__ = [('real','i2'),('imag','i2')] you could describe this as __array_typestr__ = 'V4' or think of it as a 4-byte integer if you want to make sure that another array package that may not support void pointers can still manipulate the data, and so the creator of the complex short int array may decide that __array_typestr__ = 'i4' is the right thing to do for packages that ignore the __array_descr__ attribute.
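To make the finished example concrete, here is a minimal sketch of an object exporting such a complex short int array through the interface. The class, the sample data, and the buffer layout are all invented for illustration (only the __array_* names come from the interface), and a real exporter would probably also prefix the typestr with '<' or '>' per the endianness discussion above:

    import struct

    class ComplexShortArray(object):
        # Toy exporter: three "complex" values (1+2j), (3+4j), (5+6j)
        # stored as interleaved int16 real/imag parts in one buffer.
        def __init__(self):
            data = struct.pack('6h', 1, 2, 3, 4, 5, 6)   # native-endian here
            self.__array_shape__ = (3,)
            self.__array_descr__ = [('real', 'i2'), ('imag', 'i2')]
            # Fallback for consumers that ignore __array_descr__:
            # each element is just 4 bytes, advertised as an integer.
            self.__array_typestr__ = 'i4'
            self.__array_data__ = data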
-Travis
From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 01:17:15 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 01:17:15 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB9FA.1090109@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> Message-ID: <4250F8E5.9020701@ims.u-tokyo.ac.jp> Travis Oliphant wrote: >> 1) To what degree will the new array interface look different to users >> of the existing Numerical Python? > > Nothing will look different. For now there is nothing to "install", so > the array interface is just something to expect from other objects. > The only thing that would be different is in Numeric 24.0 (if a user > were to call array() on an object that supported the array > interface, then Numeric could return an array without copying data). > Older versions of Numeric won't benefit from the interface but won't be > harmed either. Very nice. Thanks, Travis. I'm not sure what you mean by "the array interface could become part of the Python standard as early as Python 2.5", since there is nothing to install. Or does this mean that Python's array will conform to the array interface? Some comments on the array interface: 1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method (and rename the other methods for consistency, if desired). 2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method. 3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't? If so, I have a slight preference to make all methods required, since it's not a big effort to return the defaults, and there will be more extension modules than array packages (or so I hope). Whereas the array interface certainly helps extension writers to create an extension module that works with all array implementations, it also enables and perhaps encourages the creation of different array modules, while our original goal was to create a single array module that satisfies the needs of both Numerical Python and numarray users. I still think such a solution would be preferable. Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another, even though they both conform to the array interface. We may end up with several array packages (we already have Numerical Python, numarray, and scipy), and extension modules that work with one package and not with another. So in a sense, the array interface is letting the genie out of the bottle. But maybe such a single array package is not attainable given the different needs of the different communities. --Michiel.
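The defaults question in point 3 can be answered with a few lines of consumer-side code. This is only a sketch of the idea (the function name and dict layout are made up), using the defaults from Travis's proposal quoted earlier in the thread, where __array_data__ defaults to the object itself, strides default to C-contiguous, the typestr defaults to 'V', and the offset defaults to 0:

    def array_interface_info(obj):
        # Gather the array interface attributes from obj, substituting
        # the documented defaults for any optional ones it omits.
        return {
            'shape':   obj.__array_shape__,                      # required
            'typestr': getattr(obj, '__array_typestr__', 'V'),   # default 'V'
            'data':    getattr(obj, '__array_data__', obj),      # default: obj itself
            'strides': getattr(obj, '__array_strides__', None),  # None: C-contiguous
            'offset':  getattr(obj, '__array_offset__', 0),
            'descr':   getattr(obj, '__array_descr__', None),
        }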
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From magnus at hetland.org Mon Apr 4 02:05:28 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:05:28 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <20050404090356.GB21527@idi.ntnu.no> Travis Oliphant : > [snip] > Last night, using the numarray exporter interface and the Numeric > consumer interface I was able to share data between a Numeric array and > numarray array with no copying of the data buffers. It was very nice. Wow -- a historic moment :) Now, if we can only get the stdlib's array module to support this protocol (and sprout some more dimensions), as you mentioned... That would really be cool. -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
From magnus at hetland.org Mon Apr 4 02:15:10 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:15:10 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C276.5090300@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> <4250C276.5090300@ee.byu.edu> Message-ID: <20050404091311.GC21527@idi.ntnu.no> Travis Oliphant : > [snip] > > Let me finish this example: > > Suppose you have a complex short int array with > > __array_descr__ = [('real','i2'),('imag','i2')] > > you could describe this as > > __array_typestr__ = 'V4' Sure -- I can see how using 'V' makes sense... You're just telling the host program how many bytes you've got, and that's it. That makes sense to me. What I wondered about was what happened when you use a more specific (and conflicting) type for the typestr... > or think of it as a 4 byte integer if you want to make sure that another > array package that may not support void pointers can still manipulate > the data, and so the creator of the complex short int array may decide that > > __array_typestr__ = 'i4' This is basically what I'm wondering about. It would make sense (to me) to say that the data type was 'V4', because that's simply less specific, in a way. But saying 'i4' is just as specific as the complex example, above -- but it means something else! You're basically giving the program permission to interpret a four-byte complex number as a four-byte integer, aren't you? Sounds almost like a recipe for disaster to me :} On the other hand -- there is no complex integer type in the interface, and using 'c4' probably would be completely wrong as well. I would almost be tempted to say that if __array_descr__ is in use, __array_typestr__ *has* to use the 'V' type. (Or, one could make some more complicated rules, perhaps, in order to allow other types.) As for not supporting the 'V' type -- would that really be considered a conforming implementation?
According to the spec, "Objects wishing to support an N-dimensional array in application code should look for these attributes and use the information provided appropriately". The typestr is required, so... Perhaps the spec should be explicit about the shoulds/musts/mays of the specific typecodes? What must be supported, what may be supported etc.? Or perhaps that doesn't make sense? It just seems almost too bad that one package would have to know what another package supports in order to formulate its own typestr... It sort of throws part of the interoperability out the window. > is the right thing to do for packages that ignore the __array_descr__ > attribute. > > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
From magnus at hetland.org Mon Apr 4 02:25:17 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:25:17 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> Message-ID: <20050404092421.GD21527@idi.ntnu.no> Michiel Jan Laurens de Hoon : > [snip] > 1) The "__array_shape__" method is identical to the existing "shape" method > in Numerical Python and numarray (except that "shape" does a little bit > better checking, but it can be added easily to "__array_shape__"). To avoid > code duplication, it might be better to keep that method. (and rename the > other methods for consistency, if desired). Why not just use 'shape' as an alias for '__array_shape__' (or vice versa)? > 2) The __array_datalen__ is introduced to get around the 32-bit int > limitation of len(). Another option is to fix len() in Python > itself, so that it can return integers larger than 32 bits. So we > can avoid adding a new method. That would be good, IMO. But how realistic is it? (I have no idea -- this is not a rhetorical question :) > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements e.g. > __array_strides__, and substitute the default values if it doesn't? If the support of these attributes is optional, that would have to be the case. > If so, I have a slight preference to make all methods required, > since it's not a big effort to return the defaults, and there will > be more extension modules than array packages (or so I hope). But isn't the point that you should be able to export other things (such as images or sounds or what-have-you) *as* arrays? As for implementing the defaults: How about having some utility functions (or a wrapper object or whatever) that does just this -- so neither array nor client code need think about it? This could, perhaps, be put in the stdlib array module or something... > Whereas the array interface certainly helps extension writers to > create an extension module that works with all array > implementations, it also enables and perhaps encourages the creation > of different array modules, while our original goal was to create a > single array module that satisfies the needs of both Numerical > Python and numarray users. I still think such a solution would be > preferable. I agree.
But what I think would be cool is if such a standardized package could
take any object conforming to this protocol and use it (possibly as the
argument to the array() constructor) -- with all the ufuncs and
operators it has. Because then I could implement specialized arrays
where the specialized behaviour lies just in the data itself, not the
behaviour. For example, I might want to create a thin array wrapper
around a memory-mapped, compressed video file, and treat it as a
three-dimensional array of rgb triples... (And so forth.)

> Inconsistencies other than the array interface (e.g. one implements
> argmax(x) while another implements x.argmax()) may mean that an
> extension module can work with one array implementation but not with
> another,

This does *not* sound like a good thing -- I agree. Certainly not what
I would hope this protocol is used for.

> even though they both conform to the array interface. We may end up
> with several array packages (we already have Numerical Python,
> numarray, and scipy), and extension modules that work with one
> package and not with another. So in a sense, the array interface is
> letting the genie out of the bottle.

Well, perhaps -- but the current APIs of e.g., Numeric or numarray
could be used in the same way (i.e., writing your own array
implementations with the same interface). As (I think) Travis has said,
there is still a goal (somewhat separate from the protocol) of getting
one standard heavy-duty numerical array package. I think that would be
very beneficial. The point (as I see it) is just to make it easier for
various array implementations (i.e., the data, not the ufuncs/operators
etc.) to interoperate with it.

> But maybe such a single array package is not attainable given the
> different needs of the different communities.

I would certainly hope it is.

> --Michiel.

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From gruben at bigpond.net.au Mon Apr 4 05:14:09 2005
From: gruben at bigpond.net.au (Gary Ruben)
Date: Mon Apr 4 05:14:09 2005
Subject: [Numpy-discussion] array slicing question
In-Reply-To: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net>
References: <424FF03A.4060107@bigpond.net.au> <9d9c98344e25f20ac8509e76f3917ec6@laposte.net>
Message-ID: <42512F57.2050007@bigpond.net.au>

Thanks Konrad,

Sorry, my example was too simple. The actual example, representing an
image, should have been 2-D and not necessarily square. Therefore I
used shape instead of len, and it seemed messy doing it this way.

Gary

konrad.hinsen at laposte.net wrote:
> On 03.04.2005, at 15:31, Gary Ruben wrote:
>
>> You might think you could do
>> >>> p=1
>> >>> b = a[p:-p]
>>
>> but if p=0, this fails.
>
> b = a[p:len(a)-p] works even for p=0.
>
> Konrad.
> --
> -------------------------------------------------------------------------------
> Konrad Hinsen
> Laboratoire Leon Brillouin, CEA Saclay,
> 91191 Gif-sur-Yvette Cedex, France
> Tel.: +33-1 69 08 79 25
> Fax: +33-1 69 08 82 61
> E-Mail: khinsen at cea.fr
> -------------------------------------------------------------------------------

From oliphant at ee.byu.edu Mon Apr 4 12:16:09 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Apr 4 12:16:09 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp>
Message-ID: <4251920B.6060708@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:
> Travis Oliphant wrote:
>>> 1) To what degree will the new array interface look different to
>>> users of the existing Numerical Python?
>>
>> Nothing will look different. For now there is nothing to "install",
>> so the array interface is just something to expect from other
>> objects. The only thing that would be different is in Numeric 24.0
>> (if a user were to call array() on an object that supported the array
>> interface, then Numeric could return an array without copying data).
>> Older versions of Numeric won't benefit from the interface but
>> won't be harmed either.
>
> Very nice. Thanks, Travis.
> I'm not sure what you mean by "the array interface could become part
> of the Python standard as early as Python 2.5", since there is nothing
> to install. Or does this mean that Python's array will conform to the
> array interface?

The latter is what I mean... I think it is important to have something
in Python itself that "conforms to the interface." I wonder if it would
also be nice to have some protocol slots in the object type so that
extension writers can avoid converting some objects. There is also the
possibility that a very simple N-d array type could be included in
Python 2.5 that conforms to the interface, if somebody wants to
champion that.

I think it is important to realize what the array interface is trying
to accomplish. From my perspective, I still think it is better for the
scientific community to build off of a single array object that is
"best of breed." The purpose of the array interface is to allow us
scientific users to share information with other Python extension
writers who may be wary of requiring scipy.base for their users but who
really should be able to interoperate with scipy.base arrays. I'm
thinking of extensions like wxPython, PIL, and so forth.

There are also lots of uses for arrays that don't necessarily need the
complexity of the scipy.base array (or uses that need even more types).
At some point we may be able to accommodate dynamic type additions to
the scipy.base array. But, right now it requires enough work that
others may want to design their own simple arrays. It's very useful if
all such arrays could speak together with a common basic language.

The fact that numarray and Numeric arrays can talk to each other more
seamlessly was not the main goal of the array interface but it is a
nice side benefit. I'd still like to see the scientific community use a
single array. But, others may not see it that way. The array interface
lets us share more easily.
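To make that concrete, the whole "consumer side" of the protocol can be
as small as this (an untested sketch; the function name is invented,
and it ignores the optional attributes such as __array_strides__, so it
assumes contiguous data):

    def from_any_array(obj):
        # sketch of a consumer: accept anything exporting the protocol
        for name in ('__array_shape__', '__array_typestr__', '__array_data__'):
            if not hasattr(obj, name):
                raise TypeError("object does not export the array interface")
        # hand back (shape, typestr, buffer) for lower-level code to consume
        return obj.__array_shape__, obj.__array_typestr__, obj.__array_data__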
> Some comments on the array interface:
>
> 1) The "__array_shape__" method is identical to the existing "shape"
> method in Numerical Python and numarray (except that "shape" does a
> little bit better checking, but it can be added easily to
> "__array_shape__"). To avoid code duplication, it might be better to
> keep that method. (and rename the other methods for consistency, if
> desired).

There is no code duplication. In these cases it is just another name
for .shape. What "better checking" are you referring to?

> 2) The __array_datalen__ is introduced to get around the 32-bit int
> limitation of len(). Another option is to fix len() in Python itself,
> so that it can return integers larger than 32 bits. So we can avoid
> adding a new method.

Python len() will never return a 64-bit number on a 32-bit platform.

> 3) Where do default values come from? Is it the responsibility of the
> extension module writer to find out if the array module implements
> e.g. __array_strides__, and substitute the default values if it
> doesn't? If so, I have a slight preference to make all methods
> required, since it's not a big effort to return the defaults, and
> there will be more extension modules than array packages (or so I hope).

Optional attributes let modules that care talk to each other on a
"higher level" without creating noise for simpler extensions. Both the
consumer and exporter have to use it for it to matter. The defaults are
just clarifying what is being assumed if an attribute isn't there.

> Whereas the array interface certainly helps extension writers to
> create an extension module that works with all array implementations,
> it also enables and perhaps encourages the creation of different array
> modules, while our original goal was to create a single array module
> that satisfies the needs of both Numerical Python and numarray users.
> I still think such a solution would be preferable.

I agree with you. I would like a single array module for scientific
users. But, satisfying everybody is probably impossible with a single
array object. Yes, there could be a proliferation of array objects, but
sometimes we need multiple array objects to learn from each other. It's
nice to have actual code that implements some idea rather than just
words in a mailing list. The interface allows us to talk to each other
while we learn from each other's actual working implementations.

In a way this is like the old argument between the 1920-era communists
and the free-marketers. The communists say that we should have only one
company that produces some product because having multiple companies is
"wasteful" of resources, while the free-marketers point out that
satisfying consumers is tricky business, and there is not only "one
right way to do it." Therefore, having multiple companies each trying
to satisfy consumers actually creates wealth as new and better ideas
are tried by the different companies. The successful ideas are emulated
by the rest. In mature markets there tends to be a reduction in the
number of producers, while in developing markets there are all kinds of
companies producing basically the same thing. Of course software
creates its own issues that aren't addressed by that simple analogy,
but I think it's been shown repeatedly that good interfaces (http, smtp
anyone?) create a lot of utility.
> Inconsistencies other than the array interface (e.g. one implements
> argmax(x) while another implements x.argmax()) may mean that an
> extension module can work with one array implementation but not with
> another, even though they both conform to the array interface. We may
> end up with several array packages (we already have Numerical Python,
> numarray, and scipy), and extension modules that work with one package
> and not with another. So in a sense, the array interface is letting
> the genie out of the bottle.

I think this genie is out of the bottle already. We need to try and get
our wishes from it now.

-Travis

From xscottg at yahoo.com Mon Apr 4 19:09:30 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 4 19:09:30 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: 6667
Message-ID: <20050404233322.61350.qmail@web50208.mail.yahoo.com>

--- Michiel Jan Laurens de Hoon wrote:
>
> I'm not sure what you mean by "the array interface could become
> part of the Python standard as early as Python 2.5", since there
> is nothing to install. Or does this mean that Python's array will
> conform to the array interface?
>

It would be nice to have the Python array module support the protocol
for the 1-dimensional arrays that it implements. It would also be nice
to add a *simple* ndarray object in the core that supports
multi-dimensional arrays. I think breaking backward compatibility of
the existing Python array module to support multiple dimensions would
be a mistake and unlikely to get accepted.

A PEP would likely be required to make the changes to the array module
or to add an ndarray module, and that PEP would likely document the
interface. In that regard, it could "make it into the core" for Python
2.5. But you're right that external packages could support this
interface today. There is nothing to install...

> 1) The "__array_shape__" method is identical to the existing "shape"
> method in Numerical Python and numarray (except that "shape" does a
> little bit better checking, but it can be added easily
> to "__array_shape__"). To avoid code duplication, it might be better
> to keep that method. (and rename the other methods for consistency,
> if desired).
>

The intent is that all array packages would have the required/optional
protocol attributes. Of course at a higher level, this information will
probably be presented to the users, but they might choose a different
mechanism. So while A.__array_shape__ always returns a tuple of longs,
A.shape is free to return a ShapeObject or be an assignable attribute
that changes the shape of the object. With the property mechanism,
there is no need to store duplicated data (__array_shape__ can be a
property method that returns a dynamically generated tuple).

Separating the low level description of the array data in memory from
the high level interface that particular packages like scipy.base or
numarray present to their users is a good thing.

> 3) Where do default values come from? Is it the responsibility of the
> extension module writer to find out if the array module implements e.g.
> __array_strides__, and substitute the default values if it doesn't? If
> so, I have a slight preference to make all methods required, since it's
> not a big effort to return the defaults, and there will be more extension
> modules than array packages (or so I hope).
>

If we can get a *simple* package into the core, in addition to
implementing an ndarray object, this module could have helper functions
that do this sort of thing.
For instance:

    def get_strides(A):
        if hasattr(A, "__array_strides__"):
            return A.__array_strides__
        # compute the default C-contiguous strides from the shape
        shape = A.__array_shape__
        size = get_itemsize(A)
        strides = []
        for i in range(len(shape)-1, -1, -1):
            strides.append(size)
            size *= shape[i]
        strides.reverse()   # the loop built them last-dimension first
        return tuple(strides)

    def get_itemsize(A):
        typestr = A.__array_typestr__
        # skip the endian
        if typestr[0] in '<>':
            typestr = typestr[1:]
        # skip the char code
        typestr = typestr[1:]
        return long(typestr)

    def is_contiguous(A):
        # etc....

Those are probably buggy and need work, but you get the idea... A C
implementation of the above would be easy to do and useful, and it
could be done inline in a single include file (no linking headaches).

Cheers,
-Scott

From xscottg at yahoo.com Mon Apr 4 19:09:34 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 4 19:09:34 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: 6667
Message-ID: <20050404233447.26327.qmail@web50204.mail.yahoo.com>

--- Magnus Lie Hetland wrote:
>
> I would almost be tempted to say that if __array_descr__ is in use,
> __array_typestr__ *has* to use the 'V' type. (Or, one could make some
> more complicated rules, perhaps, in order to allow other types.)
>

Yup, having multiple ways to spell the same information will likely
cause problems. Wouldn't be bad for the protocol to say "thou shalt use
the specific typestr when possible". Or to say that the __array_descr__
is only for 'V' typestrs.

> As for not supporting the 'V' type -- would that really be considered
> a conforming implementation? According to the spec, "Objects wishing
> to support an N-dimensional array in application code should look for
> these attributes and use the information provided appropriately". The
> typestr is required, so...
>

I think the intent is that libraries like wxPython or PIL can recognize
data that they *want* to work with. They can raise an exception when
passed anything that is more complicated than they're willing to deal
with. I think many packages will simply punt when they see a 'V'
typestr and not look at the more complicated description at all.
Nothing wrong with that... The packages that produce more complicated
data structures have a way to express it and pass it to the packages
that are capable of consuming it. Easy things are easy, and hard things
are possible.

> Perhaps the spec should be explicit about the shoulds/musts/mays of
> the specific typecodes? What must be supported, what may be supported
> etc.? Or perhaps that doesn't make sense? It just seems almost too bad
> that one package would have to know what another package supports in
> order to formulate its own typestr... It sort of throws part of the
> interoperability out the window.
>

Being very precise in the language describing the protocol is probably
a good thing, but I don't see anything that requires packages to
formulate their typestrs differently. The little bit of ambiguity that
is in the __array_typestr__ and __array_descr__ attributes can be
easily clarified.

Cheers,
-Scott

From xscottg at yahoo.com Mon Apr 4 19:09:38 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 4 19:09:38 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404092421.GD21527@idi.ntnu.no>
Message-ID: <20050404233620.70070.qmail@web50209.mail.yahoo.com>

--- Magnus Lie Hetland wrote:
>
> Why not just use 'shape' as an alias for '__array_shape__' (or vice
> versa)?
>

The protocol just describes the layout and format of the data in
memory.
As such, most users won't use it directly, just as most users don't
call obj.__add__ directly... If an array implementation has a .shape
attribute, it can be whatever the implementor wants. Perhaps it's
assignable. Maybe it's a method that returns a ShapeObject with methods
and attributes of its own. Features like these are the things that make
the high level array packages like Numeric and Numarray enjoyable to
use. The low level __array_*metadata__ interface should be simple,
precisely defined, and just for data interchange.

> > 3) Where do default values come from? Is it the responsibility of the
> > extension module writer to find out if the array module implements e.g.
> > __array_strides__, and substitute the default values if it doesn't?
>
> If the support of these attributes is optional, that would have to be
> the case.
>
> As for implementing the defaults: How about having some utility
> functions (or a wrapper object or whatever) that does just this -- so
> neither array nor client code need think about it? This could,
> perhaps, be put in the stdlib array module or something...
>

There will be a simple Python module or C include file for such things.
Hopefully it will eventually be included in the Python standard
distribution, but even if that doesn't happen, it will be easier than
requiring and linking against the Numeric/Numarray/scipy.base libraries
directly.

> But what I think would be cool is if such a standardized package could
> take any object conforming to this protocol and use it (possibly as
> the argument to the array() constructor) -- with all the ufuncs and
> operators it has. Because then I could implement specialized arrays
> where the specialized behaviour lies just in the data itself, not the
> behaviour. For example, I might want to create a thin array wrapper
> around a memory-mapped, compressed video file, and treat it as a
> three-dimensional array of rgb triples... (And so forth.)
>

If you want the ufuncs, you probably want one of the full featured
library packages like scipy.base or numarray. It looks like Travis is
able to promote any "array protocol object" to a full blown
scipy.base.array already.

> > Inconsistencies other than the array interface (e.g. one implements
> > argmax(x) while another implements x.argmax()) may mean that an
> > extension module can work with one array implementation but not with
> > another,
>
> This does *not* sound like a good thing -- I agree. Certainly not what
> I would hope this protocol is used for.
>

Things like argmax(x) are not part of this protocol. The high level
array packages and libraries will have all sorts of crazy and useful
features. The protocol only describes the layout and format of the
data. It enables higher level packages to work seamlessly with all the
different array objects. That said, this protocol would allow a version
of argmax(x) to be written in such a way as to handle *any* array
object.
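For instance, here is a rough, untested sketch of such a generic
argmax. Everything here is made up for illustration: it assumes rank-1,
contiguous, native-endian data, handles only a small table of typestrs,
and leans on the struct module to decode the exporter's buffer.

    import struct

    # map a few array protocol typestrs onto struct format codes
    # (native sizes assumed: 'i' is 4 bytes on the common platforms)
    _fmt = {'i1': 'b', 'i2': 'h', 'i4': 'i', 'f4': 'f', 'f8': 'd'}

    def any_argmax(A):
        shape = A.__array_shape__
        if len(shape) != 1:
            raise TypeError("this sketch only handles rank-1 arrays")
        typestr = A.__array_typestr__
        if typestr[0] in '<>':
            typestr = typestr[1:]   # assume the byte order is native
        n = shape[0]
        # pull the raw bytes out of the exporter's buffer and decode them
        values = struct.unpack('%d%s' % (n, _fmt[typestr]),
                               str(A.__array_data__))
        best = 0
        for i in range(1, n):
            if values[i] > values[best]:
                best = i
        return best

A real version would consult __array_strides__ and the full type table,
but the point is that it never needs to know which package produced the
array.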
Cheers,
-Scott

From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:13:33 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Apr 4 19:13:33 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404092421.GD21527@idi.ntnu.no>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no>
Message-ID: <4251F40C.6000402@ims.u-tokyo.ac.jp>

Magnus Lie Hetland wrote:
> Michiel Jan Laurens de Hoon:
>> 2) The __array_datalen__ is introduced to get around the 32-bit int
>> limitation of len(). Another option is to fix len() in Python
>> itself, so that it can return integers larger than 32 bits. So we
>> can avoid adding a new method.
>
> That would be good, IMO. But how realistic is it? (I have no idea --
> this is not a rhetorical question :)

Actually, why is __array_datalen__ needed at all? Can't it be
calculated trivially from __array_shape__?

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:56:23 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Apr 4 19:56:23 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4251920B.6060708@ee.byu.edu>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu>
Message-ID: <4251F384.7080506@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
>> Some comments on the array interface:
>>
>> 1) The "__array_shape__" method is identical to the existing "shape"
>> method in Numerical Python and numarray (except that "shape" does a
>> little bit better checking, but it can be added easily to
>> "__array_shape__"). To avoid code duplication, it might be better to
>> keep that method. (and rename the other methods for consistency, if
>> desired).
>
> There is no code duplication. In these cases it is just another name
> for .shape. What "better checking" are you referring to?

The method __array_shape__ is

    if (strcmp(name, "__array_shape__") == 0) {
        PyObject *res;
        int i;
        res = PyTuple_New(self->nd);
        for (i = 0; i < self->nd; i++) {
            PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
        }
        return res;
    }

while the method shape is

    if (strcmp(name, "shape") == 0) {
        PyObject *s, *o;
        int i;

        if ((s = PyTuple_New(self->nd)) == NULL) return NULL;

        for (i = self->nd; --i >= 0;) {
            if ((o = PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
            if (PyTuple_SetItem(s, i, o) == -1) return NULL;
        }
        return s;
    }

so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I
don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be
fine.

--Michiel.
--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From oliphant at ee.byu.edu Mon Apr 4 20:37:07 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Apr 4 20:37:07 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4251F40C.6000402@ims.u-tokyo.ac.jp>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp>
Message-ID: <4252078C.3050300@ee.byu.edu>

> Actually, why is __array_datalen__ needed at all? Can't it be
> calculated trivially from __array_shape__?

Lovely point. I've taken away the __array_datalen__ from the interface
description.

-Travis

From cookedm at physics.mcmaster.ca Mon Apr 4 21:17:19 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Mon Apr 4 21:17:19 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4251F384.7080506@ims.u-tokyo.ac.jp> (Michiel Jan Laurens de Hoon's message of "Tue, 05 Apr 2005 11:10:12 +0900")
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu> <4251F384.7080506@ims.u-tokyo.ac.jp>
Message-ID:

Michiel Jan Laurens de Hoon writes:

> Travis Oliphant wrote:
>>> Some comments on the array interface:
>>>
>>> 1) The "__array_shape__" method is identical to the existing
>>> "shape" method in Numerical Python and numarray (except that
>>> "shape" does a little bit better checking, but it can be added
>>> easily to "__array_shape__"). To avoid code duplication, it might
>>> be better to keep that method. (and rename the other methods for
>>> consistency, if desired).
>> There is no code duplication. In these cases it is just another
>> name for .shape. What "better checking" are you referring to?
>
> The method __array_shape__ is
>
>     if (strcmp(name, "__array_shape__") == 0) {
>         PyObject *res;
>         int i;
>         res = PyTuple_New(self->nd);
>         for (i = 0; i < self->nd; i++) {
>             PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
>         }
>         return res;
>     }
>
> while the method shape is
>
>     if (strcmp(name, "shape") == 0) {
>         PyObject *s, *o;
>         int i;
>
>         if ((s = PyTuple_New(self->nd)) == NULL) return NULL;
>
>         for (i = self->nd; --i >= 0;) {
>             if ((o = PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
>             if (PyTuple_SetItem(s, i, o) == -1) return NULL;
>         }
>         return s;
>     }
>
> so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I
> don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be
> fine.

The #1 rule of thumb when using the Python C API: _always_ check your
returned results (this usually means checking for NULL). In this case,
PyInt_FromLong _can_ fail (if there's an error creating the int free
list). I've fixed this in CVS.

You're right on PyTuple_SET_ITEM: the space for it is guaranteed to
exist after the PyTuple_New.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M.
Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Mon Apr 4 22:18:23 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Apr 4 22:18:23 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <20050403165914.GC10730@idi.ntnu.no>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no>
Message-ID: <42521F76.5080309@ee.byu.edu>

Magnus Lie Hetland wrote:

> - Does the description of __array_data__ mean that the discussed
> bytes type is no longer needed? (If we can use buffers, that
> sounds very good to me.)

We can use the buffer object now, and it works as far as it goes. But,
there are very important reasons for the creation of a good bytes
object.

Probably, THE most important reason for the bytes object is Pickle
support without always making an intermediate string (and the
accompanying copy that is involved). Right now, a string is the only
way to Pickle array data. A bytes object would allow a way to Pickle
without making a copy.

-Travis

From Chris.Barker at noaa.gov Tue Apr 5 00:32:17 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Tue Apr 5 00:32:17 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <42521F76.5080309@ee.byu.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu>
Message-ID: <42523EC0.5000303@noaa.gov>

Travis Oliphant wrote:
> Right now, a string is the only
> way to Pickle array data. A bytes object would allow a way to Pickle
> without making a copy.

So could the new array protocol allow us to make a Python String from
an array without copying? That could be pretty handy.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT           (206) 526-6959   voice
7600 Sand Point Way NE     (206) 526-6329   fax
Seattle, WA 98115          (206) 526-6317   main reception

Chris.Barker at noaa.gov

From magnus at hetland.org Tue Apr 5 01:49:25 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:49:25 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404233620.70070.qmail@web50209.mail.yahoo.com>
References: <20050404092421.GD21527@idi.ntnu.no> <20050404233620.70070.qmail@web50209.mail.yahoo.com>
Message-ID: <20050405084839.GD29671@idi.ntnu.no>

Scott Gilbert:
> [snip]
> > > Inconsistencies other than the array interface (e.g. one implements
> > > argmax(x) while another implements x.argmax()) may mean that an
> > > extension module can work with one array implementation but not with
> > > another,
> >
> > This does *not* sound like a good thing -- I agree. Certainly not what
> > I would hope this protocol is used for.
>
> Things like argmax(x) are not part of this protocol. The high level array
> packages and libraries will have all sorts of crazy and useful features.

Sure -- I realise that. I just mean that I hope there won't be several
scientific array modules that implement similar concepts with different
APIs, just because they can (because of the new array API).

> The protocol only describes the layout and format of the data. It enables
> higher level packages to work seamlessly with all the different array
> objects.

Exactly.
> That said, this protocol would allow a version of argmax(x) to be
> written in such a way as to handle *any* array object.

... given that you can compare the values in the array, of course. But,
yes. This would be (IMO) the ideal situation. Instead of spawning
several equivalent-but-different scientific array modules (i.e. the
ones implementing such functionality as argmax()) we would have *one*
main, standard such module, whose operations would work with almost any
conceivable array object (e.g. from wxPython or PIL). That seems like a
very, very good situation, IMO.

> Cheers,
> -Scott

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:51:35 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:51:35 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <42521F76.5080309@ee.byu.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu>
Message-ID: <20050405085041.GE29671@idi.ntnu.no>

Travis Oliphant:
>
> Magnus Lie Hetland wrote:
>
>> - Does the description of __array_data__ mean that the discussed
>> bytes type is no longer needed? (If we can use buffers, that
>> sounds very good to me.)
>
> We can use the buffer object now, and it works as far as it goes. But,
> there are very important reasons for the creation of a good bytes object.
>
> Probably, THE most important reason for the bytes object is Pickle
> support without always making an intermediate string (and the
> accompanying copy that is involved). Right now, a string is the only
> way to Pickle array data. A bytes object would allow a way to Pickle
> without making a copy.

Ah. Very good argument, of course. But, as I understand it, the
protocol as it stands could work with buffers until we get bytes
objects?

> -Travis

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:52:09 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:52:09 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <42523EC0.5000303@noaa.gov>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> <42523EC0.5000303@noaa.gov>
Message-ID: <20050405085108.GF29671@idi.ntnu.no>

Chris Barker:
>
> Travis Oliphant wrote:
>> Right now, a string is the only
>> way to Pickle array data. A bytes object would allow a way to Pickle
>> without making a copy.
>
> So could the new array protocol allow us to make a Python String from an
> array without copying? That could be pretty handy.

Or treat a string as an array... Yay! :)
> -Chris

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:52:25 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:52:25 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4252078C.3050300@ee.byu.edu>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp> <4252078C.3050300@ee.byu.edu>
Message-ID: <20050405085138.GG29671@idi.ntnu.no>

Travis Oliphant:
>
>> Actually, why is __array_datalen__ needed at all? Can't it be
>> calculated trivially from __array_shape__?
>
> Lovely point. I've taken away the __array_datalen__ from the
> interface description.

This is only getting prettier and prettier :)

> -Travis

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:57:12 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:57:12 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <20050404233447.26327.qmail@web50204.mail.yahoo.com>
References: <20050404233447.26327.qmail@web50204.mail.yahoo.com>
Message-ID: <20050405085642.GH29671@idi.ntnu.no>

Scott Gilbert:
> [snip]
> I think the intent is that libraries like wxPython or PIL can
> recognize data that they *want* to work with. They can raise an
> exception when passed anything that is more complicated than they're
> willing to deal with.

Sure. I'm just saying that it would be good to have a baseline -- a
basic, mandatory level of conformance, so that if I expose an array
using only that part of the API (or, with the rest being optional
information) I know that any conforming array consumer will understand
me. Without such a baseline, I have to know the capabilities of my
consumer before I can write an appropriate typestr, for example. E.g.,
one application may only accept b1, while another would only accept i1
etc. Who knows -- there may well be sets of consumer applications that
have mutually exclusive sets of accepted typestrings unless a minimum
is mandated.

That's really what I was after here. In addition to saying that typestr
*must* be supported, one might say something about what typestrs must
be supported.

On the other hand -- perhaps such requirements should only be made on
the array side? What requirements can/should one really make on the
consumer side? I mean -- even though we have a strict sequence
protocol, there is nothing wrong with creating something sequence-like
(e.g., supporting floats as indices) and having consumer functions that
aren't as strict as the official protocol... I just think it's
something that it might be worth being explicit about.
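To make the problem concrete, today each consumer has to do something
like this (an untested sketch; the SUPPORTED set here is invented --
and that is exactly the problem, since every consumer will invent its
own):

    # typestrs this (hypothetical) consumer is willing to interpret
    SUPPORTED = ('b1', 'i1', 'i2', 'i4', 'u1', 'f4', 'f8')

    def check_typestr(A):
        # strip an explicit byte-order character, then insist that
        # what remains is in our private baseline set
        typestr = A.__array_typestr__
        if typestr[0] in '<>':
            typestr = typestr[1:]
        if typestr not in SUPPORTED:
            raise TypeError("unsupported typestr: %s" % A.__array_typestr__)
        return typestr

With a mandated minimum, an exporter that stuck to it would know it
falls inside every consumer's SUPPORTED set.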
--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 02:00:24 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 02:00:24 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404233322.61350.qmail@web50208.mail.yahoo.com>
References: <20050404233322.61350.qmail@web50208.mail.yahoo.com>
Message-ID: <20050405085905.GI29671@idi.ntnu.no>

Scott Gilbert:
>
> --- Michiel Jan Laurens de Hoon wrote:
>>
>> I'm not sure what you mean by "the array interface could become
>> part of the Python standard as early as Python 2.5", since there
>> is nothing to install. Or does this mean that Python's array will
>> conform to the array interface?
>>
>
> It would be nice to have the Python array module support the protocol for
> the 1-Dimensional arrays that it implements. It would also be nice to add
> a *simple* ndarray object in the core that supports multi-dimensional
> arrays. I think breaking backward compatibility of the existing Python
> array module to support multiple dimensions would be a mistake and unlikely
> to get accepted.

Do we really have to break backward compatibility in order to add more
dimensions to the array module?

There may be some issues with, e.g., typecode, but still...

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From a.schmolck at gmx.net Tue Apr 5 05:28:13 2005
From: a.schmolck at gmx.net (Alexander Schmolck)
Date: Tue Apr 5 05:28:13 2005
Subject: [Numpy-discussion] array slicing question
In-Reply-To: <424FF03A.4060107@bigpond.net.au> (Gary Ruben's message of "Sun, 03 Apr 2005 23:31:38 +1000")
References: <424FF03A.4060107@bigpond.net.au>
Message-ID:

Gary Ruben writes:

> This may be relevant to Numeric 3, but is possibly just a general question
> about array slicing which will either reveal a deficiency in specifying slices
> or in my knowledge of slicing with numpy.
> A while ago I was trying to reimplement some Matlab image processing code in
> Numeric and revealed a deficiency in the way slices are defined. Suppose I
> have an n x m array and want to slice off the first and last p rows and
> columns where p can range from 0 to some number. Matlab provides a clean way
> of doing this, but in numpy it's a bit of a mess.
>
> You might think you could do
> >>> p=1
> >>> b = a[p:-p]

b = a[p:-p or None]

'as

From werner.bruhin at free.fr Tue Apr 5 11:26:36 2005
From: werner.bruhin at free.fr (Werner F. Bruhin)
Date: Tue Apr 5 11:26:36 2005
Subject: [Numpy-discussion] AttributeError: _NumErrorMode instance has no attribute 'dividebyzero'
Message-ID: <4252D77F.10600@free.fr>

If I use "Numeric.Error.setMode(all='Raise')" I get the above
AttributeError.

I found this on 1.1.1 but just downloaded
"numarray-1.2.3.win32-py2.4.exe" and I still find the same problem.

I use numarray with wx.lib.plot.py to generate some simple charts. I
would like to catch the exceptions and display an appropriate message
to the user.

Is the above the right approach or am I going about this the wrong way
round?

Any hints are appreciated.
Werner

From xscottg at yahoo.com Tue Apr 5 13:35:37 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Tue Apr 5 13:35:37 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: 6667
Message-ID: <20050405203434.38638.qmail@web50204.mail.yahoo.com>

--- Magnus Lie Hetland wrote:
>
> Do we really have to break backward compatibility in order to add more
> dimensions to the array module?
>

You're right. The Python array module could change in a backwards
compatible way, possibly using keyword arguments to specify parameters
that have never been there before.

We could probably make sense out of array.insert(), array.append(),
array.extend(), array.pop(), and array.reverse() by giving those an
"axis" keyword. Even array.remove() could be made to work for more
dimensions, but it probably wouldn't get used often. Maybe some of
these would just raise an exception for ndims > 1. Then we'd have to
add some additional typecodes for complex and a few others.

Under the hood, it would basically be a complete reimplementation, but
maybe that is the way to go... It does keep the number of array modules
down. I wonder which way would meet less resistance in getting accepted
in the core. I think creating a new ndarray object would be less risk
of breaking existing applications.

> There may be some issues with, e.g., typecode, but still...
>

The .typecode attribute could return the same values it always has. The
.__array_typestr__ attribute would return the new style values. That's
confusing, but probably unavoidable. It would be nice if there was only
one set of typecodes for all of Python, but I think we're stuck with
many (array module typecodes, struct module typecodes, array protocol
typecodes).

Cheers,
-Scott

From oliphant at ee.byu.edu Tue Apr 5 14:28:39 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 5 14:28:39 2005
Subject: [Numpy-discussion] Questions about ufuncs now.
Message-ID: <4253028D.4090407@ee.byu.edu>

The arrayobject for scipy.base seems to be working. Currently the
Numeric3 CVS tree is using the "old-style" ufuncs modified with new
code for the newly added types. It should be quite functional now for
the brave at heart.

I'm now working on modifying the ufunc object for scipy.base. These are
the changes I'm working on:

1) a thread-specific? context that allows "buffer-size" level trapping
of errors and retrieving of flags set. Similar to the decimal.context
specification, but it uses the floating point sticky bits to implement.

2) implementation of buffers so that type-conversions (and byteswapping
and alignment if necessary) never create temporaries larger than the
buffer-size (the buffer-size is user settable).

3) a reworking of the general N-dimensional loop to use array iterators
with optimizations applied for contiguous arrays.

4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) do
not dictate coercion rules. Also, change so that certain mixed-type
operations are computed in the larger type for both.

Most of this is pretty straightforward. But, I do have one additional
question. Do the new array scalars count as "non-coercing" scalars
(i.e. like the Python scalars), or do they cause coercion? My
preference is that ALL scalars (anything that becomes 0-dimensional
arrays internally) cause only "kind-casting" (i.e. int to float, float
to complex, etc.)
but not "type-casting" -Travis From oliphant at ee.byu.edu Tue Apr 5 16:02:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 16:02:34 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42531880.3060600@ee.byu.edu> I'd like to release a Numeric 24.0 to get the array interface out there. There are also some other bug fixes in Numeric 24.0 Here is the list so far from Numeric 23.7 [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d of Int16 [unreported] Added array interface [unreported] Allow Long Integers to be used in slices [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. [unreported] Return error info in ranlib instead of printing it to stderr [1151892] dot() would quit python with zero-sized arrays when using dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. [unreported] Fixed empty for Object arrays Version 23.8 March 2005 [Cooke] Fixed more 64-bit issues (patch 117603) [unreported] Changed arrayfnsmodule back to PyArray_INT where the code typecasts to (int *). Changed CanCastSafely to check if sizeof(long) == sizeof(int) I'll wait a little bit to allow last minute bug fixes to go in, but I'd realy like to see this release get out there. For users of Numeric >23.7 try Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is fixed in CVS. -Travis From cookedm at physics.mcmaster.ca Tue Apr 5 16:13:31 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Apr 5 16:13:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42531880.3060600@ee.byu.edu> (Travis Oliphant's message of "Tue, 05 Apr 2005 17:00:16 -0600") References: <42531880.3060600@ee.byu.edu> Message-ID: Travis Oliphant writes: > I'd like to release a Numeric 24.0 to get the array interface out > there. There are also some other bug fixes in Numeric 24.0 > > Here is the list so far from Numeric 23.7 > > [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a > is 2-d of Int16 > [unreported] Added array interface > [unreported] Allow Long Integers to be used in slices > [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. > [unreported] Return error info in ranlib instead of printing it to stderr > [1151892] dot() would quit python with zero-sized arrays when using > dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. > [unreported] Fixed empty for Object arrays > > Version 23.8 March 2005 > [Cooke] Fixed more 64-bit issues (patch 117603) > [unreported] Changed arrayfnsmodule back to PyArray_INT where the code > typecasts to (int *). Changed CanCastSafely to check > if sizeof(long) == sizeof(int) > > > I'll wait a little bit to allow last minute bug fixes to go in, but > I'd realy like to see this release get out there. For users of > Numeric >23.7 try > Numeric.empty((10,20),'O') if you want to see an *interesting* bug > that is fixed in CVS. Can you hold on? I've got some bugs I'm working on. There's some 64-bit things I'm working (various places that a long is cast to an int). For instance, a = Numeric.array((3,)) a.resize((2**32,)) gives a.shape == (1,) instead of an error. Stuff like this happens in the new array interface too :-) I'd suggest, before releasing with a bumped version number to 24.0, we release a beta version first. Shake out bugs in the array interface, and potentially allow for some changes if necessary. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From mdehoon at ims.u-tokyo.ac.jp Tue Apr 5 20:34:03 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Tue Apr 5 20:34:03 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <42531880.3060600@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu>
Message-ID: <4253597F.1090501@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> I'd like to release a Numeric 24.0 to get the array interface out
> there. There are also some other bug fixes in Numeric 24.0

Thanks for the notification, Travis. I have committed patch #732520
(Eigenvalues on cygwin bug fix), which fixes bug #706716 (eigenvalues
is broken). It's great to be a Numerical Python developer, I get to
accept my own patches :-). The same patch was previously accepted by
numarray.

About the array interface, my feeling is that while it may be helpful
in the short run, it is likely to damage SciPy in the long run. The
array interface allows different array implementations to move in
different directions. These different implementations will be
compatible with respect to the array interface, but incompatible
otherwise (depending on the level of self-restraint of the developers
of the different array implementations). So in the end, extension
modules will be written for a specific array implementation anyway. At
this point, Numerical Python is the most established and has most
users. Numarray, as far as I can tell, keeps closer to the Numerical
Python tradition, so maybe extension modules can work with either one
without further modification (e.g., pygist seems to work with both
Numerical Python and numarray). But SciPy has been moving away (e.g. by
replacing functions by methods). As extension module writers are
usually busy people, they may not be willing to modify their code so
that it works with SciPy, and even less to maintain two versions of
their code, one for Numerical Python/numarray and one for SciPy. Users
who could previously choose to install SciPy as an addition to
Numerical Python now find that they have to choose between SciPy and
Numerical Python. As Numerical Python has many more extension packages,
I expect that SciPy will end up losing users.

Personally I use Numerical Python, and I plan to continue to use it for
years to come, so it doesn't matter much to me. I'm just warning that
the array interface may be a Trojan horse for the SciPy project.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From oliphant at ee.byu.edu Tue Apr 5 22:26:38 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 5 22:26:38 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <4253597F.1090501@ims.u-tokyo.ac.jp>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp>
Message-ID: <425372A4.7020900@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:
> Travis Oliphant wrote:
>
>> I'd like to release a Numeric 24.0 to get the array interface out
>> there. There are also some other bug fixes in Numeric 24.0
>
> About the array interface, my feeling is that while it may be helpful
> in the short run, it is likely to damage SciPy in the long run.

Well, I guess we'll just have to see. Again, I see the array interface
as important for talking to other modules that may not need or want the
"full power" of a packed array module like scipy.base.
> The array interface allows different array implementations to move in
> different directions. These different implementations will be
> compatible with respect to the array interface, but incompatible
> otherwise (depending on the level of self-restraint of the developers
> of the different array implementations). So in the end, extension
> modules will be written for a specific array implementation anyway. At
> this point, Numerical Python is the most established and has most
> users. Numarray, as far as I can tell, keeps closer to the Numerical
> Python tradition, so maybe extension modules can work with either one
> without further modification (e.g., pygist seems to work with both
> Numerical Python and numarray).
> But SciPy has been moving away (e.g. by replacing functions by methods).

Michiel, you seem to want to create this impression that "SciPy" is
"moving away." I'm not sure of your motivations. But, since this is a
public forum, I have to restate emphatically that "SciPy" is not
"moving away from Numeric." It is all about bringing together the
communities. For the 5 years that scipy has been in development, it has
always been about establishing a library of common routines that we
could all share. It has built on Numeric from the beginning. Now, there
is another "library" of routines that is developing around numarray. It
is this very real break that I'm trying to help fix. I have no other
"desire" to "move away" or "create a break" or any other such notions
that you seem to want to spread. That is precisely why I have publicly
discussed practically every step of my work. You seem to be the only
vocal one who thinks that scipy.base is not just a replacement for
Numeric, but something else entirely.

So, I repeat: **scipy.base is just a new version of Numeric with a few
minor compatibility issues and a lot of added functionality and
features**

For example, despite your claims, I have not "replaced" functions by
methods. The functions are still all there just like before. I've
simply noticed that numarray has a lot of methods, and so I've added
similar methods to the Numeric object to help numarray users make the
transition back. Everything else that I've changed, I've done to bring
Numeric up-to-date with modern Python versions, and to fix old warts
that have sat around for years. If there are problems with my changes,
speak up. Tell me what to do to make the new Numeric better.

> As extension module writers are usually busy people, they may not be
> willing to modify their code so that it works with SciPy, and even
> less to maintain two versions of their code, one for Numerical
> Python/numarray and one for SciPy.

It's comments like this that make me wonder what you are thinking. It
seems to me that you are the only one I've talked to who wants to
maintain the notion of a "split". Everybody else I'm in contact with is
in full support of merging the two communities behind a single
scientific array object. Every extension module that compiles for
Numeric should compile for scipy.base. Notice that full scipy already
has a huge number of extension modules that need to compile for
scipy.base. So, I have every motivation to make that a painless
process.
> Users who could previously choose to install SciPy as an addition to
> Numerical Python, now find that they have to choose between SciPy and
> Numerical Python. As Numerical Python has many more extension
> packages, I expect that SciPy will end up losing users.

Again, scipy.base should *replace* Numerical Python for all users
(except the most adamant who don't seem to want to go with the rest of
the community). scipy.base is a new version of Numeric. On the C-level
I don't know of any incompatibilities; on the Python level there are
very few (most of them rarely-used typecode character issues which a
simple search and replace will fix).

I should emphasize this next point, since I don't seem to be coming
across very clearly to some people. As head Numeric developer, I'm
stating that **Numeric 24 is the last release that will be called
Numeric**. New releases of Numeric will be called scipy.base. Of
course, I realize that people can do whatever they want with the old
Numeric code base, but then they will be the ones responsible for
continuing a "split," because the Numerical Python project at
sourceforge will point people to install scipy.base.

Help me make the transition as painless as possible, that's all I'm
asking. People transitioning from Numeric should have no trouble at
all, as I repeatedly point out. People transitioning from numarray will
have a *little* harder time, which is why the array interface should
help out during that process. It is helping people transition back from
numarray that is 90% of the reason I've made any changes to the
internals of Numeric. I've been a happy and quiet Numeric user and
developer for years, but I respect the problems that Perry, Rick, Paul,
and Todd have pointed out with their numarray implementation, and I saw
a way to support their needs inside of Numeric. That is the whole
reason for my efforts.

I wish people would stop trying to make it seem to casual readers of
this forum that I'm trying to create a "whole new" incompatible system.
Help me fix the obviously unnecessary incompatibilities where they may
exist, and help me make automatic transition scripts to help people
upgrade painlessly to the newer Numeric.

I very much appreciate all who voice your concerns. Michiel, you are
particularly appreciated because yours is the voice of a solid Numeric
user. I just think that such concerns would be more productive in the
context of accepting the fact that an upgrade from Numeric to
scipy.base is going to happen, rather than trying to make it look like
some new "split" is occurring.

I've received a lot of offline support for the Numeric/numarray
unification effort that scipy.base is. It would help if more people
could provide public support on this forum so that others can see that
I'm not just some outsider pushing some random ideas, but am simply
someone who decided to sacrifice some time for what I think is a very
important effort. It would also help if other people who have concerns
would voice them (I'm very grateful for those who have expressed their
concerns) so that we can all address them and get on the same page for
future development.

Right now, the CVS version of Numeric3 works reasonably. It compiles
and uses the old ufunc objects (which have only been extended to
support the new types). I could use a lot of help in finding bugs. You
can also try out the new array scalars to see how they work (math works
on them now) and also see what may still be missing in their
implementation.

> Personally I use Numerical Python, and I plan to continue to use it
> for years to come, so it doesn't matter much to me. I'm just warning
> that the array interface may be a Trojan horse for the SciPy project.
As long as you realize that, as far as I know, the other developers of
Numerical Python are going to be moving to scipy.base, and so you will
be using obsolete technology, you are free to do as you wish. But, I
really hope we can persuade you to join us. It is much better if we
work together.

-Travis

From Fernando.Perez at colorado.edu Tue Apr 5 22:43:33 2005
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Tue Apr 5 22:43:33 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <42537690.5040400@colorado.edu>

Travis Oliphant wrote:
> Michiel Jan Laurens de Hoon wrote:
>> But SciPy has been moving away (e.g. by replacing functions by methods).
>
> Michiel, you seem to want to create this impression that "SciPy" is
> "moving away." I'm not sure of your motivations. But, since this is a
> public forum, I have to restate emphatically, that "SciPy" is not
> "moving away from Numeric." It is all about bringing together the
> communities. For the 5 years that scipy has been in development, it has
> always been about establishing a library of common routines that we
> could all share. It has built on Numeric from the beginning. Now,
> there is another "library" of routines that is developing around
> numarray. It is this very real break that I'm trying to help fix. I
> have no other "desire" to "move away" or "create a break" or any other
> such notions that you seem to want to spread.

FWIW, I think you (Travis) have been exceedingly clear in explaining
this process, and in pointing out how this is:

a) NOT a further split, but rather the EXACT OPPOSITE (numarray users
will have a transition path back into a project which will provide the
best of the old Numeric, along with all the critical enhancements which
Perry, Todd et al. added to numarray).

b) a way, via the array protocol, to provide third-party low-level
libraries an easy way to, AT THE C LEVEL, interact easily and
efficiently (without unnecessary copies) with numeri* arrays.

I fail to see where Michiel gets his split/Trojan horse arguments, or
what line of reasoning can connect your detailed explanations with such
a conclusion. In particular, the comments on the whole 'trojan' issue
seem to me absolutely unfounded. Nobody in their right mind will use
this protocol to invent a scipy.base competitor, which most likely
would end up (if done right) being simply a copy. What it provides is a
minimal, compact, low-level API which will be a huge boon for
interoperability with things like PIL, WX or other similar libraries.
This protocol has been extensively debated, and Scott's extensive
comments have made this discussion a very productive one (along with
the help of others, of course). I can only see this as a GREAT step
forward for numerical python support and reliability 'in the wild'.

I hesitated to send this message, but since you (Travis) have sunk an
enormous amount of your time into this effort, which I can only applaud
and rejoice in, I figure the least I can do is contribute a little to
dispel some unnecessary confusion. Users with less knowledge of the
details may become afraid of using Python for scientific computing by
reading Michiel's comments, which I think would be a shame.

Michiel, please note that none of what I said is meant to be a personal
attack.
I simply feel it is necessary to clarify, in no uncertain terms, how your recent comments of impending doom are unfounded.

Best to all, and again thanks to Travis for this much needed hard work,

f

From Chris.Barker at noaa.gov Tue Apr 5 23:59:31 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Tue Apr 5 23:59:31 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <42538880.7010301@noaa.gov>

Travis Oliphant wrote:
> It would help
> if more people could provide public support on this forum

Easy enough. I, for one, am very happy about what Travis is doing. It seems to be exactly what is needed to mend the Numeric-numarray split, which has been an annoyance for a couple years now.

I'm also VERY happy about the proposed array protocol. While I suppose it could facilitate the creation of other array packages, that is only speculation, and unlikely, in my judgment. What I'm quite sure is going to happen is that other packages that do not provide an array implementation will be able to efficiently take arrays as input without creating a dependence on any particular package. I intend to make sure wxPython can efficiently take Numeric24 arrays, for instance. (Now that I think about it, it would be great if we could get this into wxPython2.6, which will be out pretty darn soon. I'm very pressed for time right now... can anyone help?)

> It would also help if other
> people who have concerns would voice them (I'm very grateful for those
> who have expressed their concerns) so that we can all address them and
> get on the same page for future development.

My only concern is versioning. Particularly when under rapid development (but really this applies anytime), I'd really love to be able to have more than one version of Numeric (or SciPy.base, or whatever) installed at once, and be able to select which one is used at runtime, in code (before importing the first time, of course). This would facilitate testing, but also allow me to have a working environment for older apps that will continue to work, without modification or re-compiling, after installing a newer version. Something like wxPython's wxversion is what I have in mind.

http://wiki.wxpython.org/index.cgi/MultiVersionInstalls

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From magnus at hetland.org Wed Apr 6 00:30:48 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 00:30:48 2005
Subject: [Numpy-discussion] Possible example application of the array interface
Message-ID: <20050406072854.GA12700@idi.ntnu.no>

I was just thinking about some experimental designs, and whether I could, perhaps, do the statistics in Python. I remembered having used RPy [1] briefly at some time (there may be other similar bindings out there -- I don't remember) and started thinking about whether I could, perhaps, combine it with numpy in some way.

My first thought was to reimplement the relevant statistical functions; then I thought about how to convert data back and forth -- but then it occurred to me that R also uses arrays extensively, and that it could, perhaps, be possible to expose those (through something like RPy) through the array interface/protocol!
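(A minimal sketch of what that might look like: a wrapper object need only grow a few attributes. The RArrayProxy class below is invented for illustration -- it is not RPy's actual API -- but the double-underscore attribute names are the ones under discussion in this thread:)

    import struct

    class RArrayProxy:
        """Sketch: expose a foreign memory block via the array interface."""
        def __init__(self, rawdata, shape):
            # rawdata must support the buffer protocol (e.g. a string)
            self.__array_shape__ = shape        # e.g. (2, 3)
            self.__array_typestr__ = '<f8'      # little-endian 8-byte floats
            self.__array_data__ = rawdata       # the block consumers read
            # __array_strides__ omitted: default is C-style contiguous

    # Any array-interface-aware consumer could then use the wrapped data
    # without copying:
    raw = struct.pack('<6d', 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
    a = RArrayProxy(raw, (2, 3))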
This would be (IMO) a good example of the benefits of the array protocol; it's not a matter of "getting yet another array module". RPy is an external library/language with *lots* of features that might be useful to numpy users, many of which aren't likely to be implemented in Python for quite a while, I'd guess (unless, perhaps, someone writes a translator from R, which I'm sure is doable).

I don't know enough (at least yet ;) about the implementation of RPy and the R library to say for sure whether this would even be possible, but it does seem like it could be really useful...

[1] rpy.sf.net

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 00:36:39 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 00:36:39 2005
Subject: [Numpy-discussion] Numeric 24.0
Message-ID:

Hi Travis,

Could you look at bug
[ 635104 ] segfault unpickling Numeric 'O' array
[ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of previous one)

I proposed a (rather simple) solution that I put in the comment of bug [ 635104 ]. But apparently, nobody is looking at those bugs...

> >I'd like to release a Numeric 24.0 to get the array interface out there.
> >There are also some other bug fixes in Numeric 24.0
> >
> >Here is the list so far from Numeric 23.7
> >
> >[Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d
> >of Int16

This is quite disturbing. In fact, for all types that are not exactly equivalent to a python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So

type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
type(zeros((5,2), Float32 )[0,0]) => <type 'array'>

But

type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>

Notice too the weird difference between Int <> Int32 and Float == Float64.

However, when indexing a one-dimensional array (rank == 1), we get back scalars for indexing operations on all types. So, when you say "return the same type", do you mean scalar or array (it smells like a recent discussion on Numeric3 ...)?

>[unreported] Added array interface
>[unreported] Allow Long Integers to be used in slices
>[1123145] Handle mu==0.0 appropriately in ranlib/ignpoi.
>[unreported] Return error info in ranlib instead of printing it to stderr
>[1151892] dot() would quit python with zero-sized arrays when using
> dotblas. The BLAS routines *gemv and *gemm need LDA >= 1.
>[unreported] Fixed empty for Object arrays
>
>Version 23.8 March 2005
>[Cooke] Fixed more 64-bit issues (patch 117603)
>[unreported] Changed arrayfnsmodule back to PyArray_INT where the code
> typecasts to (int *). Changed CanCastSafely to check
> if sizeof(long) == sizeof(int)
>
>I'll wait a little bit to allow last minute bug fixes to go in, but I'd
>really like to see this release get out there. For users of Numeric 23.7,
>try Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is
>fixed in CVS.
>
>-Travis
>

From nwagner at mecha.uni-stuttgart.de Wed Apr 6 01:01:42 2005
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Wed Apr 6 01:01:42 2005
Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical
Message-ID: <42539706.3000503@mecha.uni-stuttgart.de>

Hi all,

Using Numeric 24.0
>>> scipy.__version__
'0.3.3_303.4599'

scipy.test() results in

======================================================================
ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py", line 152, in check_simple_todense
    b = mmread(fn).todense()
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense
    csc = self.tocsc()
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc
    return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1])
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__
    self._check()
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check
    if (nnz>0) and (max(self.rowind[:nnz]) >= M):
IndexError: invalid slice

[snip -- the remaining 30 errors are near-identical and are elided here: every check_* test in test_csc and test_csr (add, elmul, getelement, matmat, matvec, setelement, tocoo, tocsc, tocsr, todense, constructor1-3) and in test_dok (elmul, matmat, tocoo, mult) fails with the same "IndexError: invalid slice" raised from Sparse.py line 375 in _check, reached through the various csc_matrix constructor paths.]

----------------------------------------------------------------------
Ran 1173 tests in 3.113s

FAILED (errors=31)

>>>

From cookedm at physics.mcmaster.ca Wed Apr 6 02:23:11 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 02:23:11 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: References: Message-ID: <20050406092143.GA31688@arbutus.physics.mcmaster.ca>

On Wed, Apr 06, 2005 at 07:33:56AM +0000, Sébastien de Menten wrote:
>
> Hi Travis,
>
> Could you look at bug
> [ 635104 ] segfault unpickling Numeric 'O' array
> [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of
> previous one)
>
> I proposed a (rather simple) solution that I put in the comment of bug
> [ 635104 ]. But apparently, nobody is looking at those bugs...

This is too true. Travis added myself and Michiel de Hoon recently to the developers, so there's some new blood, and we've been banging on things, though. I'll have a look at it if I've got time. I personally really hate bugs that crash my interpreter :-)

> >I'd like to release a Numeric 24.0 to get the array interface out there.
> >There are also some other bug fixes in Numeric 24.0
> >
> >Here is the list so far from Numeric 23.7
> >
> >[Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is
> >2-d of Int16
>
> This is quite disturbing. In fact, for all types that are not exactly
> equivalent to a python type, indexing a multidimensional array (rank > 1)
> returns arrays even if the final shape is ().
> So
> type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
> type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
> type(zeros((5,2), Float32 )[0,0]) => <type 'array'>
> But
> type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
> type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>
> Notice too the weird difference between Int <> Int32 and Float == Float64.

That's because Int is *not* Int32. Int32 is the first typecode of '1sil' that has 32 bits. For (all?) platforms I've seen, that'll be 'i'. Int corresponds to a Python integer, and Float corresponds to a Python float. Now, a Python integer is actually a C long, and a Python float is actually a C double. I've made a table:

Numeric type   typecode   Python type   C type   Array type
Int            'l'        int           long     PyArray_LONG
Int32          'i' [1]    N/A           int      PyArray_INT
Float          'd'        float         double   PyArray_DOUBLE
Float32        'f'        N/A           float    PyArray_FLOAT
Float64        'd'        float         double   PyArray_DOUBLE

[1] assuming sizeof(int)==4, which is true on most platforms. There are some 64-bit platforms where this won't be true, I think.

On (all? most?) 32-bit platforms, sizeof(int) == sizeof(long) == 4, so both Int and Int32 will be 32-bit quantities. Not so on some 64-bit platforms (Linux on an Athlon 64, like the one I'm typing at now), where sizeof(long) == 8. I've been fixing oodles of assumptions in Numeric where ints and longs have been used interchangeably, hence the extended discussion :-)

[I haven't addressed here why you get an array sometimes and a Python type the others. This is the standard, old, behaviour -- it's likely not going to change in Numeric. Whether it's a *good* thing is another question. scipy.base and numarray do it differently.]

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Wed Apr 6 02:46:55 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 02:46:55 2005
Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical
In-Reply-To: <42539706.3000503@mecha.uni-stuttgart.de>
References: <42539706.3000503@mecha.uni-stuttgart.de>
Message-ID: <20050406094438.GA32297@arbutus.physics.mcmaster.ca>

On Wed, Apr 06, 2005 at 10:00:06AM +0200, Nils Wagner wrote:
> Hi all,
>
> Using Numeric 24.0
> >>> scipy.__version__
> '0.3.3_303.4599'
>
> scipy.test() results in
>
> ======================================================================
> ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py", line 152, in check_simple_todense
>     b = mmread(fn).todense()
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense
>     csc = self.tocsc()
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc
>     return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1])
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__
>     self._check()
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check
>     if (nnz>0) and (max(self.rowind[:nnz]) >= M):
> IndexError: invalid slice

(etc. -- note to self: use scipy for regression testing :-)

nnz is coming from nnz = self.indptr[-1] where self.indptr is an array of Int32.
Hmm, this corresponds to the behaviour I just responded to Sebastien de Menten about. The problem is that nnz is *not* a Python integer; it's an array, so the slice fails. I think I was wrong in that email about saying this was expected behaviour :-)

This comes from the recent fix of a[0,0] and a[0][0] returning the same type. Either change that back, or else we need to spruce up the slicing logic to consider 0-dimensional integer arrays as scalars. A minimal test case:

a = Numeric.array([5,6,7,8])
b = Numeric.array([0,1,2,3], 'i')
n = b[-1]
assert a[:n] == 8

(I'm not tackling this right now)

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From magnus at hetland.org Wed Apr 6 02:59:18 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 02:59:18 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
Message-ID: <20050406095639.GA16810@idi.ntnu.no>

Scott Gilbert :
>
> --- Magnus Lie Hetland wrote:
> >
> > Do we really have to break backward compatibility in order to add more
> > dimensions to the array module?
>
> You're right. The Python array module could change in a backwards
> compatible way. Possibly using keyword arguments to specify parameters
> that have never been there before.
>
> We could probably make sense out of array.insert(), array.append(),
> array.extend(), array.pop(), and array.reverse() by giving those an "axis"
> keyword. Even array.remove() could be made to work for more dimensions,
> but it probably wouldn't get used often. Maybe some of these would just
> raise an exception for ndims > 1.

Sure. I guess basically the extend/pop/reverse/etc. methods and the ndim-functionality would sort of be two quite different ways of using arrays, so keeping them mutually exclusive doesn't seem like a problem to me. This might speak in favour of separating the functionality into two different classes, but I think there's merit to keeping it gathered, because this is partly for basic use(rs) who just want to get an array and do things to it that make sense. Appending to a multidimensional array (as long as we don't tempt them with an axis keyword) just doesn't make sense -- so people (hopefully) won't do it.

> Then we'd have to add some additional typecodes for complex and a
> few others.

Yeah; the question is how compatible the typecode system is with the new array protocol -- some overlap and some differences, I believe (without checking right now)? So -- this might look a bit like patchwork. But I think we might get that if we have two modules (or classes) too -- one, called array, with the existing functionality, and one, called (e.g.) ndarray, with a similar but incompatible interface... It *may* be better, but I'm not quite sure I think so.

In my experience (which may be very biased and selective here ;) the array module isn't exactly among the "hottest" features of Python or the standard libs. In fact, it seems almost a bit pointless to me. It claims to have "efficient arrays of numeric values" but is the efficiency really that great, if you write your code in Python? (Using lists and psyco would, quite possibly, be just as good, for example.)
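(A minimal sketch of the one pattern where the stdlib array module clearly does pull its weight alongside Numeric -- a point Robert Kern makes later in this thread: collect values of unknown count, then convert once. Numeric.fromstring and array.tostring are the standard APIs of the period; the values below are arbitrary:)

    import array
    import Numeric

    # Collect values with amortized O(1) appends...
    buf = array.array('d')
    for i in range(1000):
        buf.append(i * 0.5)

    # ...then convert once to a Numeric array for vectorized math.
    a = Numeric.fromstring(buf.tostring(), 'd')
    assert a.shape == (1000,)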
So -- at *least* adding the array protocol to it would be doing it a favour, i.e., making it a useful module, and sort of a prototypical example of the protocol and such. Adding more dimensions might simply make it more useful. (I've many times been asked by people how to create e.g. two-dimensional arrays in Python. It would be nice if there was actually some basic support for it.)

> Under the hood, it would basically be a complete reimplementation,

Sure; except for the (possibly minor?) work involved, I don't see that this is a problem? (Well... The inherent instability of new code, perhaps... But still.)

> but maybe that is the way to go... It does keep the number of array
> modules down.

Yes.

> I wonder which way would meet less resistance in getting accepted in
> the core. I think creating a new ndarray object would be less risk
> of breaking existing applications.

I guess that's true.

> > > There may be some issues with, e.g., typecode, but still...
>
> The .typecode attribute could return the same values it always has.

Sure. But we might end up with, e.g., a constructor that looks almost exactly like the numpy array() constructor -- but whose typecodes are different... :/

> The .__array_typestr__ attribute would return the new style values.
> That's confusing, but probably unavoidable.

Yes, if we do use this approach. If we only allow one-dimensional arrays here (i.e., only add the protocol to the existing functionality) there might be less confusion? Oh, I don't know. Having a separate module or class/type might be just as good an idea. Perhaps I'm just being silly :->

> It would be nice if there was only one set of typecodes for all of
> Python,

Yeah -- or some similar system (using type objects).

> but I think we're stuck with many (array module typecodes, struct
> module typecodes, array protocol typecodes). :(

Yes, lots of history here. Oh, well. Not the greatest of problems, I guess. But using different typecodes in the explicit user-part of the ND-array interface in the stdlibs from those in scipy, for example, seems like a decidedly Bad Idea(tm). So ... that might be a good enough reason for using a separate ndarray entity, unless there can be some upward compatibility somehow.

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 03:12:32 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 03:12:32 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
Message-ID:

Hi,

I follow with great interest the threads around Numeric3/scipy.base. As Travis suggested ("It would also help if other people who have concerns would voice them (I'm very grateful for those who have expressed their concerns) so that we can all address them and get on the same page for future development."), I voice my concern :-)

Sometimes it is quite useful to treat data at a higher level than just an "array of numbers of some type". Adding metadata to arrays (I call them "augmented arrays") is a simple way to add sense to an array. I see different use cases like:

1) attaching a physical unit to array data (see for instance Unum http://home.tiscali.be/be052320/Unum.html )
2) description of axes (see http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very useful to manipulate time series easily.
3) masked arrays as in the MA module of Numeric
4) arrays for interval arithmetic where one keeps another array with the precision of the data
5) record arrays (currently being integrated in scipy.base as a base type)

The current solution for those situations is nicely summarized by quoting Konrad: "but rather a class written using arrays than a variety of the basic array type. It's actually pretty straightforward to implement, the most difficult choice being the form of the constructor that gives most flexibility in use."

However, I disagree with the "pretty straightforward to implement". In fact, if one wants to inherit most of the functionalities of Numeric, it becomes quite cumbersome. Looking at the MA module, I see that it needs to:

1) redefine all methods (__add__, ...)
2) redefine all ufuncs
3) redefine all array functions (like reshape, sort, argmax, ...)

For other purposes, the same burden may apply. A general solution to this problem is not straightforward and may be out of reach (computationally and/or conceptually). However, a quite-general-enough elegant solution could solve most practical problems. Looking at threads in this list, I think that there is enough brain power to get to something usable in the medium term.

An embryo of an idea would be to add hooks in the machinery to allow an object to interact with a ufunc. Currently, this is done by calling __array__ to extract a "naked array" (== Numeric.array vs "augmented array") but the result is then always a "naked array". In pseudocode, this looks like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array = augmented_array.__array__()
    return ufunc.apply(augmented_array)

where I would prefer something like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array, constructor = augmented_array.__array_constructor__()
    else:
        constructor = lambda x: x
    return constructor(ufunc.apply(augmented_array))

For array functions and methods, I have even fewer clues to a solution :-) But calling hooks specified by some protocol would be a path:

a) __array_constructor__
b) __array_binary_op__ (would be called for __add__, __sub__, ...)
c) __array_rbinary_op__ (would be called for __radd__, __rsub__, ...)

If I miss a point and there is an easy way to do this, I'll be pleased to know it. Otherwise, any feedback on this ability to easily increase array functionalities by appending metadata and related behavior.

Sebastien

From cjw at sympatico.ca Wed Apr 6 03:15:13 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Wed Apr 6 03:15:13 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <424FE8E7.4040904@ee.byu.edu>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu>
Message-ID: <4253B691.5030902@sympatico.ca>

Travis Oliphant wrote:
> Colin J. Williams wrote:
>
>> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install
>> running install
>> running build
>> running config
>> error: The .NET Framework SDK needs to be installed before building
>> extensions for Python.
>>
>> Is there any chance that a Windows binary could be made available for
>> testing?
>
> Probably not in the near term (but you could ask Michiel).
>
> I'm assuming you have mingw32 installed which would allow you to build
> it provided you have created an exports file for python2.4 (look on
> the net for how to compile extensions with mingw32 using a MSVC
> compiled python).
> You have to tell distutils what compiler to use:
>
> python setup.py config --compiler=mingw32
> python setup.py build --compiler=mingw32
> python setup.py install
>
> -Travis

Thanks to Michiel and Travis for their suggestions. I am using Windows XP and get the following result:

C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py config --compiler=minw32
running config
error: don't know how to compile C/C++ code on platform 'nt' with 'minw32' compiler

C:\Python24\Lib\site-packages\Numeric3\Download>

I would welcome any comments.

Colin W.

From cookedm at physics.mcmaster.ca Wed Apr 6 03:31:40 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 03:31:40 2005
Subject: [Numpy-discussion] array interface nitpicks
Message-ID:

Just some small nitpicks in the array interface document (http://numeric.scipy.org/array_interface.html):

As written:
"""
__array_shape__ (required)
Tuple showing size in each dimension. Each entry in the tuple must be a Python (long) integer. Note that these integers could be larger than the platform "int" or "long" could hold. Use Py_LONG_LONG if accessing the entries of this tuple in C.
"""

Since this is supposed to be an interface, not an implementation (duck-typing and all that), I think this is too strict: __array_shape__ should just be a sequence of integers, not necessarily a tuple. I'd suggest something like this:

'''
__array_shape__ (required)
Sequence whose elements are the size in each dimension. Each entry is an integer (a Python int or long). Note that these integers could be larger than the platform "int" or "long" could hold (a Python int is a C long). It is up to the calling code to handle this appropriately; either by raising an error when overflow is possible, or by using Py_LONG_LONG as the C type for the shapes.
'''

This is clearer about the user's responsibility -- note that Numeric is taking the first approach (error), as the dimensions in PyArrayObject are ints.

Similar comments about __array_strides__. I'd reword it along the lines of:

'''
__array_strides__ (optional)
Sequence of strides which provides the number of bytes needed to jump to the next array element in the corresponding dimension. Each entry must be an integer (a Python int or long). As with __array_shape__, the values may be larger than can be represented by a C "int" or "long"; the calling code should handle this appropriately, either by raising an error, or by using Py_LONG_LONG in C. Default is a strides tuple which implies a C-style contiguous memory buffer. In this model, the last dimension of the array varies the fastest. For example, the default __array_strides__ tuple for an object whose array entries are 8 bytes long and whose __array_shape__ is (10,20,30) would be (4800, 240, 8).

Default: C-style contiguous
'''

I'm mostly worried about the use of Python longs; it shouldn't be necessary in almost all cases, and adds extra complications (in normal usage, you don't see Python longs all that much).

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cjw at sympatico.ca Wed Apr 6 03:33:05 2005
From: cjw at sympatico.ca (Colin J.
Williams)
Date: Wed Apr 6 03:33:05 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: References: Message-ID: <4253BAA1.7010403@sympatico.ca>

Sébastien de Menten wrote:
> Hi,
>
> I follow with great interest the threads around Numeric3/scipy.base.
> As Travis suggested ("It would also help if other people who have
> concerns would voice them (I'm very grateful for those who have
> expressed their concerns) so that we can all address them and get on
> the same page for future development."), I voice my concern :-)
>
> Sometimes it is quite useful to treat data at a higher level than just
> an "array of numbers of some type". Adding metadata to arrays (I call
> them "augmented arrays") is a simple way to add sense to an array. I
> see different use cases like:
> 1) attaching a physical unit to array data (see for instance Unum
> http://home.tiscali.be/be052320/Unum.html )
> 2) description of axes (see
> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
> useful to manipulate time series easily.

Does the record array provide a means of addressing this need?

> 3) masked arrays as in the MA module of Numeric
> 4) arrays for interval arithmetic where one keeps another array with
> the precision of the data
> 5) record arrays (currently being integrated in scipy.base as a base
> type)

Yes, and there is numarray's array of objects.

> The current solution for those situations is nicely summarized by
> quoting Konrad:
> "but rather a class written using arrays than a variety of the basic
> array type.
> It's actually pretty straightforward to implement, the most difficult
> choice being the form of the constructor that gives most flexibility
> in use."
> [snip]

Colin W.

From rkern at ucsd.edu Wed Apr 6 03:36:51 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Wed Apr 6 03:36:51 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050406095639.GA16810@idi.ntnu.no>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com> <20050406095639.GA16810@idi.ntnu.no>
Message-ID: <4253BB73.5000605@ucsd.edu>

Magnus Lie Hetland wrote:
> So -- at *least* adding the array protocol to it would be doing it a
> favour, i.e., making it a useful module, and sort of a prototypical
> example of the protocol and such. Adding more dimensions might simply
> make it more useful. (I've many times been asked by people how to
> create e.g. two-dimensional arrays in Python. It would be nice if
> there was actually some basic support for it.)

Re-implementing the stdlib-array module to support multiple dimensions is almost certainly a non-starter. You can't easily do it without breaking its pre-allocation strategy. It preallocates memory for elements using the same algorithm that lists do, so .append() has reasonable amortized time behaviour. python-dev will not appreciate changing the algorithmic complexity of a long-existing component to accommodate a half-arsed implementation of N-D arrays.

OTOH, it is the one reason for stdlib-array's use in a Numeric world: sometimes, you just need to append values; you can't pre-allocate with Numeric.empty() and index in values. Using stdlib-array to collect the values, then using the buffer interface (soon-to-be __array__ interface) to convert to a Numeric array is faster than the alternatives.

--
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From sdementen at hotmail.com Wed Apr 6 03:59:35 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 03:59:35 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: <4253BAA1.7010403@sympatico.ca>
Message-ID:

>>1) attaching a physical unit to array data (see for instance Unum
>>http://home.tiscali.be/be052320/Unum.html )
>>2) description of axes (see
>>http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
>>useful to manipulate time series easily.
>
>Does the record array provide a means of addressing this need?

Not really; when I say axis, I am speaking about indexing. For an array (named a) with shape (10, 5, 33), I would like to attach 3 arrays (or lists or tuples, named axis_information[0], axis_information[1] and axis_information[2]) of size (10,), (5,) and (33,) which give sense to the first, second and third index. For instance,

A[i,j,k] => means the element of A at (axis_information[0][i], axis_information[1][j], axis_information[2][k])

instead of

A[i,j,k] => means the element of A at index position [i,j,k]

which makes less sense (you always need to track the meaning of i,j,k in parallel).

>>3) masked arrays as in the MA module of Numeric

Maybe this one could be implemented using a record array with a record like (data, mask). However, it would be cumbersome to use. E.g.

a.field("data")[:] = cos( a.field("data")[:] )

instead of

a[:] = cos(a[:])

with the current MA module.

>>4) arrays for interval arithmetic where one keeps another array with
>>the precision of the data
>>5) record arrays (currently being integrated in scipy.base as a base type)
>
>Yes, and there is numarray's array of objects.

This is overkill as it eats way too much memory. E.g. your data represents instantaneous speeds and so is tagged with a "m/s" information (a complex object) valid for the full array. Distributing this information to each component of an array via an array object is not practical.

From mdehoon at ims.u-tokyo.ac.jp Wed Apr 6 04:22:52 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Apr 6 04:22:52 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <4253B691.5030902@sympatico.ca>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca>
Message-ID: <4253C73E.4030703@ims.u-tokyo.ac.jp>

Colin J. Williams wrote:
> Thanks to Michiel and Travis for their suggestions. I am using Windows
> XP and get the following result:
>
> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py
> config --compiler=minw32
> running config
> error: don't know how to compile C/C++ code on platform 'nt' with
> 'minw32' compiler
>
> C:\Python24\Lib\site-packages\Numeric3\Download>
>
> I would welcome any comments.

--mingw32 contains a 'g'. Also, make sure you have Cygwin installed, with all the necessary packages.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From steve at shrogers.com Wed Apr 6 05:12:39 2005
From: steve at shrogers.com (Steven H.
Rogers)
Date: Wed Apr 6 05:12:39 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <4253D1B9.90709@shrogers.com>

Travis Oliphant wrote:
> Again, scipy.base should *replace* Numerical Python for all users
> (except the most adamant who don't seem to want to go with the rest of
> the community). scipy.base is a new version of Numeric. On the
> C-level I don't know of any incompatibilities; on the Python level
> there are very few (most of them rarely-used typecode character issues
> which a simple search and replace will fix).
>
> I should emphasize this next point, since I don't seem to be coming
> across very clearly to some people. As head Numeric developer, I'm
> stating that **Numeric 24 is the last release that will be called
> Numeric**. New releases of Numeric will be called scipy.base.

I'm happy with the direction you're taking to rejoin Numeric and Numarray. However, changing the name from Numeric to scipy.base may contribute to the confusion/concern. Is it really necessary?

Steve

--
Steven H. Rogers, Ph.D., steve at shrogers.com
Weblog: http://shrogers.com/weblog
"Reach low orbit and you're half way to anywhere in the Solar System."
-- Robert A. Heinlein

From konrad.hinsen at laposte.net Wed Apr 6 07:49:45 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Apr 6 07:49:45 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: References: Message-ID: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net>

On Apr 6, 2005, at 12:10, Sébastien de Menten wrote:

> However, I disagree with the "pretty straightforward to implement". In
> fact, if one wants to inherit most of the functionalities of Numeric,
> it becomes quite cumbersome. Looking at the MA module, I see that it
> needs to:

It is straightforward AND cumbersome. Lots of work, but nothing difficult. I agree of course that it would be nice to improve the situation.

> An embryo of an idea would be to add hooks in the machinery to allow an
> object to interact with a ufunc. Currently, this is done by calling
> __array__ to extract a "naked array" (== Numeric.array vs "augmented
> array") but the result is then always a "naked array".
> In pseudocode, this looks like:
>
> def ufunc( augmented_array ):
>     if not isarray(augmented_array):
>         augmented_array = augmented_array.__array__()
>     return ufunc.apply(augmented_array)

The current behaviour of Numeric is more like

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    elif is_array_like(object):
        return array_func(array(object))
    else:
        return object.ufunc()

A more general version, which should cover your case as well, would be:

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    else:
        try:
            return object.applyUfunc(ufunc)
        except AttributeError:
            if is_array_like(object):
                return array_func(array(object))
            else:
                raise ValueError

There are two advantages:

1) Classes can handle ufuncs in any way they like, even if they implement array-like objects.
2) Classes must implement only one method, not one per ufunc.
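(A minimal sketch of what a class built around that single applyUfunc hook might look like. The UnitArray class, its unit-preserving behaviour, and the apply_ufunc dispatcher below are invented for illustration; only the hook idea comes from the proposal above:)

    import Numeric

    class UnitArray:
        """Sketch of an 'augmented array': data plus a physical unit."""
        def __init__(self, data, unit):
            self.data = Numeric.asarray(data)
            self.unit = unit

        def applyUfunc(self, ufunc):
            # One hook serves every ufunc; keeping the unit unchanged is
            # only sensible for unit-preserving operations, of course.
            return UnitArray(ufunc(self.data), self.unit)

    # A dispatcher along the lines of the pseudocode above:
    def apply_ufunc(ufunc, obj):
        try:
            return obj.applyUfunc(ufunc)
        except AttributeError:
            return ufunc(Numeric.asarray(obj))

    v = UnitArray([1.0, 2.0, 3.0], "m/s")
    w = apply_ufunc(Numeric.negative, v)   # result still carries "m/s"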
Compared to the approach that you suggested: > where I would prefer something like > > def ufunc( augmented_array ): > if not isarray(augmented_array): > augmented_array, constructor = > augmented_array.__array_constructor__() > else: > constructor = lambda x:x > return constructor(ufunc.apply(augmented_array)) mine has the advantage of also covering classes that are not array-like at all. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Wed Apr 6 08:16:33 2005 From: cjw at sympatico.ca (cjw at sympatico.ca) Date: Wed Apr 6 08:16:33 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <4253FCD1.2090808@sympatico.ca> Sébastien de Menten wrote: >>> 1) attaching a physical unit to array data (see for instance Unum >>> http://home.tiscali.be/be052320/Unum.html ) >>> 2) description of axis (see >>> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). >>> Very useful for manipulating time series easily. >> >> >> Does the record array provide a means of addressing this need? >> > > Not really; when I say axis, I am speaking about indexing. Fair enough, I was thinking one dimensionally. > For an array (named A) with shape (10, 5, 33), I would like to attach > 3 arrays or lists or tuples (named axis_information[0], > axis_information[1] and axis_information[2]) of size (10,), (5,) and > (33,) which give meaning to the first, second and third index. > For instance, > A[i,j,k] => means the element of A at (axis_information[0][i], > axis_information[1][j], axis_information[2][k]) > instead of > A[i,j,k] => means the element of A at index position [i,j,k] which > makes less sense (you always need to track the meaning of i,j,k in > parallel). > >>> 3) masked arrays as in MA module of Numeric >> > > Maybe this one could be implemented using a record array with a record > like (data, mask). > However, it would be cumbersome to use. > E.g. a.field("data")[:] = cos( a.field("data")[:] ) > instead of > a[:] = cos(a[:]) > with the current MA module Assuming "data" is the name of a field in a record array "a", why not have a.data to represent a view (or copy, depending on the convention adopted) of a column in a or a.data.Cos to provide the cosines of the values in the data column? "Cos" is used in place of "cos" to distinguish the method from the function. The former requires no parentheses. This assumes that the values in data are of the appropriate numeric type (with its appropriate typecode). Colin W. > > >>> 4) arrays for interval arithmetic where one keeps another array with >>> precision of data >>> 5) record arrays (currently being integrated in scipy.base as a base >>> type) >>> >> Yes, and there is numarray's array of objects. >> > > This is overkill, as it eats way too much memory. > E.g. your data represents instantaneous speeds and so is tagged with > "m/s" information (a complex object) valid for the full array. > Distributing this information to each component of an array via an > object array is not practical.
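As an aside, the attribute-style field access Colin suggests is easy to prototype on top of a field() method; a toy sketch, where the Record class is invented for illustration and is not the scipy.base record array:

    import math

    class Record:
        # toy record "array": a dictionary of named columns
        def __init__(self, **fields):
            self._fields = fields
        def field(self, name):
            return self._fields[name]
        def __getattr__(self, name):
            # unknown attributes fall back to field lookup,
            # so r.data behaves like r.field("data")
            try:
                return self._fields[name]
            except KeyError:
                raise AttributeError(name)

    r = Record(data=[0.0, math.pi], mask=[0, 1])
    print r.field("data")    # explicit access to the column
    print r.data             # attribute access to the same column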
> From sdementen at hotmail.com Wed Apr 6 08:52:05 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 08:52:05 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: >> >>Maybe this one could be implemented using a record array with a record like >>(data, mask). However, it would be cumbersome to use. E.g. >>a.field("data")[:] = cos( a.field("data")[:] ) instead of a[:] = cos(a[:]) >>with the current MA module > >Assuming "data" is the name of a field in a record array "a", why not have >a.data to represent a view (or copy, depending on the convention adopted) >of a column in a or a.data.Cos to provide the cosines of the values in the >data column? > >"Cos" is used in place of "cos" to distinguish the method from the >function. The former requires no parentheses. > Well, I think the whole point is to be able to use "without changes" any library that manipulates arrays with "augmented arrays": same code for all arrays independently of them being "naked" or "augmented". The "without changes" and "any library" should be taken with a pinch of salt, as operations that are accepted for any array will not necessarily mean something for some "augmented arrays". On a side note, I rather prefer to keep mathematical notation instead of OO notation ( cos as function vs method ) From sdementen at hotmail.com Wed Apr 6 09:07:07 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 09:07:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: > >>However, I disagree with the "pretty straightforward to implement". In >>fact, if one wants to inherit most of the functionalities of Numeric, it >>becomes quite cumbersome. Looking at the MA module, I see that it needs to: > >It is straightforward AND cumbersome. Lots of work, but nothing difficult. >I agree of course that it would be nice to improve the situation. My fault, I misunderstood your answer (... but it was a little bit misleading :-) >The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_ufunc(array(object)) > else: > return object.ufunc() > >A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_ufunc(array(object)) > else: > raise ValueError > >There are two advantages: > >1) Classes can handle ufuncs in any way they like, even if they implement > array-like objects. >2) Classes must implement only one method, not one per ufunc. > >Compared to the approach that you suggested: > >>where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, constructor = >>augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > >mine has the advantage of also covering classes that are not array-like at >all. > Yes!! That's an elegant solution for the ufunc part. Do you think it is possible to integrate a similar mechanism into array functions (like searchsorted, argmax, ...)?
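A toy, pure-Python version of the __array_constructor__ round trip quoted above may help; the UnitArray class and the list-based "naked array" are invented for illustration:

    import math

    class UnitArray:
        """Toy augmented array: data tagged with a physical unit."""
        def __init__(self, data, unit):
            self.data, self.unit = data, unit
        def __array_constructor__(self):
            # naked data, plus a callable that rebuilds the wrapper
            return self.data, lambda d: UnitArray(d, self.unit)

    def fabs_ufunc(obj):
        if hasattr(obj, '__array_constructor__'):
            naked, constructor = obj.__array_constructor__()
        else:
            naked, constructor = obj, lambda d: d
        return constructor([math.fabs(x) for x in naked])

    speeds = fabs_ufunc(UnitArray([-1.5, 2.0], 'm/s'))
    print speeds.data, speeds.unit    # [1.5, 2.0] m/s -- unit survives

The constructor returned alongside the naked data is what lets the metadata (here the unit) survive the operation.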
If we can register functions taking one array as an argument within scipy.base and let it dispatch those functions as ufuncs, we could use a similar strategy. For instance, let "sort" and "argmax" be registered as gfuncs (general functions on an array <> ufuncs), then any class that would like to override any of them could do it too with the same trick Konrad described above. If another function uses those gfuncs and ufuncs, it inherits the genericity of the latter. Konrad, do you think it is tricky to have a prototype of your suggestion (i.e. the modification does not need a full understanding of Numeric and you can locate it approximately in the source code)? Seb >Konrad. >-- From mike_lists at yahoo.com.au Wed Apr 6 10:12:39 2005 From: mike_lists at yahoo.com.au (Michael Sorich) Date: Wed Apr 6 10:12:39 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: 6667 Message-ID: <20050406171008.58480.qmail@web53602.mail.yahoo.com> I think that this is a great idea! While I have a strong preference for python, I generally use R for statistical analyses due to the large number of mature libraries available. There are also some aspects of the R data types (eg data-frames and column/row names for 2D arrays) that are really nice for spreadsheet like data. I hope that scipy.base record arrays will be as easily manipulated as data-frames are. While RPy works well for small simple problems, there are data conversion limitations between R and Python. If one could efficiently convert between the major R data types and python scipy.base data types without loss of data, it would become possible to do most of the data manipulation in python and freely mix in R functions when required. This may encourage the use of python for the development of statistical routines. From my meager understanding of RPy: R vectors are converted to python lists. It may make more sense to convert them to an array (either stdlib or scipy.base version) - without copying data if possible. R arrays and matrices are converted to Numeric arrays. Eg In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) Out[8]: array([[1, 3, 5], [2, 4, 6]]) However, column and row names (or dimnames for arrays with >2 dimensions) are lost in R->Py conversion. I do not know whether these conversions require copying of the data. R data-frames are currently converted to python dictionaries and I don't think that there is any simple way to convert a python object to an R data frame. This is the biggest limitation of rpy in my opinion. In [16]: r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) Out[16]: {'col2': ['one', 'two', 'three', 'four'], 'col1': [1, 2, 3, 4]} If it were possible to convert between an R data-frame and a scipy.base record array without copying or losing data, RPy would become more useful. I wish I understood C, scipy.base and R well enough to give this a go. However, this is Way over my head! Mike --- Magnus Lie Hetland wrote: > I was just thinking about some experimental designs, > and whether I > could, perhaps, do the statistics in Python. I > remembered having used > RPy [1] briefly at some time (there may be other > similar bindings out > there -- I don't remember) and started thinking > about whether I could, > perhaps, combine it with numpy in some way. My first
My first > thought was to > reimplement the relevant statistical functions; then > I thought about > how to convert data back and forth -- but then it > occurred to me that > R also uses arrays extensively, and that it could, > perhaps, be > possible to expose those (through something like > RPy) through the > array interface/protocol! > > This would be (IMO) a good example of the benefits > of the array > protocol; it's not a matter of "getting yet another > array module". RPy > is an external library/language with *lots* of > features that might be > useful to numpy users, many of which aren't likely > to be implemented > in Python for quite a while, I'd guess (unless, > perhaps, someone > writes a translator from R, which I'm sure is > doable). > > I don't know enough (at least yet ;) about the > implementation of RPy > and the R library to say for sure whether this would > even be possible, > but it does seem like it could be really useful... > > [1] rpy.sf.net > > -- > Magnus Lie Hetland Fall seven > times, stand up eight > http://hetland.org > [Japanese proverb] > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT > Products from real users. > Discover which products truly live up to the hype. > Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > Find local movie times and trailers on Yahoo! Movies. http://au.movies.yahoo.com From bsouthey at gmail.com Wed Apr 6 11:38:37 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 6 11:38:37 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: Hi, I don't see that it is feasible to link R and numerical python in this way. As you point out, R objects (R is an object orientated language) uses a lot of meta-data. Then there is the IEEE stuff (NaN etc) that would also need to be handled in numerical python. You probably could get RPy or RSPython to use numerical python rather than just baisc Python. What statistical functions would you want in numerical python? Regards Bruce On Apr 6, 2005 12:10 PM, Michael Sorich wrote: > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. There are also some aspects of > the R data types (eg data-frames and column/row names > for 2D arrays) that are really nice for spreadsheet > like data. I hope that scipy.base record arrays will > be as easily manipulated as data-frames are. > > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. > > From my meager understanding of RPy: > > R vectors are converted to python lists. 
It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. > > R arrays and matrices are converted to Numeric arrays. > Eg > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame. This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is Way over my head! > > Mike > > --- Magnus Lie Hetland wrote: > > I was just thinking about some experimental designs, > > and whether I > > could, perhaps, do the statistics in Python. I > > remembered having used > > RPy [1] briefly at some time (there may be other > > similar bindings out > > there -- I don't remember) and started thinking > > about whether I could, > > perhaps, combine it with numpy in some way. My first > > thought was to > > reimplement the relevant statistical functions; then > > I thought about > > how to convert data back and forth -- but then it > > occurred to me that > > R also uses arrays extensively, and that it could, > > perhaps, be > > possible to expose those (through something like > > RPy) through the > > array interface/protocol! > > > > This would be (IMO) a good example of the benefits > > of the array > > protocol; it's not a matter of "getting yet another > > array module". RPy > > is an external library/language with *lots* of > > features that might be > > useful to numpy users, many of which aren't likely > > to be implemented > > in Python for quite a while, I'd guess (unless, > > perhaps, someone > > writes a translator from R, which I'm sure is > > doable). > > > > I don't know enough (at least yet ;) about the > > implementation of RPy > > and the R library to say for sure whether this would > > even be possible, > > but it does seem like it could be really useful... > > > > [1] rpy.sf.net > > > > -- > > Magnus Lie Hetland Fall seven > > times, stand up eight > > http://hetland.org > > [Japanese proverb] > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT > > Products from real users. > > Discover which products truly live up to the hype. > > Start reading now. > > > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > Find local movie times and trailers on Yahoo! Movies. > http://au.movies.yahoo.com > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. 
Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From oliphant at ee.byu.edu Wed Apr 6 12:28:50 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:28:50 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537C6D.8040900@ims.u-tokyo.ac.jp> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537C6D.8040900@ims.u-tokyo.ac.jp> Message-ID: <425437E2.4090000@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> Again, scipy.base should *replace* Numerical Python for all users > > > Sorry, I give up. I have been very happy with Numerical Python so far > and the new Numerical Python just looks too much like SciPy to me. > It's even called scipy.base. In practical terms, what I've noticed is > that what used to work with Numerical Python no longer works with > Numeric3. For example: It's apparent you have negative preconceptions about scipy (even though scipy has always just built on top of Numeric, so I'm not sure what your difficulties have been). This is unfortunate. scipy.base is going to be a lot more like Numeric than scipy was. So, I think you can relax. > > >>> from ndarray import * > >>> argmax > Traceback (most recent call last): > File "", line 1, in ? > NameError: name 'argmax' is not defined This is only because the conversion hasn't completely taken place (I'm not importing the numeric.py module in __init__ yet because it hasn't been adjusted). Remember ndarray is just a place-holder while development happens, so of course quite a few things aren't there yet. I've been swamped so far. from ndarray import * won't even be the way to use it. The package won't be called ndarray. This is all just for temporary development purposes. All of what you believe should work will still continue to work. So, relax..... > >>> > > From what I understand from the discussion, "from Numeric import *" > will still work, but it will be deprecated, which means that I will > have to change my code at some point. Not to mention the other > packages (LinearAlgebra, RandomArray, etc.). It's just too much trouble. Deprecated means new documentation won't teach that approach; that's pretty much it. The approach will still be supported for quite a while so people can switch when and if they want. I don't see "the trouble" at all. > Anyway, I am about to change jobs (I will be moving to Columbia > University soon), so I have decided to take some time off the > Numerical Python project and see where we stand in a few months time. > Hopefully, the situation will have cleared up by then. Sounds like an exciting move. Perhaps I can meet you in person if I'm in New York or if you are ever in Utah. I sincerely hope you will find the new scipy.base to your liking. I can promise you that your concerns are near the top of my list. It's too bad you can't help us get there more quickly.
-Travis From oliphant at ee.byu.edu Wed Apr 6 12:41:31 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:41:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: Message-ID: <42543B1B.3090209@ee.byu.edu> Sébastien de Menten wrote: > > Hi Travis, > > Could you look at bug > [ 635104 ] segfault unpickling Numeric 'O' array > [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of > previous one) > > I proposed a (rather simple) solution that I put in the comment of bug > [ 635104 ]. But apparently, nobody is looking at those bugs... One thing I don't like about the sourceforge bug tracker is that I don't get any email notification of bugs. Is there an option for that? I check my email far more often than I check a website. Sourceforge can be quite slow to manipulate around in. Now that you've mentioned it, I'll look into it. I'm not sure that object arrays could ever be pickled correctly. -Travis > >> >> I'd like to release a Numeric 24.0 to get the array interface out >> there. There are also some other bug fixes in Numeric 24.0 >> >> Here is the list so far from Numeric 23.7 >> >> [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a >> is 2-d of Int16 > > This is quite disturbing. In fact for all types that are not exactly equivalent to a python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So, what should it do? This is the crux of a long-standing wart in Numerical Python that nobody has had a good solution to (I think the array scalars that have been introduced for scipy.base are the best solution yet). Right now, the point is that different things are done for different indexing strategies. Is this a good thing? Maybe it is. We can certainly leave it the way it is now and back-out the change. The current behavior is: Subscripting always produces a rank-0 array if the type doesn't match a basic Python type. Item getting always produces a basic Python type (even if there is no match). So a[0,0] and a[0][0] will return different things if a is an array of shorts, for example. This may be what we live with and just call it a "feature" > So > type(zeros((5,2,4), Int8 )[0,0,0]) => > type(zeros((5,2,4), Int32 )[0,0,0]) => > type(zeros((5,2), Float32 )[0,0]) => > But > type(zeros((5,2,4), Int )[0,0,0]) => > type(zeros((5,2,4), Float64)[0,0,0]) => > type(zeros((5,2,4), Float)[0,0,0]) => > type(zeros((5,2,4), PyObject)[0,0,0]) => > > Notice too the weird difference between Int <> Int32 and Float == > Float64. This has been in Numeric for a long time (the coercion problem was one of the big reasons for it). If you return a Python integer when indexing an Int8 array and then use that for multiplication, you get undesired up-casting. There is no scalar Int8 type to return (thus a 0-dimensional array that can act like a scalar is returned). In scipy.base there are now scalar-like objects for all of the supported array types, which is one solution to this problem that was made possible by the ability to inherit in C that is now part of Python. What platform are you on? Notice that Int is interpreted as C-long (PyArray_LONG) while Int32 is PyArray_INT. This has been another wart in Numerical Python. By the way, I've fixed PyArray_Return so that if sizeof(long)==sizeof(int) then PyArray_INT also returns a Python integer. I think for places where sizeof(long)==sizeof(int) PyArray_LONG and PyArray_INT should be treated identically.
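For readers following the typecode details, the up-casting wart looks roughly like this at an interactive prompt (a sketch: the printed typecodes reflect my understanding of current Numeric behaviour, so treat the exact output as an assumption worth verifying):

    >>> from Numeric import zeros, Int8
    >>> a = zeros((5,2,4), Int8)
    >>> (a * a[0,0,0]).typecode()     # rank-0 Int8 scalar: no coercion
    '1'
    >>> (a * a[0][0][0]).typecode()   # a[0][0][0] is a Python int: up-cast
    'l'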
> > However, when indexing a one-dimensional array (rank == 1), then we get > back scalars for indexing operations on all types. > > So, when you say "return the same type", do you think scalar or array > (it smells like a recent discussion on Numeric3 ...) ? I just think the behavior ought to be the same for a[0,0] or a[0][0] but maybe I'm wrong and we should keep the dichotomy to satisfy both groups of people. Because of the problems I alluded to, sometimes a 0-dimensional array should be returned. -Travis From tchur at optushome.com.au Wed Apr 6 14:00:52 2005 From: tchur at optushome.com.au (Tim Churches) Date: Wed Apr 6 14:00:52 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <42544D54.7040507@optushome.com.au> Michael Sorich wrote: > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. That's exactly what we do in our project (http://www.netepi.org) which uses NumPy, RPy and R. The Python<->R interface provided by RPy has a few wrinkles but overall is remarkably seamless and remarkably robust. > From my meager understanding of RPy: > > R vectors are converted to python lists. It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. RPy directly converts (by copying) NumPy arrays to R arrays and vice versa. C code is used to do this and it is quite fast. No Python lists are involved. You do need to have NumPy installed (including its header files) when you compile RPy for this to work - otherwise RPy *does* convert R arrays to Python lists. > R arrays and matrices are converted to Numeric arrays. > Eg > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame. This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is Way over my head! You can extend the conversion routines of RPy (in either direction) using a very simple interface, using just Python and R. No knowledge of C is necessary. For example, if you want to convert an R data.frame into a custom class which you have written in Python, it is quite easy to add that to Rpy. There is an example for doing this with data.frames given in the Rpy documentation. (More comments below).
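Until such a lossless conversion exists, the dictionary that RPy currently returns for a data-frame (as shown earlier in the thread) can at least be post-processed on the Python side; a hedged sketch, with the helper name invented here:

    from Numeric import array

    def frame_columns_to_arrays(frame_dict):
        """Turn RPy's {column name: list of values} data-frame result
        into Numeric arrays for the numeric columns; leave the rest."""
        converted = {}
        for name, column in frame_dict.items():
            if column and isinstance(column[0], (int, float)):
                converted[name] = array(column)   # copies, but compact
            else:
                converted[name] = column          # e.g. string columns
        return converted

    print frame_columns_to_arrays({'col1': [1, 2, 3, 4],
                                   'col2': ['one', 'two', 'three', 'four']})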
> --- Magnus Lie Hetland wrote: > >>I was just thinking about some experimental designs, >>and whether I >>could, perhaps, do the statistics in Python. I >>remembered having used >>RPy [1] briefly at some time (there may be other >>similar bindings out >>there -- I don't remember) There is also RSPython, which allows Python to be called from R as well as R to be called from Python. However, it is far more experimental than RPy, and much harder to build and rather less robust, but more ambitious in its scope. RPy only allows calling of R functions (almost everything is done via functions in R) from Python, although as noted above it has good facilities for converting R objects back into Python objects, and also allows R objects to be returned to Python as native, unconverted R objects - so you can store native R objects in a Python list or dictionary if you wish. You can't see inside those native R objects with Python, but you can use them as arguments to R functions called via RPy. However, the default action in RPy is to do its best to convert R objects into Python data structures when R functions called via RPy return. That conversion is easily customisable as noted above. >> and started thinking >>about whether I could, >>perhaps, combine it with numpy in some way. My first >>thought was to >>reimplement the relevant statistical functions; then >>I thought about >>how to convert data back and forth -- but then it >>occurred to me that >>R also uses arrays extensively, and that it could, >>perhaps, be >>possible to expose those (through something like >>RPy) through the >>array interface/protocol! It seems that the new NumPy array interface could indeed be used to allow Python and R to share the same array data, rather than making copies as happens at present (albeit very quickly). >>This would be (IMO) a good example of the benefits >>of the array >>protocol; it's not a matter of "getting yet another >>array module". RPy >>is an external library/language with *lots* of >>features that might be >>useful to numpy users, many of which aren't likely >>to be implemented >>in Python for quite a while, I'd guess (unless, >>perhaps, someone >>writes a translator from R, which I'm sure is >>doable). R is a massive project with a huge library of statistical routines - it is several times larger in its extent than Python (that's a weakness as well as a strength, as R tends to be sprawling and rather intimidating in its size). R also has a very large community of top computational statisticians behind it. Better to work with R than to try to compete with it. That said, there is no reason not to port R libraries or specific R functions to NumPy where that provides performance gains, or where the data are large and already handled in NumPy. Our approach in NetEpi (http://www.netepi.org) is to do the data selection and reduction (usually summarisation) in NumPy (where we store data on disc as memory-mapped NumPy arrays) and then pass the much smaller summarised results to R for plotting or fitting complex statistical models. However, we do calculation of elementary statistics (means, quantiles and other measures of location, variance etc) in NumPy wherever possible to avoid copying large amounts of data to R via RPy. >>I don't know enough (at least yet ;) about the >>implementation of RPy >>and the R library to say for sure whether this would >>even be possible, >>but it does seem like it could be really useful... 
>> >>[1] rpy.sf.net I have copied this message to the RPy list - hopefully some fruitful discussion can ensue. Tim C From gregory.r.warnes at pfizer.com Wed Apr 6 14:02:05 2005 From: gregory.r.warnes at pfizer.com (Warnes, Gregory R) Date: Wed Apr 6 14:02:05 2005 Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example application of the array interface] Message-ID: <915D2D65A9986440A277AC5C98AA466F978DC2@groamrexm02.amer.pfizer.com> Hi All, It is possible to establish conversion functions so that R dataframe, list, and vector objects are better translated into python equivalents. I've made several aborted stabs at this, but my time has been extremely limited. The basic task is to create a functionally equivalent python class [The tricky bit here is that R list and vector objects have both order and names. It is possible to emulate this in python by creating a base object that maintains a dictionary of names alongside the vector/matrix data.] See the example in the RPy documentation at http://rpy.sourceforge.net/rpy/doc/manual_html/DataFrame-class.html#DataFrame%20class. This shouldn't be very hard if someone can dedicate a bit of time to it. -Greg (Current RPy maintainer) > -----Original Message----- > From: rpy-list-admin at lists.sourceforge.net > [mailto:rpy-list-admin at lists.sourceforge.net]On Behalf Of Tim Churches > Sent: Wednesday, April 06, 2005 4:22 PM > To: rpy-list at lists.sourceforge.net > Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example > application > of the array interface] > > > The following discussion occurred on the Numeric Python mailing list. > Others may wish to join the conversation. > > Tim C > > -------- Original Message -------- > Subject: Re: [Numpy-discussion] Possible example application of the > array interface > Date: Thu, 7 Apr 2005 03:10:08 +1000 (EST) > From: Michael Sorich > To: numpy-discussion at lists.sourceforge.net > > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. There are also some aspects of > the R data types (eg data-frames and column/row names > for 2D arrays) that are really nice for spreadsheet > like data. I hope that scipy.base record arrays will > be as easily manipulated as data-frames are. > > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. > > From my meager understanding of RPy: > > R vectors are converted to python lists. It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. > > R arrays and matrices are converted to Numeric arrays. > Eg > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame.
This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is Way over my head! > > Mike > > --- Magnus Lie Hetland wrote: > > I was just thinking about some experimental designs, > > and whether I > > could, perhaps, do the statistics in Python. I > > remembered having used > > RPy [1] briefly at some time (there may be other > > similar bindings out > > there -- I don't remember) and started thinking > > about whether I could, > > perhaps, combine it with numpy in some way. My first > > thought was to > > reimplement the relevant statistical functions; then > > I thought about > > how to convert data back and forth -- but then it > > occurred to me that > > R also uses arrays extensively, and that it could, > > perhaps, be > > possible to expose those (through something like > > RPy) through the > > array interface/protocol! > > > > This would be (IMO) a good example of the benefits > > of the array > > protocol; it's not a matter of "getting yet another > > array module". RPy > > is an external library/language with *lots* of > > features that might be > > useful to numpy users, many of which aren't likely > > to be implemented > > in Python for quite a while, I'd guess (unless, > > perhaps, someone > > writes a translator from R, which I'm sure is > > doable). > > > > I don't know enough (at least yet ;) about the > > implementation of RPy > > and the R library to say for sure whether this would > > even be possible, > > but it does seem like it could be really useful... > > > > [1] rpy.sf.net > > > > -- > > Magnus Lie Hetland Fall seven > > times, stand up eight > > http://hetland.org > > [Japanese proverb] > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from > real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_ide95&alloc_id396&op=click > _______________________________________________ > rpy-list mailing list > rpy-list at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rpy-list > > LEGAL NOTICE Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this E-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents of this E-mail or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately. From cookedm at physics.mcmaster.ca Wed Apr 6 14:04:36 2005 From: cookedm at physics.mcmaster.ca (David M. 
Cooke) Date: Wed Apr 6 14:04:36 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42543B1B.3090209@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 13:40:11 -0600") References: <42543B1B.3090209@ee.byu.edu> Message-ID: Travis Oliphant writes: > Sébastien de Menten wrote: > >> >> Hi Travis, >> >> Could you look at bug >> [ 635104 ] segfault unpickling Numeric 'O' array >> [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of >> previous one) >> >> I proposed a (rather simple) solution that I put in the comment of >> bug [ 635104 ]. But apparently, nobody is looking at those bugs... > > One thing I don't like about the sourceforge bug tracker is that I don't > get any email notification of bugs. Is there an option for that? I > check my email far more often than I check a website. Sourceforge > can be quite slow to manipulate around in. I think if the bug is assigned to you, you get email. > >> So >> type(zeros((5,2,4), Int8 )[0,0,0]) => >> type(zeros((5,2,4), Int32 )[0,0,0]) => >> type(zeros((5,2), Float32 )[0,0]) => >> But >> type(zeros((5,2,4), Int )[0,0,0]) => >> type(zeros((5,2,4), Float64)[0,0,0]) => >> type(zeros((5,2,4), Float)[0,0,0]) => >> type(zeros((5,2,4), PyObject)[0,0,0]) => >> >> Notice too the weird difference between Int <> Int32 and Float == >> Float64. > > By the way, I've fixed PyArray_Return so that if > sizeof(long)==sizeof(int) then PyArray_INT also returns a Python > integer. I think for places where sizeof(long)==sizeof(int) > PyArray_LONG and PyArray_INT should be treated identically. I don't think this is good -- it's just papering over the problem. It leads to different behaviour on machines where sizeof(long) != sizeof(int) (specifically, the problem reported by Nils Wagner *won't* be fixed by this on my machine). On some machines x[0] will give you an int (where x is an array of Int32), on others an array: not fun. I see you already beat me to changing PyArray_PyIntAsInt to support rank-0 integer arrays. How about changing that to instead use anything that int() can handle (using PyNumber_AsInt)? This would include anything int-like (rank-0 integer arrays, scipy.base array scalars, etc.). The side-effect is that you can index using floats (since int() of a float truncates it towards 0). If this is a big deal, I can special-case floats to raise an error. This would make (almost) all Numeric behaviour consistent with regards to using Python ints, Python longs, and rank-0 integer arrays, and other int-like objects. >> However, when indexing a one-dimensional array (rank == 1), then we >> get back scalars for indexing operations on all types. >> >> So, when you say "return the same type", do you think scalar or >> array (it smells like a recent discussion on Numeric3 ...) ? > > I just think the behavior ought to be the same for a[0,0] or a[0][0] > but maybe I'm wrong and we should keep the dichotomy to satisfy both > groups of people. Because of the problems I alluded to, sometimes a > 0-dimensional array should be returned. I'd prefer having a[0,0] and a[0][0] return the same thing: it's not a special case of how to do two indices; it's the special-casing of rank-1 arrays as compared to rank-n arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Wed Apr 6 14:42:38 2005 From: cookedm at physics.mcmaster.ca (David M.
Cooke) Date: Wed Apr 6 14:42:38 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric Message-ID: I've always found the Numeric setup.py to be not very user-friendly. So, I rewrote it. It's available as patch #1178095 http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 Basically, all the editing you need to do is in customize.py, instead of touching setup.py. No more commenting out files for lapack_lite (just tell it to use the system LAPACK, and tell it where to find it). Also, you could now use GSL's cblas interface for dotblas. Useful if you've already taken the trouble to link that with an optimized Fortran BLAS. I didn't want to just throw this into CVS without feedback first :-) If it looks good, this can go in Numeric 24.0. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From perry at stsci.edu Wed Apr 6 15:05:47 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:05:47 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <200504011146.44549.faltet@carabos.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> <200504011146.44549.faltet@carabos.com> Message-ID: <00c3ccc871b2107c78efa7cb3758fe8c@stsci.edu> Coming in very late... On Apr 1, 2005, at 4:46 AM, Francesc Altet wrote: > I'm very much with the opinions of Scott. Just some remarks. > > A Divendres 01 Abril 2005 06:12, Scott Gilbert va escriure: >>> I also think that rather than attach < or > to the start of the >>> string it would be easier to have another protocol for endianness. >>> Perhaps something like: >>> >>> __array_endian__ (optional Python integer with the value 1 in it). >>> If it is not 1, then a byteswap must be necessary. >> >> A limitation of this approach is that it can't adequately represent >> struct/record arrays where some fields are big endian and others are >> little >> endian. > > Having a mix of different endianness data values in the same data > record would be a bit ill-minded. In fact, numarray does not support > this: a recarray should be all little or big endian. I think that '<' > and '>' would be more than enough to represent this. > Nothing intrinsically prevents numarray from allowing this for records, but I'd agree that I have a hard time understanding when a given record array would have mixed endianness. >>> So, what if we proposed for the Python core not something like >>> Numeric3 (which would still exist in scipy.base and be everybody's >>> favorite array :-) ), but a very minimal array object (scaled back >>> even from Numeric) that followed the array protocol and had some >>> C-API associated with it. >>> >>> This minimal array object would support 5 basic types ('bool', >>> 'integer', 'float', 'complex', 'Object'). (Maybe a void type >>> could be defined and a void "scalar" introduced (which would be >>> the bytes object)). These types correspond to scalars already >>> available in Python and so the whole 0-dim array Python scalar >>> arguments could be ignored. >> >> I really like this idea. It could easily be implemented in C or >> Python >> script. Since half its purpose is for documentation, the Python >> script >> implementation might make more sense. > > Yeah, I fully agree with this also. > > I'm not against it, but I wonder if it is the most important thing to do next.
I can imagine that there are many other issues that deserve more attention than this. But I won't tell Travis what to do, obviously. Likewise about working on the current Python array module. Perry From perry at stsci.edu Wed Apr 6 15:09:11 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:11 2005 Subject: [Numpy-discussion] Questions about ufuncs now. In-Reply-To: <4253028D.4090407@ee.byu.edu> References: <4253028D.4090407@ee.byu.edu> Message-ID: <0d2b3dd0b5f97750022b47de6f1fad33@stsci.edu> On Apr 5, 2005, at 5:26 PM, Travis Oliphant wrote: > > The arrayobject for scipy.base seems to be working. Currently the > Numeric3 CVS tree is using the "old-style" ufuncs modified with new > code for the newly added types. It should be quite functional > now for the brave at heart. > > I'm now working on modifying the ufunc object for scipy.base. > > These are the changes I'm working on: > > 1) a thread-specific? context that allows "buffer-size" level > trapping > of errors and retrieving of flags set. Similar to the > decimal.context specification, but it uses the floating point > sticky bits to implement. > > 2) implementation of buffers so that type-conversions (and > byteswapping and alignment if necessary) never create temporaries > larger than the buffer-size (the buffer-size is user settable). > > 3) a reworking of the general N-dimensional loop to use array > iterators with optimizations > applied for contiguous arrays. > > 4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) > do not dictate coercion rules > Also, change so that certain mixed-type operations are computed in > the larger type for both. > > Most of this is pretty straightforward. But, I do have one additional > question. Do the new array scalars count as "non-coercing" scalars > (i.e. like the Python scalars), or do they cause coercion? > > My preference is that ALL scalars (anything that becomes > 0-dimensional arrays internally) cause only "kind-casting" (i.e. int > to float, float to complex, etc.) but not "type-casting" > Seems reasonable. One could argue that since they have their own precision, normal coercion rules should apply, but so long as Python scalar literals don't, having different coercion rules for what look like scalars taken from arrays than for python scalars is bound to lead to great confusion. So I agree. Perry From perry at stsci.edu Wed Apr 6 15:09:51 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:51 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537690.5040400@colorado.edu> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537690.5040400@colorado.edu> Message-ID: <7779a4425dd6f32659e9c5f15b48e180@stsci.edu> I'll echo Fernando's comments. On Apr 6, 2005, at 1:41 AM, Fernando Perez wrote: > Travis Oliphant wrote: >> Michiel Jan Laurens de Hoon wrote: > >>> But SciPy has been moving away (e.g. by replacing functions by >>> methods). >> Michiel, you seem to want to create this impression that "SciPy" is >> "moving away." I'm not sure of your motivations. But, since this >> is a public forum, I have to restate emphatically that "SciPy" is >> not "moving away from Numeric." It is all about bringing together >> the communities. For the 5 years that scipy has been in development, >> it has always been about establishing a library of common routines >> that we could all share. It has built on Numeric from the
Now, there is another "library" of routines that is >> developing around numarray. It is this very real break that I'm >> trying to help fix. I have no other "desire" to "move away" or >> "create a break" or any other such notions that you seem to want to >> spread. > > FWIW, I think you (Travis) have been exceedingly clear in explaining > this process, and in pointing out how this is: > > a) NOT a further split, but rather the EXACT OPPOSITE (numarray users > will have a transition path back into a project which will provide the > best of the old Numeric, along with all the critical enhancements > which Perry, Todd et al. added to numarray). > > b) a way, via the array protocol, to provide third-party low-level > libraries an easy way to, AT THE C LEVEL, interact easily and > efficiently (without unnecessary copies) with numeri* arrays. > > [...] From Chris.Barker at noaa.gov Wed Apr 6 15:37:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:37:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <4253C73E.4030703@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> Message-ID: <42546439.5060301@noaa.gov> Michiel Jan Laurens de Hoon wrote: > Also, make sure you have Cygwin installed, with all the necessary packages. MinGw is NOT Cygwin. You need to have MinGw installed, with all the necessary packages. I don't remember which ones, but I think there is not a single large package that gives you the whole pile. I do remember it being pretty easy for me last time I did it. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cookedm at physics.mcmaster.ca Wed Apr 6 15:44:36 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 15:44:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> (konrad hinsen's message of "Wed, 6 Apr 2005 16:48:30 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: konrad.hinsen at laposte.net writes: > On Apr 6, 2005, at 12:10, S?bastien de Menten wrote: > >> However, I disagree with the "pretty straightforward to >> implement". In fact, if one wants to inherit most of the >> functionalities of Numeric, it becomes quite cumbersome. Looking at >> MA module, I see that it needs to: > > It is straightforward AND cumbersome. Lots of work, but nothing > difficult. I agree of course that it would be nice to improve the > situation. > >> An embryo of idea would be to add hooks in the machinery to allow an >> object to interact with an ufunc. Currently, this is done by calling >> __array__ to extract a "naked array" (== Numeric.array vs >> "augmented array") but the result is then always a "naked >> array". 
>> In pseudocode, this looks like: >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array = augmented_array.__array__() >> return ufunc.apply(augmented_array) > > The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_ufunc(array(object)) > else: > return object.ufunc() > > A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_ufunc(array(object)) > else: > raise ValueError > > There are two advantages: > > 1) Classes can handle ufuncs in any way they like, even if they > implement > array-like objects. > 2) Classes must implement only one method, not one per ufunc. I like this! It's got namespace goodness all over it (last Python zen line in 'import this': Namespaces are one honking great idea -- let's do more of those!) I'd propose making the special method __ufunc__. > Compared to the approach that you suggested: > >> where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, constructor = >> augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > > mine has the advantage of also covering classes that are not > array-like at all. ... like your derivative classes, which are very useful. There are two different uses that ufuncs apply to, however. 1) arrays. Here, we want efficient computation of functions applied to lots of elements. That's where the output arguments and special methods (.reduce, .accumulate, and .outer) are useful. 2) polymorphic functions. Output arguments aren't useful here. The special methods are useful for binary ufuncs only. For #2, just returning a callable from __ufunc__ would be fine. I'd suggest two levels of an informal ufunc interface corresponding to these two uses. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From Chris.Barker at noaa.gov Wed Apr 6 15:49:44 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:49:44 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: Message-ID: <42546709.1050600@noaa.gov> David M. Cooke wrote: > I've always found the Numeric setup.py to be not very user-friendly. > So, I rewrote it. It's available as patch #1178095 > http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 From that file: # If use_system_lapack is false, f2c'd versions of the required routines # will be used, except on Mac OS X, where the vecLib framework will be used # if found. Just to be clear, this does mean that vecLib will be used by default on OS-X? Very nice, setup.py has annoyed me too. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Apr 6 15:51:17 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:51:17 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: References: Message-ID: <42546766.5060802@noaa.gov> Hi all, (but mostly Travis), I've taken a look at http://numeric.scipy.org/array_interface.html to try and see how I would use this with wxPython. I have a few questions, and a little code I'd like you to look at to see if I understand how this works. Here's a first stab on how I might use this for the wxPython DrawPointsList method. The method takes a sequence of length-2 sequences of numbers, and draws a point at each point described by coordinates in the data: [(x,y), (x2,y2), (x3,y3), ...] (or a NX2 NumPy array of Ints) Here's what I have: def DrawPointList(self, points, pens=None): ... # some checking code on the pens ... if (hasattr(points,'__array_shape__') and hasattr(points,'__array_typestr__') and len(points.__array_shape__) == 2 and points.__array_shape__[1] == 2 and points.__array_typestr__ == 'i4' ): # this means we have a compliant array # return the array protocol version return self._DrawPointArray(points.__array_data__, pens,[]) #This needs to be written now! else: #return the generic python sequence version return self._DrawPointList(points, pens, []) Then we'll need a function (in C++): _DrawPointArray(points.__array_data__, pens,[]) That takes a buffer object, and does the drawing. My questions: 1) Is this what you had in mind for how to use this? 2) As __array_strides__ is optional, I'd kind of like to have a __contiguous__ flag that I could just check, rather than checking for the existence of strides, then calculating what the strides should be, then checking them. 3) A number of the attributes are optional, but will always be there with SciPy arrays (I assume). Have you documented them anywhere? 4) a wxWidgets wxPoint is defined as such: class WXDLLEXPORT wxPoint { public: int x, y; etc. As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int) ? 5) Why is __array_data__ optional? Isn't that the whole point of this? 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it stands, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right? 7) An alternative to the above: A __simple__ flag, that means the data is a simple, C array of contiguous data of a single type. That is the most common use, and it would be nice to just check that flag and not have to take all other options into account. Thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From efiring at hawaii.edu Wed Apr 6 15:53:05 2005 From: efiring at hawaii.edu (Eric Firing) Date: Wed Apr 6 15:53:05 2005 Subject: [Numpy-discussion] masked arrays and NaNs Message-ID: <425467BB.305@hawaii.edu> Travis, I am whole-heartedly in favor of your efforts to end the Numeric/numarray split by combining the best of both. I am encouraged by the progress you have made, and by the depth and clarity of the accompanying technical discussions. Thank you! I am a long-time Matlab user in Physical Oceanography, and I have been trying to find a practical way to phase out Matlab. One key is matplotlib, which is coming along wonderfully. A second is the availability of a Num* (or scipy.base) module that provides the functionality and ease-of-use I presently get from Matlab.
This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays. In Physical Oceanography, and I suspect in many other fields, data sets are almost always full of holes. Matlab's ability to use NaN as a bad value flag provides a wonderfully simple and efficient way of dealing with missing or bad data values. A similar ease and transparency would be good in scipy.base. In addition, or as a way of implementing NaN-handling internally, it might be best to have masked arrays incorporated at the C level--with the functionality available by default--rather than bolted on as a pure-python package. I hope that inclusion of __array_mask__ in the protocol means that this is part of the plan.

Eric

From Chris.Barker at noaa.gov Wed Apr 6 16:00:09 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 16:00:09 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <42546439.5060301@noaa.gov> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> <42546439.5060301@noaa.gov> Message-ID: <425469AA.2030703@noaa.gov>

Chris Barker wrote:
> there is not a single large package

OOPS. There IS a single large package.

-Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From oliphant at ee.byu.edu Wed Apr 6 16:13:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:13:08 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: <425458F7.9020307@ee.byu.edu> Message-ID: <42546CC7.40408@ee.byu.edu>

David M. Cooke wrote:
>Travis Oliphant writes:
>
>>David M. Cooke wrote:
>>
>>>I've always found the Numeric setup.py to be not very user-friendly.
>>>So, I rewrote it. It's available as patch #1178095
>>>http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>>>
>>>Basically, all the editing you need to do is in customize.py, instead
>>>of touching setup.py. No more commenting out files for lapack_lite
>>>(just tell it to use the system LAPACK, and tell it where to find it).
>>>
>>>Also, you could now use GSL's cblas interface for dotblas. Useful if
>>>you've already taken the trouble to link that with an optimized
>>>Fortran BLAS.
>>>
>>>I didn't want to just throw this into CVS without feedback first :-)
>>>If it looks good, this can go in Numeric 24.0.
>>>
>>I like the new changes. I also think the setup.py file is unfriendly.
>>Put them in...
>>
>
>While I'm at it, I'm also thinking of writing a 'cblas_lite' for
>dotblas. This would mean that dotblas would be enabled all the time.
>You could use a C BLAS if you've got one (from ATLAS, say), or a
>Fortran BLAS (like the cxml library on an Alpha running Tru64), or it
>would use the existing blas_lite.c if you don't.
>

This is a good idea, but for more than just dotblas. It is the essential problem that must be solved to make scipy.base installable everywhere yet use fast libraries for users who have them without much fuss.
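To make the configuration side concrete, here is a hypothetical sketch of what a customize.py in the spirit of David's patch could look like. Only use_system_lapack appears in the comment quoted earlier; every other name below is an illustrative guess, not the contents of the actual file:

    # Hypothetical customize.py sketch -- variable names are guesses,
    # not the real patch.

    # Use the system LAPACK instead of the bundled f2c'd routines?
    use_system_lapack = 0
    lapack_library_dirs = []      # e.g. ['/usr/lib/atlas']
    lapack_libraries = []         # e.g. ['lapack', 'f77blas', 'atlas']

    # Use a CBLAS interface (ATLAS, GSL, reference CBLAS) for dotblas?
    use_dotblas = 0
    dotblas_include_dirs = []     # e.g. ['/usr/include/atlas']
    dotblas_library_dirs = []
    dotblas_libraries = []        # e.g. ['cblas', 'atlas'] or ['gslcblas']

The point is that a user edits a handful of flags in one small file, and setup.py itself never needs touching.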
-Travis

From rkern at ucsd.edu Wed Apr 6 16:28:40 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 16:28:40 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546709.1050600@noaa.gov> References: <42546709.1050600@noaa.gov> Message-ID: <42547060.30204@ucsd.edu>

Chris Barker wrote:
>
> David M. Cooke wrote:
>
>> I've always found the Numeric setup.py to be not very user-friendly.
>> So, I rewrote it. It's available as patch #1178095
>> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>
> From that file:
>
> # If use_system_lapack is false, f2c'd versions of the required routines
> # will be used, except on Mac OS X, where the vecLib framework will be used
> # if found.
>
> Just to be clear, this does mean that vecLib will be used by default on
> OS-X?

I haven't tried it, yet, but my examination of it suggests that this is so.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Wed Apr 6 16:59:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:59:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: <4254778A.1070100@ee.byu.edu>

Chris Barker wrote:
> Hi all, (but mostly Travis),
>
> I've taken a look at http://numeric.scipy.org/array_interface.html to
> try and see how I would use this with wxPython. I have a few
> questions, and a little code I'd like you to look at to see if I
> understand how this works.

Great, fantastic!!!

> Here's a first stab on how I might use this for the wxPython
> DrawPointsList method. The method takes a sequence of length-2
> sequences of numbers, and draws a point at each point described by
> coordinates in the data:
>
> [(x,y), (x2,y2), (x3,y3), ...] (or a NX2 NumPy array of Ints)
>
> Here's what I have:
>
> def DrawPointList(self, points, pens=None):
>     ...
>     # some checking code on the pens
>     ...
>     if (hasattr(points, '__array_shape__') and
>         hasattr(points, '__array_typestr__') and
>         len(points.__array_shape__) == 2 and
>         points.__array_shape__[1] == 2 and
>         points.__array_typestr__ == 'i4'
>        ):  # this means we have a compliant array
>         # return the array protocol version

You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally). A more generic interface would handle multiple integer types if possible (but this is a good start...)

>         return self._DrawPointArray(points.__array_data__, pens, [])  # This needs to be written now!
>     else:
>         # return the generic python sequence version
>         return self._DrawPointList(points, pens, [])
>
> Then we'll need a function (in C++):
>
> _DrawPointArray(points.__array_data__, pens, [])
>
> That takes a buffer object, and does the drawing.
>
> My questions:
>
> 1) Is this what you had in mind for how to use this?

Yes, pretty much.

> 2) As __array_strides__ is optional, I'd kind of like to have a
> __contiguous__ flag that I could just check, rather than checking for
> the existence of strides, then calculating what the strides should be,
> then checking them.

I don't want to add too much.
The other approach is to establish a set of helper functions in Python to check this sort of thing. Thus, if you can't handle a general array you check:

    ndarray.iscontiguous(obj)

where obj exports the array interface. But, it could really go either way. What do others think?

I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.

> 3) A number of the attributes are optional, but will always be there
> with SciPy arrays... (I assume) have you documented them anywhere?

No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.

> 4) a wxWidgets wxPoint is defined as such:
>
> class WXDLLEXPORT wxPoint
> {
> public:
>     int x, y;
>
> etc.
>
> As wxWidgets is using "int", I'd like to be able to use "int". If I
> define it as a 4 byte integer, I'm losing platform independence,
> aren't I? Or can I use something like sizeof(int)?

Ah, yes... here is where we need some standard Python functions to help establish the array interface. Sometimes you want to match a particular c-type, other times you want to match a particular bit width. So, what do you do? I had considered having an additional interface called ctypestr but decided against it for fear of creep. I think in general we need to have in Python some constants to make this conversion easy, e.g. ndarray.cint (gives 'iX' on the correct platform). For now, I would check:

    __array_typestr__ == 'i%d' % array.array('i', [0]).itemsize

But, on most platforms these days an int is 4 bytes; the above would be just to make sure.

> 5) Why is __array_data__ optional? Isn't that the whole point of this?

Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object. But, really all that is needed is something that exposes the buffer interface: remember the difference between the buffer object and the buffer interface. So, the correct consumer usage for grabbing the data is

    data = getattr(obj, '__array_data__', obj)

Then, in C you use the Buffer *Protocol* to get a pointer to memory. For example, the function:

    int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)

Of course this approach has the 32-bit limit until we get this changed in Python.

> 6) Should __array_offset__ be optional? I'd rather it were required,
> but default to zero. This way I have to check for it, then use it.
> Also, I assume it is an integer number of bytes, is that right?

A consumer has to check for most of the optional stuff if they want to support all types of arrays. Again a simple:

    getattr(obj, '__array_offset__', 0)

works fine.

> 7) An alternative to the above: a __simple__ flag, that means the data
> is a simple, C array of contiguous data of a single type. The most
> common use, and it would be nice to just check that flag and not have
> to take all other options into account.

I think if __array_strides__ returns None (and if an object doesn't expose it you can assume it) it is probably good enough.
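To make the helper-function idea concrete, here is a minimal sketch of what such a consumer-side check could look like. The names iscontiguous and expected_strides are illustrative, not a settled API; only the __array_*__ attributes come from the protocol:

    import re

    def expected_strides(shape, itemsize):
        # Strides, in bytes, of a C-contiguous array of this shape.
        strides = [itemsize] * len(shape)
        for i in range(len(shape) - 2, -1, -1):
            strides[i] = strides[i + 1] * shape[i + 1]
        return tuple(strides)

    def iscontiguous(obj):
        # True if obj's data can be treated as one C-contiguous block.
        shape = obj.__array_shape__
        itemsize = int(re.search(r'\d+', obj.__array_typestr__).group())
        strides = getattr(obj, '__array_strides__', None)
        if strides is None:    # by the convention just adopted above
            return True
        return tuple(strides) == expected_strides(shape, itemsize)

With something like this in a standard place, consumers like the wxPython code never touch the optional attributes directly.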
-Travis

From oliphant at ee.byu.edu Wed Apr 6 17:17:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 17:17:13 2005 Subject: [Numpy-discussion] masked arrays and NaNs In-Reply-To: <425467BB.305@hawaii.edu> References: <425467BB.305@hawaii.edu> Message-ID: <42547B2B.4030700@ee.byu.edu>

Eric Firing wrote:
> Travis,
>
> I am whole-heartedly in favor of your efforts to end the
> Numeric/numarray split by combining the best of both. I am encouraged
> by the progress you have made, and by the depth and clarity of the
> accompanying technical discussions. Thank you!
>
> I am a long-time Matlab user in Physical Oceanography, and I have been
> trying to find a practical way to phase out Matlab. One key is
> matplotlib, which is coming along wonderfully. A second is the
> availability of a Num* (or scipy.base) module that provides the
> functionality and ease-of-use I presently get from Matlab. This leads
> to a request which I suspect and hope is consistent with your present
> plans: efficient handling of NaNs and/or masked arrays.

I think both options will be available. With the new error handling numarray showed, nans will be allowed if you set the error mode correctly. A version of masked arrays will also be available (either in python or C).

-Travis

From cookedm at physics.mcmaster.ca Wed Apr 6 17:18:51 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 17:18:51 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: (David M. Cooke's message of "Wed, 06 Apr 2005 17:41:50 -0400") References: Message-ID:

cookedm at physics.mcmaster.ca (David M. Cooke) writes:
> I've always found the Numeric setup.py to be not very user-friendly.
> So, I rewrote it. It's available as patch #1178095
> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>
> Basically, all the editing you need to do is in customize.py, instead
> of touching setup.py. No more commenting out files for lapack_lite
> (just tell it to use the system LAPACK, and tell it where to find it).
>
> Also, you could now use GSL's cblas interface for dotblas. Useful if
> you've already taken the trouble to link that with an optimized
> Fortran BLAS.
>
> I didn't want to just throw this into CVS without feedback first :-)
> If it looks good, this can go in Numeric 24.0.

I've checked it in. Highlights:

* You only need to edit customize.py
* You don't need to edit it if you're on OS X (>= 10.2): the vecLib framework for optimized BLAS and LAPACK will be used if found.
* If you have an incomplete ATLAS library (one without LAPACK), you can use it for BLAS (instead of blas_lite.c), and the included f2c'd routines for LAPACK will be used.
* Use whatever CBLAS interface you've got (ATLAS, GSL, the reference one available from netlib).

There's also an INSTALL file now, although it could use some comments about the 'python setup.py config' option.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Wed Apr 6 18:14:33 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 18:14:33 2005 Subject: [Numpy-discussion] New array interface helper file Message-ID: <4254890F.6080205@ee.byu.edu>

At http://numeric.scipy.org/array_interface.py you will find the start of a set of helper functions for the array interface that can make it easier to deal with.
It also documents the array interface with docstrings. I tried to attach these to properties, but then I don't know how to "see" them from Python. This is the kind of thing I think should go into Python.

If anybody would like to try their hand at converter functions to go back and forth between the struct module strings and the __array_descr__ string, make my day.

-Travis

From cookedm at physics.mcmaster.ca Wed Apr 6 21:41:12 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 21:41:12 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546CC7.40408@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 17:12:07 -0600") References: <425458F7.9020307@ee.byu.edu> <42546CC7.40408@ee.byu.edu> Message-ID:

Travis Oliphant writes:
> David M. Cooke wrote:
>>While I'm at it, I'm also thinking of writing a 'cblas_lite' for
>>dotblas. This would mean that dotblas would be enabled all the time.
>>You could use a C BLAS if you've got one (from ATLAS, say), or a
>>Fortran BLAS (like the cxml library on an Alpha running Tru64), or it
>>would use the existing blas_lite.c if you don't.
>>
> This is a good idea, but for more than just dotblas.

Hmm, like for what? dotblas is the only thing (in Numeric & numarray) that uses the cblas_* functions. Unless you're thinking of using them in more places, like ufuncs? cblas_lite would be thin shims with minimal error-checking, probably not much use outside of dotblas.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From rkern at ucsd.edu Wed Apr 6 21:47:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 21:47:30 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254890F.6080205@ee.byu.edu> References: <4254890F.6080205@ee.byu.edu> Message-ID: <4254BB2B.2000406@ucsd.edu>

Travis Oliphant wrote:
>
> At http://numeric.scipy.org/array_interface.py
>
> you will find the start of a set of helper functions for the array
> interface that can make it easier to deal with. It also documents
> the array interface with docstrings. I tried to attach these to
> properties, but then I don't know how to "see" them from Python.

Get it from the property object on the class itself. E.g.

    expanded.__array_shape__.__doc__

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Wed Apr 6 22:13:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 22:13:04 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254BB2B.2000406@ucsd.edu> References: <4254890F.6080205@ee.byu.edu> <4254BB2B.2000406@ucsd.edu> Message-ID: <4254C141.9040502@ee.byu.edu>

Robert Kern wrote:
> Travis Oliphant wrote:
>
>> At http://numeric.scipy.org/array_interface.py
>>
>> you will find the start of a set of helper functions for the array
>> interface that can make it easier to deal with. It also documents
>> the array interface with docstrings. I tried to attach these to
>> properties, but then I don't know how to "see" them from Python.
>
> Get it from the property object on the class itself.
> E.g.
>
> expanded.__array_shape__.__doc__

Thank you.
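In case it helps anyone trying the same thing, a minimal sketch of the pattern Robert describes; the class and docstring here are stand-ins, not the actual array_interface.py code:

    class expanded(object):
        # The docstring lives on the property object, which is a class
        # attribute; accessing it on an instance runs the getter instead.
        def _get_shape(self):
            return self._shape
        __array_shape__ = property(_get_shape,
                                   doc="Tuple of array dimensions.")

    print expanded.__array_shape__.__doc__   # -> Tuple of array dimensions.

So pydoc and help() will find the docstrings as long as you look them up on the class rather than on an instance.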
-Travis

From Chris.Barker at noaa.gov Wed Apr 6 23:36:36 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 23:36:36 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254778A.1070100@ee.byu.edu> References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> Message-ID: <4254D4A8.5020007@noaa.gov>

Travis Oliphant wrote:
> You should account for the '<' or '>' that might be present in
> __array_typestr__ (Numeric won't put it there, but scipy.base and
> numarray will---since they can have byteswapped arrays internally).

Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.

> A more generic interface would handle multiple integer types if possible

I'd like to support doubles as well...

> (but this is a good start...)

Right. I want to get _something_ working, before I try to make it universal!

> I think one idea here is that if __array_strides__ returns None, then
> C-style contiguousness is assumed. In fact, I like that idea so much
> that I just changed the interface. Thanks for the suggestion.

You're welcome. I like that too.

> No, they won't always be there for SciPy arrays (currently 4 of them
> are). Only record-arrays will provide __array_descr__ for example and
> __array_offset__ is unnecessary for SciPy arrays. I actually don't much
> like the __array_offset__ parameter myself, but Scott convinced me that
> it could be useful for very complicated array classes.

I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.

> e.g. ndarray.cint (gives 'iX' on the correct platform).
> For now, I would check:
>     __array_typestr__ == 'i%d' % array.array('i', [0]).itemsize

I can see that that would work, but it does feel like a hack. Besides, I might be doing this in C++ anyway, so it would probably be easier to use sizeof()

> But, on most platforms these days an int is 4 bytes; the above would
> be just to make sure.

Right. Making that assumption will just lead to weird bugs way down the line. Of course, I wouldn't be surprised if wxWidgets and/or python makes that assumption in other places anyway!

>> 5) Why is __array_data__ optional? Isn't that the whole point of this?
>
> Because the object itself might expose the buffer interface. We could
> make __array_data__ required and prefer that it return a buffer object.

Couldn't it be required, and return a reference to itself if that works? Maybe I'm just being lazy, but it feels clunky and prone to errors to keep having to check if an attribute exists, then use it (or not).

> So, the correct consumer usage for grabbing the data is
>
> data = getattr(obj, '__array_data__', obj)

Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.

> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int
> *buffer_len)

I'm starting to get this.

> Of course this approach has the 32-bit limit until we get this changed
> in Python.

That's the least of my worries!

>> 6) Should __array_offset__ be optional? I'd rather it were required,
>> but default to zero. This way I have to check for it, then use it.
>> Also, I assume it is an integer number of bytes, is that right?
> > A consumer has to check for most of the optional stuff if they want to
> > support all types of arrays.

That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single type elements, zero offset), but I have to check all that stuff to make sure that I have a simple array. The simplest arrays are the most common case; they should be as easy as possible to support.

> Again a simple:
>
> getattr(obj, '__array_offset__', 0)
>
> works fine.

not too bad. Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid....

>> 7) An alternative to the above: a __simple__ flag, that means the data
>> is a simple, C array of contiguous data of a single type. The most
>> common use, and it would be nice to just check that flag and not have
>> to take all other options into account.

> I think if __array_strides__ returns None (and if an object doesn't
> expose it you can assume it) it is probably good enough.

That and __array_typestr__.

Travis Oliphant wrote:
>
> At http://numeric.scipy.org/array_interface.py
>
> you will find the start of a set of helper functions for the array
> interface that can make it easier to deal with.

Ah! This may well address my concerns. Good idea.

Thanks for all your work on this Travis. By the way, a quote from Robin Dunn about this: "Sweet!" Thought you might appreciate that.

-Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From konrad.hinsen at laposte.net Wed Apr 6 23:55:02 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Apr 6 23:55:02 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: <2701da761c9f34fc1dc72fc97e87e788@laposte.net>

On 07.04.2005, at 00:43, David M. Cooke wrote:

> I like this! It's got namespace goodness all over it (last Python zen
> line in 'import this': Namespaces are one honking great idea -- let's
> do more of those!)

Sounds like a good principle!

> 1) arrays. Here, we want efficient computation of functions applied to
> lots of elements. That's where the output arguments and special
> methods (.reduce, .accumulate, and .outer) are useful

All that is accessible if the class gets passed the ufunc object.

> 2) polymorphic functions. Output arguments aren't useful here. The
> special methods are useful for binary ufuncs only.

Fine, then they just call the ufunc. And the rare cases that need explicit code for each ufunc (my Derivatives, for example) can retrieve the name of the ufunc and dispatch on it.

Konrad.
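P.S. For concreteness, a rough sketch of the kind of non-array class meant here, dispatching on the ufunc's name. The derivative rules are invented for illustration; this is not the actual Derivatives code:

    import math

    class Variable:
        # Toy forward-mode derivative: tracks a value and its derivative.
        def __init__(self, value, deriv=1.0):
            self.value = value
            self.deriv = deriv

        def __ufunc__(self, ufunc):
            name = getattr(ufunc, '__name__', str(ufunc))
            if name == 'sin':
                return Variable(math.sin(self.value),
                                math.cos(self.value) * self.deriv)
            if name == 'cos':
                return Variable(math.cos(self.value),
                                -math.sin(self.value) * self.deriv)
            raise TypeError("don't know how to apply %s" % name)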
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ -------

From konrad.hinsen at laposte.net Thu Apr 7 00:24:04 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 00:24:04 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <1986f60349f1d4d146c6ddb727362fd9@laposte.net>

On 06.04.2005, at 18:06, Sébastien de Menten wrote:

> Do you think it is possible to integrate a similar mechanism in array
> functions (like searchsorted, argmax, ...).

That is less obvious. A generic interface for ufuncs is possible because of the uniform calling interface. Actually, there should perhaps be two ufunc application methods, for unary and for binary ufuncs. The other array functions each have a peculiar calling pattern. They can certainly be implemented through delegation to a method, but that would be one method per function. But I think that is inevitable if you want full flexibility.

> If we can register functions taking one array as argument within
> scipy.base and let it dispatch those functions as ufunc, we could use
> a similar strategy.
>
> For instance, let "sort" and "argmax" be registered as gfunc (general
> functions on an array <> ufunc), then any class that would like to
> override any of them could do it too with the same trick Konrad exposed
> here above.

Does that make sense in practice? Suppose you write a class that implements tables, i.e. arrays plus axis labels. You would want sort() to return an object of the same class, but argmax() to return a plain integer. The generic gfunc handler could do little else than dispatch on the name of the gfunc.

> Konrad, do you think it is tricky to have a prototype of your
> suggestion (i.e. the modification does not need a full understanding
> of Numeric and you can locate it approximately in the source code)?

I haven't looked at the Numeric code in ages, but my guess is that the ufunc part should be easy to do, as it is just a modification of a generic handler that already exists.

Konrad.

-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ -------

From cookedm at physics.mcmaster.ca Thu Apr 7 00:55:37 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 00:55:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> (Chris Barker's message of "Wed, 06 Apr 2005 23:35:20 -0700") References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> <4254D4A8.5020007@noaa.gov> Message-ID:

"Chris Barker" writes:

> Travis Oliphant wrote:
>
>> You should account for the '<' or '>' that might be present in
>> __array_typestr__ (Numeric won't put it there, but scipy.base and
>> numarray will---since they can have byteswapped arrays internally).
>
> Good point, but a pain. Maybe they should be required, that way I
> don't have to first check for the presence of '<' or '>', then check
> if they have the right value.

I'll second this.
Pulling out more Python Zen: Explicit is better than implicit.

>> A more generic interface would handle multiple integer types if
>> possible
>
> I'd like to support doubles as well...
>
>> (but this is a good start...)
>
> Right. I want to get _something_ working, before I try to make it universal!
>
>> I think one idea here is that if __array_strides__ returns None,
>> then C-style contiguousness is assumed. In fact, I like that idea
>> so much that I just changed the interface. Thanks for the
>> suggestion.
>
> You're welcome. I like that too.
>
>> No, they won't always be there for SciPy arrays (currently 4 of them
>> are). Only record-arrays will provide __array_descr__ for example
>> and __array_offset__ is unnecessary for SciPy arrays. I actually
>> don't much like the __array_offset__ parameter myself, but Scott
>> convinced me that it could be useful for very complicated
>> array classes.
>
> I can see that it would, but then, we're stuck with checking for all
> these optional attributes. If I don't bother to check for it, one day,
> someone is going to pass a weird array in with an offset, and a
> strange bug will show up.

Here's a summary:

    Attributes           required by          required
                         array-like object    to be checked
    __array_shape__      yes                  yes
    __array_typestr__    yes                  yes
    __array_descr__      no                   no
    __array_data__       no                   yes
    __array_strides__    no                   yes
    __array_mask__       no                   no?
    __array_offset__     no                   yes

In the "required to be checked" column I'm assuming a user of the array that's interested in looking at all of the elements, so we have to consider all possible situations where forgetting to consider an attribute could lead to invalid memory accesses. __array_strides__ and __array_offset__ in particular could be troublesome if forgotten.

The __array_mask__ element is difficult: for most applications, you should check it, and raise an error if it exists and is not None, unless you can handle missing elements.

It's certainly not required that all users of an array object need to understand all array types! Since we have to check a bunch anyways, I think that's a good enough reason for having them to exist? There are suitable defaults defined in the protocol document (__array_strides__ in particular) that make it easy to add them in simple cases.

>> So, the correct consumer usage for grabbing the data is
>> data = getattr(obj, '__array_data__', obj)
>
> Ah! I hadn't noticed the default parameter to getattr(). That makes it
> much easier. Is there an equivalent in C? It doesn't look like it to
> me, but I'm kind of a newbie with the C API.

You'd want something like

    adata = PyObject_GetAttrString(array_obj, "__array_data__");
    if (!adata) {
        /* attribute not present */
        PyErr_Clear();
        adata = array_obj;
    }

>> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int
>> *buffer_len)
>
> I'm starting to get this.
>
>> Of course this approach has the 32-bit limit until we get this
>> changed in Python.
>
> That's the least of my worries!
>
>>> 6) Should __array_offset__ be optional? I'd rather it were
>>> required, but default to zero. This way I have to check for it,
>>> then use it. Also, I assume it is an integer number of bytes, is
>>> that right?
>> A consumer has to check for most of the optional stuff if they want
>> to support all types of arrays.
> That's not quite true. I'm happy to support only the simple types of
> arrays (contiguous, single type elements, zero offset), but I have to
> check all that stuff to make sure that I have a simple array.
> The simplest arrays are the most common case; they should be as easy as
> possible to support.
>
>> Again a simple:
>> getattr(obj, '__array_offset__', 0)
>> works fine.
>
> not too bad.
>
> Also, what if we find the need for another optional attribute later?
> Any older code won't check for it. Or maybe I'm being paranoid....

This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.

I'd suggest adding to __array_data__: If __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not. (Put it this way: I understand the sequence protocol well, but not the buffer one :-)

That would also be a good argument for it existing, I think.

Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From magnus at hetland.org Thu Apr 7 01:05:03 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 7 01:05:03 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <20050407080429.GB20252@idi.ntnu.no>

Bruce Southey :
>
> Hi,
> I don't see that it is feasible to link R and numerical python in this
> way. As you point out, R objects (R is an object orientated language)
> use a lot of meta-data. Then there is the IEEE stuff (NaN etc) that
> would also need to be handled in numerical python.

Too bad. (I seem to recall seeing something about numpy conversion on the Web pages of RPy, though; perhaps, if one can stand a bit of copying, the two can be used together after all?)

> You probably could get RPy or RSPython to use numerical python rather
> than just basic Python.
>
> What statistical functions would you want in numerical python?

I think I'd want most of the standard, parametrized probability distributions (as well as automatic estimation from data, perhaps) and a handful of common statistical tests (t-test, z-test, Fisher, chi-squared, what-have-you). Perhaps some support for factorial experiments (not sure if R has anything specific there, though).

And another thing: R seems to have very fancy (although difficult to use) plotting capabilities... Until SciPy catches up (it hasn't yet, has it? ;) that might be a reason for using R(Py) as well, I guess.

-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]

From cookedm at physics.mcmaster.ca Thu Apr 7 01:08:11 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 01:08:11 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <2701da761c9f34fc1dc72fc97e87e788@laposte.net> (konrad hinsen's message of "Thu, 7 Apr 2005 08:53:06 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID:

konrad.hinsen at laposte.net writes:

> On 07.04.2005, at 00:43, David M. Cooke wrote:
>
>> I like this!
>> It's got namespace goodness all over it (last Python zen
>> line in 'import this': Namespaces are one honking great idea -- let's
>> do more of those!)
>
> Sounds like a good principle!
>
>> 1) arrays. Here, we want efficient computation of functions applied to
>> lots of elements. That's where the output arguments and special
>> methods (.reduce, .accumulate, and .outer) are useful
>
> All that is accessible if the class gets passed the ufunc object.
>
>> 2) polymorphic functions. Output arguments aren't useful here. The
>> special methods are useful for binary ufuncs only.
>
> Fine, then they just call the ufunc. And the rare cases that need
> explicit code for each ufunc (my Derivatives, for example) can
> retrieve the name of the ufunc and dispatch on it.

Hmm, I had misread your previous code. Here it is again, made more specific, and I'll assume this function lives in the ndarray package (as there is more than one package that defines ufuncs):

    def cos(obj):
        if ndarray.isarray(obj):
            return ndarray.array_cos(obj)
        else:
            try:
                return obj.__ufunc__(cos)
            except AttributeError:
                if ndarray.is_array_like(obj):
                    a = ndarray.array(obj)
                    return ndarray.array_cos(a)
                else:
                    raise ValueError

The thing is obj.__ufunc__ must understand about the *particular* object cos: the ndarray one. I was thinking more along the lines of obj.__ufunc__('cos'), where the name is passed instead.

For binary ufuncs, you could use (with arguments obj1 and obj2):

    obj1.__ufunc__('add', obj2)

Output argument (obj3):

    obj1.__ufunc__('add', obj2, obj3)

Special methods:

    obj1.__ufunc__('add.reduce')
    obj1.__ufunc__('add.accumulate')
    obj1.__ufunc__('add.outer', obj2)

Basically, special methods are just another ufunc. This suggests that add.outer should optionally take an output argument...

Alternatively, __ufunc__ could be an object of implemented ufuncs:

    obj.__ufunc__.cos()
    obj1.__ufunc__.add(obj2)
    obj1.__ufunc__.add(obj2, obj3)
    obj1.__ufunc__.add.reduce()
    obj1.__ufunc__.add.accumulate()
    obj1.__ufunc__.add.outer(obj2)

It depends where you want to do the dispatch. I think this version is better: it's easier to discover what __ufunc__'s are supported with generic tools (IPython tab completion, pydoc, etc.).

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From konrad.hinsen at laposte.net Thu Apr 7 01:34:37 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 01:34:37 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net>

On Apr 7, 2005, at 10:06, David M. Cooke wrote:

> Hmm, I had misread your previous code. Here it is again, made more
> specific, and I'll assume this function lives in the ndarray package
> (as there is more than one package that defines ufuncs)

At the moment, there is one in Numeric and one in numarray. The Python API of both is nearly or fully identical.

> The thing is obj.__ufunc__ must understand about the *particular*
> object cos: the ndarray one. I was thinking more along the lines of

No, it must only know the interface. In most cases, it would do something like

    class MyArray:
        def __ufunc__(self, ufunc):
            return MyArray(apply(ufunc, self.data))

> obj.__ufunc__('cos'), where the name is passed instead.

That's also an interesting option.
It would require the implementing class to choose an appropriate function from an appropriate module. Alternatively, it would work if ufuncs were also accessible as methods on array objects.

> For binary ufuncs, you could use (with arguments obj1 and obj2):
> obj1.__ufunc__('add', obj2)

Except that it would perhaps be better to have a different method, as otherwise nearly every implementation would have to start with a condition test to distinguish unary from binary ufuncs.

> Output argument (obj3): obj1.__ufunc__('add', obj2, obj3)
> Special methods:
> obj1.__ufunc__('add.reduce')
> obj1.__ufunc__('add.accumulate')
> obj1.__ufunc__('add.outer', obj2)
>
> Basically, special methods are just another ufunc. This suggests that
> add.outer should optionally take an output argument...

But they are not just another ufunc, because a standard unary ufunc always returns an array of the same shape as its argument.

I'd probably prefer a few explicit methods:

    object.__unary__(cos)
    object.__binary__(add, other)
    object.__binary_reduce__(add)

etc.

Konrad.

-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ---------------------------------------------------------------------

From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:26:28 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:26:28 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be>

> > On Apr 7, 2005, at 10:06, David M. Cooke wrote:
> >
> > > Hmm, I had misread your previous code. Here it is again, made more
> > > specific, and I'll assume this function lives in the ndarray package
> > > (as there is more than one package that defines ufuncs)
> >
> > At the moment, there is one in Numeric and one in numarray.
> > The Python
> > API of both is nearly or fully identical.
> >
> > > The thing is obj.__ufunc__ must understand about the *particular*
> > > object cos: the ndarray one. I was thinking more along the lines of
> >
> > No, it must only know the interface. In most cases, it would do
> > something like
> >
> > class MyArray:
> >     def __ufunc__(self, ufunc):
> >         return MyArray(apply(ufunc, self.data))

Exactly! I see this as a very common use (masked arrays and all the other examples could live with that). Or more precisely (just to be explicit, as the previous MyArray example is the simplest (purest) one):

    class MyArray:
        def __ufunc__(self, ufunc):
            metadata = process(self.metadata, ufunc)
            data = apply(ufunc, self.data)
            return MyArray(data, metadata)

Or variations on this same theme.

BTW, looking at Numeric3, the presence of a __mask_array__ in the array protocol looks like we want to add a specific case of "augmented array" to the core protocol. Hmmm, I would rather build a more generic mechanism as well as a clean interface for interacting with "augmented array".

> > obj.__ufunc__('cos'), where the name is passed instead.
>
> That's also an interesting option. It would require the implementing
> class to choose an appropriate function from an appropriate module.
> Alternatively, it would work if ufuncs were also accessible
> as methods
> on array objects.

Why not have the ability to ask the name of an ufunc to be able to dispatch on it?
> > For binary ufuncs, you could use (with arguments obj1 and obj2):
> > obj1.__ufunc__('add', obj2)
>
> Except that it would perhaps be better to have a different method, as
> otherwise nearly every implementation would have to start with a
> condition test to distinguish unary from binary ufuncs.
>
> > Output argument (obj3): obj1.__ufunc__('add', obj2, obj3)
> > Special methods:
> > obj1.__ufunc__('add.reduce')
> > obj1.__ufunc__('add.accumulate')
> > obj1.__ufunc__('add.outer', obj2)
> >
> > Basically, special methods are just another ufunc. This suggests that
> > add.outer should optionally take an output argument...
>
> But they are not just another ufunc, because a standard unary ufunc
> always returns an array of the same shape as its argument.
>
> I'd probably prefer a few explicit methods:
>
> object.__unary__(cos)
> object.__binary__(add, other)
> object.__binary_reduce__(add)

What about:

    object.__unary__(cos, mode="reduce")
    object.__binary__(cos, other, mode="reduce")

or

    object.__unary__(cos.reduce)
    object.__binary__(cos.apply, other)

or

    object.__binary__(cos.__call__, other)

with the ability to ask the first argument its type (with cos.mode or cos.reduce.mode ...)

However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods http://www.artima.com/weblogs/viewpost.jsp?thread=101605

From konrad.hinsen at laposte.net Thu Apr 7 02:42:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 02:42:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> References: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> Message-ID:

On Apr 7, 2005, at 11:25, Sebastien.deMentendeHorne at electrabel.com wrote:

> Why not have the ability to ask the name of an ufunc to be able to
> dispatch on it?

That's already possible.

> What about:
>
> object.__unary__(cos, mode="reduce")
> object.__binary__(cos, other, mode="reduce")

What does "reduce" mode mean for cos? What does a binary ufunc in reduce mode do with its second argument?

> However, for binary operations, how is the call dispatched if one of
> the operands is of one type while the other is of another type? This
> problem is related to multimethods
> http://www.artima.com/weblogs/viewpost.jsp?thread=101605

No need to be innovative: Python always dispatches on the first argument, and everybody is familiar with that approach even though it isn't perfect. If Python 3000 has multimethods, we can still adapt.

Konrad.
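P.S. To spell out that convention, a sketch of first-argument dispatch with the same fallback Python uses for __add__/__radd__. All names here are invented; the NotImplemented return value is what hands control to the other operand:

    class MyArray:
        def __init__(self, data):
            self.data = data

        def __binary__(self, ufunc, other):
            if not isinstance(other, MyArray):
                return NotImplemented   # decline; let the other operand try
            return MyArray(ufunc(self.data, other.data))

    def apply_binary(ufunc, a, b):
        result = NotImplemented
        if hasattr(a, '__binary__'):
            result = a.__binary__(ufunc, b)
        if result is NotImplemented and hasattr(b, '__binary__'):
            result = b.__binary__(ufunc, a)   # like __radd__: operands reversed
        if result is NotImplemented:
            raise TypeError("unsupported operand types for %s" % ufunc)
        return result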
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ---------------------------------------------------------------------

From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:54:57 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:54:57 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AB@seebex02.eib.electrabel.be>

> > Why not have the ability to ask the name of an ufunc to be able to
> > dispatch on it?
>
> That's already possible.
>
> > What about:
> >
> > object.__unary__(cos, mode="reduce")
> > object.__binary__(cos, other, mode="reduce")
>
> What does "reduce" mode mean for cos?
> What does a binary ufunc in reduce mode do with its second argument?

raise a ValueError :-) It was an example of a way to pass arguments; the focus was on cos.reduce or "cos.reduce" or cos, "reduce".

> > However, for binary operations, how is the call dispatched
> > if one of
> > the operands is of one type while the other is of another type? This
> > problem is related to multimethods
> > http://www.artima.com/weblogs/viewpost.jsp?thread=101605
>
> No need to be innovative: Python always dispatches on the first
> argument, and everybody is familiar with that approach even though it
> isn't perfect. If Python 3000 has multimethods, we can still adapt.

The problem is related to multimethods, even if the implementation need not be. In a call like object.__binary__(add, other), if other is not of the same type as object, the latter could throw an exception such as ImplementationError to hand control to other.__binary__(add, binary) or to other.__binary__(radd, binary) or similar (i.e. those expressions may not make sense but the idea is to have a convention to hand control to the other operand; python does this already when one overloads an operator like __add__ (__radd__)). So if we can keep this same protocol for binary ufunc, that would be great. Otherwise, I think it is not that a big deal.

Sebastien

From xscottg at yahoo.com Thu Apr 7 04:35:49 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:35:49 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: <4254778A.1070100@ee.byu.edu> Message-ID: <20050407113421.49329.qmail@web50202.mail.yahoo.com>

--- Travis Oliphant wrote:
>
> > 2) As __array_strides__ is optional, I'd kind of like to have a
> > __contiguous__ flag that I could just check, rather than checking for
> > the existence of strides, then calculating what the strides should be,
> > then checking them.
>
> I don't want to add too much. The other approach is to establish a set
> of helper functions in Python to check this sort of thing: Thus, if
> you can't handle a general array you check:
>
> ndarray.iscontiguous(obj)
>
> where obj exports the array interface.
>
> But, it could really go either way. What do others think?

I think this should definitely be done in the helper functions. Having extra attributes encode redundant information is a recipe for trouble.

Cheers, -Scott

From xscottg at yahoo.com Thu Apr 7 04:43:37 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:43:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> Message-ID: <20050407114157.23887.qmail@web50209.mail.yahoo.com>

--- Chris Barker wrote:
>
> I can see that it would, but then, we're stuck with checking for all
> these optional attributes. If I don't bother to check for it, one day,
> someone is going to pass a weird array in with an offset, and a strange
> bug will show up.

Everyone seems to think that an offset is so weird. I haven't looked at the internals of Numeric/scipy.base in a while so maybe it doesn't apply there. However, if you subscript an array and return a view to the data, you need an offset or you need to create a new buffer that encodes the offset for you.

    A = reshape(arange(9), (3,3))

    0, 1, 2
    3, 4, 5
    6, 7, 8

    B = A[2]    # create a view into A

    6, 7, 8     # Shared with the data above

Unless you're going to create a new buffer (which I guess is what Numeric is doing), the offset for B would be 6 in this very simple case. I think specifying the offset is much more elegant than creating a new buffer object with a hidden offset that refers to the old buffer object.

I guess all I'm saying is that I wouldn't assume the offset is zero...

> Couldn't it be required, and return a reference to itself if that works?
>
> Maybe I'm just being lazy, but it feels clunky and prone to errors to
> keep having to check if an attribute exists, then use it (or not).

The problem is that you aren't being lazy enough. :-) The fact that a lot of these attributes are optional should be hidden in helper functions like those in Travis's array_interface.py module, or a C/C++ include file (with inline functions). In a short while, you shouldn't have to check any __array_metadata__ attributes directly. There should even be a helper function for getting the array elements.

It wouldn't be a horrible mistake to have all the attributes be mandatory, but it doesn't get array consumers any benefit that they can't get from a well written helper library, and it does add some burden to array producers.

Cheers, -Scott

From mrmaple at gmail.com Thu Apr 7 04:44:27 2005 From: mrmaple at gmail.com (James Carroll) Date: Thu Apr 7 04:44:27 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID:

Hi Chris, Travis, ...

Great conversation you've started. I have two questions at the moment... I do love the idea that an abstraction can bring the different but similar num* worlds together.
Which sourceforge CVS repository will the interface (and an implementation) show up on first? My guess is numpy/numeric3. I see Travis has been updating it while I sleep.

> def DrawPointList(self, points, pens=None):
>     ...
>     # some checking code on the pens
>     ...
>     if (hasattr(points, '__array_shape__') and
>         hasattr(points, '__array_typestr__') and
>         len(points.__array_shape__) == 2 and
>         points.__array_shape__[1] == 2 and
>         points.__array_typestr__ == 'i4'
>        ):  # this means we have a compliant array
>         # return the array protocol version
>         return self._DrawPointArray(points.__array_data__, pens, [])  # This needs to be written now!

This means that whenever you have some complex multivalued multidimensional structure with the data you want to plot, you have to reshape it into the above 'compliant' array before passing it on. I'm a newbie, but is this reshape something where the data has to be copied and take up memory twice? If not, then great, you would painlessly reshape into something that had a different set of strides that just accessed the data that complied in the big blob of data. If the reshape is expensive, then maybe we need the array abstraction, and then a second 'thing' that described which parts of the array to use for the sequence of 2-tuples to use for plotting the (x,y)s of a scatter plot. (or whatever)

I do think we can accept more than just i4 for a datatype. Especially since a last-minute cast to i4 is inexpensive for almost every data type.

>     else:
>         # return the generic python sequence version
>         return self._DrawPointList(points, pens, [])
>
> Then we'll need a function (in C++):
>
> _DrawPointArray(points.__array_data__, pens, [])

Looks great.

-Jim

From xscottg at yahoo.com Thu Apr 7 04:52:11 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:52:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: Message-ID: <20050407115141.96479.qmail@web50204.mail.yahoo.com>

--- "David M. Cooke" wrote:
>
> > Good point, but a pain. Maybe they should be required, that way I
> > don't have to first check for the presence of '<' or '>', then check
> > if they have the right value.
>
> I'll second this. Pulling out more Python Zen: Explicit is better than
> implicit.

I'll third.

> This is a good point; all good protocols embed a version somewhere.
> Not doing it now could lead to grief/pain later.
>
> I'd suggest adding to __array_data__: If __array_data__ is None, then
> the array is implementing a newer version of the interface, and you'd
> either need to support that (maybe the new version uses
> __array_data2__ or something), or use the sequence protocol on the
> original object. The sequence protocol should definitely be safe all
> the time, whereas the buffer protocol may not. (Put it this way: I
> understand the sequence protocol well, but not the buffer one :-)
>
> That would also be a good argument for it existing, I think.
>
> Alternatively, we could add an __array_version__ attribute (required
> to exist, required to check) which is set to 1 for this protocol.

I like this, although I think having __array_data__ return None is confusing. I think __array_version__ (or __array_protocol__?) is the better choice. How about having it optional and default to 1? If it's present and greater than 1 then it means there is something new going on...

Cheers, -Scott

From cjw at sympatico.ca Thu Apr 7 05:57:36 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Thu Apr 7 05:57:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> Message-ID: <42552DD2.2040200@sympatico.ca>

konrad.hinsen at laposte.net wrote:
> On Apr 7, 2005, at 10:06, David M. Cooke wrote:
>
>> Hmm, I had misread your previous code. Here it is again, made more
>> specific, and I'll assume this function lives in the ndarray package
>> (as there is more than one package that defines ufuncs)
>
> At the moment, there is one in Numeric and one in numarray. The Python
> API of both is nearly or fully identical.
>
>> The thing is obj.__ufunc__ must understand about the *particular*
>> object cos: the ndarray one. I was thinking more along the lines of
>
> No, it must only know the interface. In most cases, it would do
> something like
>
> class MyArray:
>     def __ufunc__(self, ufunc):
>         return MyArray(apply(ufunc, self.data))
>
>> obj.__ufunc__('cos'), where the name is passed instead.
>
> That's also an interesting option. It would require the implementing
> class to choose an appropriate function from an appropriate module.
> Alternatively, it would work if ufuncs were also accessible as methods
> on array objects.

Yes, perhaps with a slightly different name (say Cos vs cos) to distinguish between methods and functions. Since they don't require arguments, the methods would not require parentheses.

Colin W.

From bsouthey at gmail.com Thu Apr 7 06:45:32 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Thu Apr 7 06:45:32 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID:

Hi,

> > What statistical functions would you want in numerical python?
>
> I think I'd want most of the standard...

Most of this is in SciPy already, based on Gary's code. I have not looked at it in great detail because it doesn't meet my immediate needs. One of my major needs is to be able to handle missing values. Perhaps one day it will handle that or I will get the time to do so.

I have been working on code with another person to do general linear models (along the lines of R's lm function and SAS's glm procedure) that would address factorial and other experimental designs. R just doesn't do enough for me in this aspect. Two real problems are data storage and model declaration. The mixed model component is really only for my area and I want to use symmetric matrices as the requirements of these models grow really fast.

I would be willing to try to address and contribute to the statistical needs if people are interested because I prefer a 'pure python' approach. The other way is to directly call some of the R functions from Python since the main core of these functions are written in C and Fortran.

> And another thing: R seems to have very fancy (although difficult to
> use) plotting capabilities... Until SciPy catches up (it hasn't yet,
> has it?
;) that might be a reason for using R(Py) as well, I guess. > > -- > Magnus Lie Hetland Fall seven times, stand up eight > http://hetland.org [Japanese proverb] >

Yeah, S/S+/R provides some nice graphs until you need to change from the defaults. Regards Bruce

From Gilles.Simond at obs.unige.ch Thu Apr 7 07:55:08 2005 From: Gilles.Simond at obs.unige.ch (SIMOND Gilles) Date: Thu Apr 7 07:55:08 2005 Subject: [Numpy-discussion] Quite curious behaviour in Numeric Message-ID: <1112885601.15142.53.camel@obssf5>

2.6.8-1-686-smp (dilinger at toaster.hq.voxel.net) (gcc version 3.3.4 (Debian 1:3.3.4-9)) #1 SMP Sat Aug 28 12:51:43 EDT 2004: and python2.3

>>> a=Numeric.ones((2,3),'i')
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: Array can not be safely cast to required type
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'i'

and the following works:

>>> a=Numeric.ones((2,3))
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'l'
>>> type(1)
<type 'int'>
>>> Numeric.__version__
'23.6'

It seems that itemsize() does not return the correct value, which should be 8 for a 'l' type array. This is quite annoying since this function is the only way to know the actual format of the array. Gilles Simond

From rkern at ucsd.edu Thu Apr 7 08:17:44 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 08:17:44 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: <42554EC6.9090807@ucsd.edu>

Magnus Lie Hetland wrote: > Bruce Southey : >>What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fisher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though).

Except for factorial designs, scipy.stats has all of that. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Thu Apr 7 08:23:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 08:23:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407115141.96479.qmail@web50204.mail.yahoo.com> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> Message-ID: <4255502D.6060306@ee.byu.edu>

Scott Gilbert wrote: >--- "David M. Cooke" wrote: > > >>>Good point, but a pain. Maybe they should be required, that way I >>>don't have to first check for the presence of '<' or '>', then check >>>if they have the right value. >>> >>> >>I'll second this. Pulling out more Python Zen: Explicit is better than >>implicit. >> >> >> > >I'll third. > >

O.K. It's done....
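With the byte-order character now required in __array_typestr__, a consumer can take the string apart trivially; a minimal sketch (the function name is made up for illustration):

    def split_typestr(typestr):
        # e.g. '<i4' -> ('<', 'i', 4): byte order, kind, item size
        endian, kind = typestr[0], typestr[1]
        if endian not in ('<', '>'):
            raise ValueError("typestr must start with '<' or '>'")
        return endian, kind, int(typestr[2:])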
basically, the program I am profiling has a function like this: foo(): # some code # a call to astype() for i in xrange(N): # some other code and NO explicit call to astype() the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of N+1, as if it was called inside the loop. So now the first doubt I have is that astype() gets listed because called from some function called by foo(), even if this should not happen. Here is the list of numarray functions called in foo() Function called... generic.py:651(getshape)(14) 0.070 generic.py:918(reshape)(2) 0.000 generic.py:1013(where)(2) 0.050 generic.py:1069(concatenate)(2) 4.270 morphology.py:150(binary_erosion)(2) 0.070 numarraycore.py:698(__del__)(120032) 3.240 numarraycore.py:817(astype)(12002) 37.290 numarraycore.py:857(is_c_array)(36000) 10.450 numarraycore.py:878(type)(4) 0.000 numarraycore.py:964(__mul__)(12) 0.340 numarraycore.py:981(__div__)(8) 0.010 numarraycore.py:1068(__pow__)(8) 0.000 numarraycore.py:1180(__imul__)(12000) 0.930 numarraycore.py:1250(__eq__)(2) 0.080 numarraycore.py:1400(zeros)(54) 0.060 numarraycore.py:1409(ones)(8) 0.020 The second thing I can think of is that astype() is implicitly called by some conversion. Can this be? curzio From jmiller at stsci.edu Thu Apr 7 10:51:38 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 7 10:51:38 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4255664F.2070107@unibas.ch> References: <4255664F.2070107@unibas.ch> Message-ID: <1112896207.2437.34.camel@halloween.stsci.edu> astype() is used in a bunch of places, including the C-API, so it's hard to guess how it's getting called with the information here. In general, astype() gets called to "match up types" based on a particular parameterization of a function call, i.e. the c-code underlying some function call needs a different type than was passed in so astype() is used to convert an array to a workable type. One possibility for debugging this might be to drop N to something reasonable, like say 2, and then run under pdb with a breakpoint set on astype(). Something like this is what I have in mind; it may not be exactly right but with fiddling this approach might work: >>> from yourmodule import newfoo # you redefined foo to accept N as a parameter >>> import pdb >>> pdb.run("newfoo(N=2)") (pdb) s # step along a little to get into newfoo() ... step output (pdb) import numarray.numarraycore as nc (pdb) break nc.astype (pdb) c ... breakpoint output (pdb) where ... function traceback showing where astype() got called from (pdb) c ... breakpoint output (pdb) where ... more function traceback, eventually you should find it... ... Regards, Todd On Thu, 2005-04-07 at 12:56, Curzio Basso wrote: > Hi all, > > I have a problem trying to profile a program using numarray, maybe someone with more experience can > give me a hint... > > basically, the program I am profiling has a function like this: > > foo(): > # some code > # a call to astype() > for i in xrange(N): > # some other code and NO explicit call to astype() > > the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of > N+1, as if it was called inside the loop. > So now the first doubt I have is that astype() gets listed because called from some function called > by foo(), even if this should not happen. Here is the list of numarray functions called in foo() > > Function called... 
> generic.py:651(getshape)(14) 0.070 > generic.py:918(reshape)(2) 0.000 > generic.py:1013(where)(2) 0.050 > generic.py:1069(concatenate)(2) 4.270 > morphology.py:150(binary_erosion)(2) 0.070 > numarraycore.py:698(__del__)(120032) 3.240 > numarraycore.py:817(astype)(12002) 37.290 > numarraycore.py:857(is_c_array)(36000) 10.450 > numarraycore.py:878(type)(4) 0.000 > numarraycore.py:964(__mul__)(12) 0.340 > numarraycore.py:981(__div__)(8) 0.010 > numarraycore.py:1068(__pow__)(8) 0.000 > numarraycore.py:1180(__imul__)(12000) 0.930 > numarraycore.py:1250(__eq__)(2) 0.080 > numarraycore.py:1400(zeros)(54) 0.060 > numarraycore.py:1409(ones)(8) 0.020 > > The second thing I can think of is that astype() is implicitly called by some conversion. Can this be? > > curzio

From Chris.Barker at noaa.gov Thu Apr 7 11:38:43 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 11:38:43 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407114157.23887.qmail@web50209.mail.yahoo.com> References: <20050407114157.23887.qmail@web50209.mail.yahoo.com> Message-ID: <42557DE3.3010804@noaa.gov>

Scott Gilbert wrote: > I think __array_version__ (or __array_protocol__?) is the > better choice. How about have it optional and default to 1? If it's > present and greater than 1 then it means there is something new going on...

Again, I'm uncomfortable with something that I have to check being optional. If it is, we're encouraging people to not check it, and that's a recipe for bugs later on down the road.

> Everyone seems to think that an offset is so weird. I haven't looked at > the internals of Numeric/scipy.base in a while so maybe it doesn't apply > there. However, if you subscript an array and return a view to the data, > you need an offset or you need to create a new buffer that encodes the > offset for you. > I guess all I'm saying is that I wouldn't assume the offset is zero...

Good point. All the more reason to have the offset be mandatory.

> The fact that a lot of these attributes are optional should be hidden in > helper functions like those in Travis's array_interface.py module, or a > C/C++ include file (with inline functions).

Yes, if there is a C/C++ version of all these helper functions, I'll be a lot happier. And you're right, the same information should not be encoded in two places, so my "iscontiguous" attribute should be a helper function or maybe a method.

> In a short while, you shouldn't have to check any __array_metadata__ > attributes directly. There should even be a helper function for getting > the array elements.

Cool. How would that work? A C++ iterator? I'm thinking not, as this is all C, no?

> It wouldn't be a horrible mistake to have all the attributes be mandatory, > but it doesn't get array consumers any benefit that they can't get from a > well written helper library, and it does add some burden to array > producers.

Hardly any.
I'm assuming that there will be a base_array class that can be used as a base class or mixin, so it wouldn't be any work at all to have a full set of attributes with defaults. It would take up a little bit of memory. I'm assuming that the whole point of this is to support large datasets, but maybe that isn't a valid assumption. After all, small array support has turned out to be very important for Numeric. As a rule of thumb, I think there will be more consumers of arrays than producers, so I'd rather make it easy on the consumers than the producers, if we need to make such a trade off. Maybe I'm biased, because I'm a consumer. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
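A minimal sketch of the kind of base/mixin class Chris describes, assuming the attribute defaults discussed in this thread (the class names are hypothetical):

    class base_array(object):
        # Defaults for the optional protocol attributes; a subclass
        # only has to supply __array_data__, __array_shape__ and
        # __array_typestr__.
        __array_offset__ = 0
        __array_version__ = 1

    class file_array(base_array):
        # Example producer: expose a binary file's contents.
        def __init__(self, filename, typestr, shape):
            self.__array_data__ = open(filename, 'rb').read()
            self.__array_typestr__ = typestr
            self.__array_shape__ = shape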
From Chris.Barker at noaa.gov Thu Apr 7 12:20:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 12:20:05 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: References: <42546766.5060802@noaa.gov> Message-ID: <42558796.4070607@noaa.gov>

James Carroll wrote: >> def DrawPointList(self, points, pens=None): >> ... >> # some checking code on the pens) >> ... >> if (hasattr(points,'__array_shape__') and >> hasattr(points,'__array_typestr__') and >> len(points.__array_shape__) == 2 and >> points.__array_shape__[1] == 2 and >> points.__array_typestr__ == 'i4' >> ): # this means we have a compliant array >> # return the array protocol version >> return self._DrawPointArray(points.__array_data__, pens,[]) >> #This needs to be written now! > > This means that whenever you have some complex multivalued > multidimensional structure with the data you want to plot, you have to > reshape it into the above 'compliant' array before passing it on. I'm > a newbie, but is this reshape something where the data has to be > copied and takes up memory twice?

Probably. It depends on two things: 1) What structure the data is in at the moment 2) Whether we write the code to handle more "complex" arrangements of data: discontiguous arrays, for instance. But the idea is to require a data structure that makes sense for the data. For example, a natural way to store a whole set of coordinates is to use an NX2 NumPy array of doubles. This is exactly the data structure that I want the above function to accept. If the points are somehow a subset of a larger array, then they will be in a discontiguous array, and I'm not sure if I want to bother to try to handle that. You can always use the generic sequence interface to access the data, but that will be a lot slower. We're interfacing with a static language here, we can get optimum performance only by specifying a particular data structure.

> If not, then great, you would > painlessly reshape into something that had a different set of strides > that just accessed the data that complied in the big blob of data. If > the reshape is expensive, then maybe we need the array abstraction, > and then a second 'thing' that described which parts of the array to > use for the sequence of 2-tuples to use for plotting the x,y's of a > scatter plot. (or whatever)

The proposed array interface does provide a certain level of abstraction; that's what __array_shape__, __array_typestr__, __array_descr__, __array_strides__ and __array_offset__ are all about. We could certainly write the wxPy_LIST_helper functions to handle a larger variety of options than the simple contiguous C array, but I want to start with the simple case, and I'm not sure directly handling the more complex cases is worth it. I'm imagining that the user will need to do something like:

dc.DrawPointList(asarray(points, Int))

It's easier to use the utility functions that Numeric provides than re-write similar code in wxPython.

> I do think we can accept more than just i4 for a datatype. Especially > since a last-minute cast to i4 is inexpensive for almost every data > type.

Sure, but we're interfacing with a static language, so for each data type supported, we need to cast the data pointer to the right type, then have code to convert it to the type needed by wx. It's not a big deal, but I'd rather keep it simple. I do want to support at least doubles and ints. Users can use Numeric's astype() method to convert if need be. I've noticed that there is a wxRealPoint class that uses doubles, but it doesn't look like it can be used as input to any of the wxDC methods. Too bad. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
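The conversion step Chris sketches could be wrapped up for users roughly like so (a sketch against the Numeric API; as_point_array is not a real function):

    import Numeric

    def as_point_array(points):
        # Coerce input to the integer typecode the fast path expects;
        # asarray() avoids a copy when points is already an array, and
        # astype() only runs when a conversion is actually needed.
        a = Numeric.asarray(points)
        if a.typecode() != 'i':
            a = a.astype('i')
        return a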
From xscottg at yahoo.com Thu Apr 7 14:13:32 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 14:13:32 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050407211227.82679.qmail@web50206.mail.yahoo.com>

--- Chris Barker wrote: > > Again, I'm uncomfortable with something that I have to check being > optional. If it is, we're encouraging people to not check it, and that's > a recipe for bugs later on down the road. > [snip] > > > I guess all I'm saying is that I wouldn't assume the offset is zero... > > Good point. All the more reason to have the offset be mandatory. >

Lots of protocols have optional parts. The helper functions would hide this level of detail.

> > Yes, if there is a C/C++ version of all these helper functions, I'll be > a lot happier. And you're right, the same information should not be > encoded in two places, so my "iscontiguous" attribute should be a helper > function or maybe a method. > > > In a short while, you shouldn't have to check any __array_metadata__ > > attributes directly. There should even be a helper function for > > getting the array elements. > > Cool. How would that work? A C++ iterator? I'm thinking not, as this is > all C, no? >

I think this will take shape as an include file with static/inline functions. No linking required, just #include and call the functions. It would be nice but not necessary that this was distributed with Python. I would be in favor of having some C++ iterator interfaces (possibly a template class) inside of a #ifdef __cplusplus block. Python doesn't seem to have a lot of C++ in the core so I wonder if this would meet resistance (even when it's inside of a #ifdef block).

> > > It wouldn't be a horrible mistake to have all the attributes be > > mandatory, but it doesn't get array consumers any benefit that they > > can't get from a well written helper library, and it does add some > > burden to array > > producers. > > Hardly any. I'm assuming that there will be a base_array class that can > be used as a base class or mixin, so it wouldn't be any work at all to > have a full set of attributes with defaults. It would take up a little > bit of memory. I'm assuming that the whole point of this is to support > large datasets, but maybe that isn't a valid assumption. After all, > small array support has turned out to be very important for Numeric. >

If the protocol can make things easy without the use of a mixin or base class, all the better to my way of thinking. I don't think the memory use is very relevant as the attributes would only require storage in the class object, not the instances. There is something elegant about making array creation as easy as:

    class easy_array:
        def __init__(self, filename):
            data = open(filename, 'rb').read()
            self.__array_data__ = data
            self.__array_shape__ = (len(data)/4,)
            self.__array_typestr__ = '>i4'

Like I said, I don't think it would be *horrible* to require all the attributes, but I don't see how it will benefit you at all. And even if all the attributes are mandatory, there are still a number of details to get right in reading the memory. You'll likely want to use the helper libraries/modules regardless. (Once they're completed of course...)

> > As a rule of thumb, I think there will be more consumers of arrays > than producers, so I'd rather make it easy on the consumers than the > producers, if we need to make such a trade off. Maybe I'm biased, > because I'm a consumer. >

I don't see the trade off. It will be easy for you either way, but harder for array producers (admittedly only a little). This has to be easier than the situation you have today right? Imagine the code you'd have to write to special case Numeric, scipy.base, Numarray, and Python's array module. Cheers, -Scott

From tim.hochberg at cox.net Thu Apr 7 14:31:11 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 14:31:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211227.82679.qmail@web50206.mail.yahoo.com> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> Message-ID: <4255A635.9010309@cox.net>

Scott Gilbert wrote: >--- Chris Barker wrote: > > [SNIP] > >>As a rule of thumb, I think there will be more consumers of arrays >>than producers, so I'd rather make it easy on the consumers than the >>producers, if we need to make such a trade off. Maybe I'm biased, >>because I'm a consumer. >> >> >> > >I don't see the trade off. It will be easy for you either way, but harder >for array producers (admittedly only a little). > >

I think there is a trade off, but not the one that Chris is worried about. It should be easy to hide complexity of dealing with missing attributes through the various helper functions. The cost will be in speed and will probably be most noticeable in C extensions using small arrays where the extra code to check if an attribute is present will be significant. How significant this will be, I'm not sure. And frankly I don't care all that much since I generally only use large arrays. However, since one of the big faultlines between Numarray and Numeric involves the former's relatively poor small array performance, I suspect someone might care. -tim

>This has to be easier than the situation you have today right? Imagine the >code you'd have to write to special case Numeric, scipy.base, Numarray, and >Python's array module.
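For concreteness, the Python shape of the helper function both sides are arguing about might be (illustrative only; the defaults follow the optional-attribute reading):

    def get_array_info(obj):
        # Required attributes: if these are missing, the object simply
        # isn't an array, so let the AttributeError propagate.
        info = {'shape': tuple(obj.__array_shape__),
                'typestr': obj.__array_typestr__,
                'data': obj.__array_data__}
        # Optional attributes fall back to cheap defaults.
        info['offset'] = getattr(obj, '__array_offset__', 0)
        strides = getattr(obj, '__array_strides__', None)
        if strides is not None:
            strides = tuple(strides)
        info['strides'] = strides  # None means C-contiguous
        return info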
From oliphant at ee.byu.edu Thu Apr 7 15:47:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 15:47:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211501.60155.qmail@web50203.mail.yahoo.com> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> Message-ID: <4255B7D6.9000109@ee.byu.edu>

Scott Gilbert wrote: >I agree, we need a road map of some sort. It could be multiple PEPs >depending, but it should include most of the following: > > - Get the bytes object submitted. There are only a few small > things in PEP 296 that should be changed. > > #4 > - I'm not particularly interested in implementing the new bytes > literal and other features discussed in PEP 332, but it is > related to this topic. (The proposal is for b"xxxxxx" to be a > bytes literal.) We should make note that while this is not > part of the numpy roadmap, nothing prohibits that from being > implemented by another user. > > - Add an ndarray module. This module will contain the ndarray > object as well as a superset of your helper functions. I > think implementing it in pure Python on top of the bytes > object is the right course. It's partly for documentation. > > - Add an include file to make this protocol easily accessible > from C. It's not much code, and the entire thing could be > done with inline/static functions in the .h file. It would > be nice if this went into Python too, but not strictly > required. > > I put these together at #1 > - Add the array protocol attributes to the existing array > object. > > #2 > - Flesh out the "locked buffer" stuff in PEP 298. Add support > for locking the buffer to the existing array object, the > bytes object, the mmap object, and anything else (string?) > that doesn't meet too much resistance. > > #3 > - Fix the existing buffer object to regrab its pointer > every time it's needed. Could also add support to use > the "locked buffer" interface where possible. I gather > that you are using this particular object in scipy.base > (is that true??). Several shortcomings of it could be > easily fixed at the Python level, but I don't feel > strongly that this would have to be done... Then again > it isn't much work. > > #5

I can't think of anything you've missed. I'm very supportive of this, but I have to finish scipy.base first. I think Perry is supportive as well. I know he's been playing catch-up in the reading. I'm not sure of Todd's opinion. I suspect he would welcome these changes to Python. My preference order is 1) the ndarray module and ndarray.h header with these interface definitions and methods. 2) Add array interface attributes to array module 3) Flesh out locked buffer API 4) Bytes object (with Pickling support) 5) Fix current buffer object. -Travis

From strawman at astraw.com Thu Apr 7 15:56:03 2005 From: strawman at astraw.com (Andrew Straw) Date: Thu Apr 7 15:56:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255502D.6060306@ee.byu.edu> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> <4255502D.6060306@ee.byu.edu> Message-ID: <4255BA56.7000001@astraw.com>

Travis Oliphant wrote: > Scott Gilbert wrote: > >> --- "David M. Cooke" wrote: >> >>>> Good point, but a pain. Maybe they should be required, that way I >>>> don't have to first check for the presence of '<' or '>', then check >>>> if they have the right value. >>>> >>> >>> I'll second this. Pulling out more Python Zen: Explicit is better than >>> implicit.
>>> >>> >> >> I'll third. >> >> O.K. It's done....

Here's a bit of weirdness which has prevented me from using '<' or '>' in the past with the struct module. I'm not guru enough to know what's going on, but it has prevented me from being explicit rather than implicit.

In [1]:import struct
In [2]:from numarray.ieeespecial import nan
In [3]:nan
Out[3]:nan
In [4]:struct.pack('<d',nan)
---------------------------------------------------------------------------
exceptions.SystemError Traceback (most recent call last)
/home/astraw/
SystemError: frexp() result out of range
In [5]:struct.pack('d',nan)
Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'

From Chris.Barker at noaa.gov Thu Apr 7 16:01:03 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 16:01:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255A635.9010309@cox.net> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> <4255A635.9010309@cox.net> Message-ID: <4255BA80.4090201@noaa.gov>

Tim Hochberg wrote: > Scott Gilbert wrote: >> --- Chris Barker wrote: >> I don't see the trade off.

I wasn't sure it applied in this case, but if there were a trade off, we should make things easiest for the consumers of arrays.

> I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant.

Actually, that is one I'm worried about. You're quite right, if I'm dealing with a 2X2 array, those helper functions are going to take much longer to run than accessing (and maybe using) the data. Like Tim, I'm mostly interested in using this for large data sets, but I think the small array thing might crop up unexpectedly. For example, with the current numarray, if you pass in an NX2 array to wxPython (to draw a polygon, for instance), it's very slow. It turns out that that's because a whole set of (2,) arrays are created when extracting the data, so even though you're dealing with a large data set, you end up dealing with a LOT of small arrays. Of course, the whole point of this is to avoid that, but I don't think we should assume that any overhead is negligible.

> >> This has to be easier than the situation you have today right?

well, sure. Though it seems to be harder than using the Numeric API directly. However, I'll shut up now, as it seems that the proposed utility functions will address my issues. -Chris PS to Tim: Want to help out with the wxPython integration? -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From xscottg at yahoo.com Thu Apr 7 20:05:48 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 20:05:48 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408030336.54970.qmail@web50209.mail.yahoo.com>

--- Andrew Straw wrote: > > Here's a bit of weirdness which has prevented me from using '<' or '>' > in the past with the struct module. I'm not guru enough to know what's > going on, but it has prevented me from being explicit rather than > implicit.
> > In [1]:import struct > > In [2]:from numarray.ieeespecial import nan > > In [3]:nan > Out[3]:nan > > In [4]:struct.pack('<d',nan) > --------------------------------------------------------------------------- > exceptions.SystemError Traceback (most > recent call last) > > /home/astraw/ > > SystemError: frexp() result out of range > > In [5]:struct.pack('d',nan) > Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff' >

No clue why that is, but it certainly looks like a bug in the struct module. It shouldn't make any difference about whether or not the array protocol reports the endian though. It's using a different notation for typecodes. Cheers, -Scott

From rkern at ucsd.edu Thu Apr 7 20:24:38 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 20:24:38 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408030336.54970.qmail@web50209.mail.yahoo.com> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> Message-ID: <4255F79D.4000501@ucsd.edu>

Scott Gilbert wrote: > --- Andrew Straw wrote: > >>Here's a bit of weirdness which has prevented me from using '<' or '>' >>in the past with the struct module. I'm not guru enough to know what's >>going on, but it has prevented me from being explicit rather than >>implicit. >> >>In [1]:import struct >> >>In [2]:from numarray.ieeespecial import nan >> >>In [3]:nan >>Out[3]:nan >> >>In [4]:struct.pack('<d',nan) >> >>--------------------------------------------------------------------------- >> >>exceptions.SystemError Traceback (most >>recent call last) >> >>/home/astraw/ >> >>SystemError: frexp() result out of range >> >>In [5]:struct.pack('d',nan) >>Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff' >> >

This behavior is explained by Tim Peters: http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From xscottg at yahoo.com Thu Apr 7 21:07:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:07:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408040601.86838.qmail@web50203.mail.yahoo.com>

--- Tim Hochberg wrote: > > I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant. > > How significant this will be, I'm not sure. And frankly I don't care all > that much since I generally only use large arrays. However, since one of > the big faultlines between Numarray and Numeric involves the former's > relatively poor small array performance, I suspect someone might care. >

You must check the return value of the PyObject_GetAttr (or PyObject_GetAttrString) calls regardless. Otherwise the extension will die with an ugly segfault the first time one passes a float where an array was expected.
If we're talking about small light-weight arrays and a C/C++ function that wants to work with them very efficiently, I'm not convinced that requiring the attributes be present will make things faster.

As we're talking about small light weight arrays, it's unlikely the individual arrays will have __array_shape__ or __array_strides__ already stored as tuples. They'll probably store them as a C array as part of their PyObject structure.

In the world where some of these attributes are optional: If an attribute like __array_offset__ or __array_shape__ isn't present, the C code will know to use zero or the default C-contiguous layout. So the check failed, but the failure case is probably very fast (since a temporary tuple object doesn't have to be built by the array on the fly).

In the world where all of the attributes are required: The array object will have to generate the __array_offset__ int/long or __array_shape__ tuple from its own internal representation. Then the C/C++ consumer code will bust apart the tuple to get the values. So the check succeeded, but the success code needs to grab the parts of the tuple.

The C helper code could look like:

    struct PyNDArrayInfo {
        int ndims;
        int endian;
        char itemcode;
        size_t itemsize;
        Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */
        Py_LONG_LONG offset;
        Py_LONG_LONG strides[40];
        /* More Array Info goes here */
    };

    int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
        PyObject* shape;
        PyObject* offset;
        PyObject* strides;
        int ii, len;

        info->itemsize = too_long_for_this_example(obj);

        shape = PyObject_GetAttrString(obj, "__array_shape__");
        if (!shape) return 0;
        len = PySequence_Size(shape);
        if (len < 0) return 0;
        if (len > 40) return 0; /* This needs work */
        info->ndims = len;
        for (ii = 0; ii < len; ii++) {
            PyObject* val = PySequence_GetItem(shape, ii);
            info->shape[ii] = PyLong_AsLongLong(val);
            Py_DECREF(val);
        }
        Py_DECREF(shape);

        offset = PyObject_GetAttrString(obj, "__array_offset__");
        if (offset) {
            /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
            info->offset = PyLong_AsLongLong(offset);
            Py_DECREF(offset);
        } else {
            PyErr_Clear();
            info->offset = 0;
        }

        strides = PyObject_GetAttrString(obj, "__array_strides__");
        if (strides) {
            /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
            for (ii = 0; ii < len; ii++) {
                PyObject* val = PySequence_GetItem(strides, ii);
                info->strides[ii] = PyLong_AsLongLong(val);
                Py_DECREF(val);
            }
            Py_DECREF(strides);
        } else {
            /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
            size_t size = info->itemsize;
            PyErr_Clear();
            for (ii = info->ndims-1; ii>=0; ii--) {
                info->strides[ii] = size;
                size *= info->shape[ii];
            }
        }

        /* More code goes here */
    }

I have no idea how expensive PyErr_Clear() is. We'd have to profile it to see for certain. If PyErr_Clear() is not expensive, then we could make a strong argument that *not* requiring the attributes will be more efficient. It could also be so close that it doesn't matter - in which case it's back to being a matter of taste... Cheers, -Scott
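In Python terms, the C-contiguous fallback in Scott's code computes this (a sketch for illustration only):

    def default_strides(shape, itemsize):
        # Strides for a C-contiguous layout: the last axis moves by one
        # item, each earlier axis by the size of everything after it.
        strides = [itemsize] * len(shape)
        for i in range(len(shape) - 2, -1, -1):
            strides[i] = strides[i + 1] * shape[i + 1]
        return tuple(strides)

    # e.g. default_strides((3, 4), 8) == (32, 8)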
From xscottg at yahoo.com Thu Apr 7 21:16:06 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:16:06 2005 Subject: [Numpy-discussion] Questions about the array interface. Message-ID: <20050408041417.61390.qmail@web50210.mail.yahoo.com>

Oops, sent too fast. Quick correction... > > In the world where some of these attributes are optional: If an > attribute like __array_offset__ or __array_shape__ isn't present, > the C code will know to use zero or the default C-contiguous layout. > So the check failed, but the failure case is probably very fast > (since a temporary tuple object doesn't have to be built by the array > on the fly). >

I meant to say "__array_offset__ or __array_strides__". The __array_shape__ attribute would always be required for arrays... Cheers, -Scott

From tim.hochberg at cox.net Thu Apr 7 23:56:10 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 23:56:10 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408040601.86838.qmail@web50203.mail.yahoo.com> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> Message-ID: <42562AC5.3040502@cox.net>

Scott Gilbert wrote: >--- Tim Hochberg wrote: > > >>I think there is a trade off, but not the one that Chris is worried >>about. It should be easy to hide complexity of dealing with missing >>attributes through the various helper functions. The cost will be in >>speed and will probably be most noticeable in C extensions using small >>arrays where the extra code to check if an attribute is present will be >>significant. >> >>How significant this will be, I'm not sure. And frankly I don't care all >>that much since I generally only use large arrays. However, since one of >>the big faultlines between Numarray and Numeric involves the former's >>relatively poor small array performance, I suspect someone might care. >> >> >> > >You must check the return value of the PyObject_GetAttr (or >PyObject_GetAttrString) calls regardless. Otherwise the extension will die >with an ugly segfault the first time one passes a float where an array was >expected. > >If we're talking about small light-weight arrays and a C/C++ function that >wants to work with them very efficiently, I'm not convinced that requiring >the attributes be present will make things faster. > > >As we're talking about small light weight arrays, it's unlikely the >individual arrays will have __array_shape__ or __array_strides__ already >stored as tuples. They'll probably store them as a C array as part of >their PyObject structure. > > >In the world where some of these attributes are optional: If an attribute >like __array_offset__ or __array_shape__ isn't present, the C code will >know to use zero or the default C-contiguous layout. So the check failed, >but the failure case is probably very fast (since a temporary tuple object >doesn't have to be built by the array >on the fly). > > >In the world where all of the attributes are required: The array object >will have to generate the __array_offset__ int/long or __array_shape__ >tuple from its own internal representation. Then the C/C++ consumer code >will bust apart the tuple to get the values. So the check succeeded, but >the success code needs to grab the parts of the tuple. > > >The C helper code could look like: > >

I'm not convinced it's legit to assume that a failure to get the attribute means that it's not present and call PyErr_Clear. Just as a for instance, what if the attribute in question is implemented as a descriptor in which there is some internal error. Then you're burying the error and most likely doing the wrong thing. As far as I can tell, the only correct way to do this is to use PyObject_HasAttrString, then PyObject_GetAttrString if that succeeds. The point about not passing around the tuples probably being faster is a good one. Another thought is that requiring tuples instead of general sequences would make the helper faster (since one could use PyTuple_GET_ITEM, which I believe is much faster than PySequence_GetItem). This would possibly shift more pain onto the implementer of the object though.
I suspect that the best strategy, orthogonal to requiring all attributes or not, is to use PySequence_Fast to get a fast sequence and work with that. This means that objects that return tuples for strides, etc would run at maximum possible speed, while other sequences would still work. Back to requiring attributes or not. I suspect that the fastest correct way is to require all attributes, but allow them to be None, in which case the default value is used. Then any errors are easily bubbled up and a fast check for None chooses whether to use the defaults or not. It's late, so I hope that's not too incoherent. Or too wrong. Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort of error checking (it can allegedly raise errors). I suppose that means one is supposed to call PyErr_Occurred after every call? That's sort of painful! -tim

> struct PyNDArrayInfo {
>     int ndims;
>     int endian;
>     char itemcode;
>     size_t itemsize;
>     Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */
>     Py_LONG_LONG offset;
>     Py_LONG_LONG strides[40];
>     /* More Array Info goes here */
> };
>
> int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
>     PyObject* shape;
>     PyObject* offset;
>     PyObject* strides;
>     int ii, len;
>
>     info->itemsize = too_long_for_this_example(obj);
>
>     shape = PyObject_GetAttrString(obj, "__array_shape__");
>     if (!shape) return 0;
>     len = PySequence_Size(shape);
>     if (len < 0) return 0;
>     if (len > 40) return 0; /* This needs work */
>     info->ndims = len;
>     for (ii = 0; ii < len; ii++) {
>         PyObject* val = PySequence_GetItem(shape, ii);
>         info->shape[ii] = PyLong_AsLongLong(val);
>         Py_DECREF(val);
>     }
>     Py_DECREF(shape);
>
>     offset = PyObject_GetAttrString(obj, "__array_offset__");
>     if (offset) {
>         /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
>         info->offset = PyLong_AsLongLong(offset);
>         Py_DECREF(offset);
>     } else {
>         PyErr_Clear();
>         info->offset = 0;
>     }
>
>     strides = PyObject_GetAttrString(obj, "__array_strides__");
>     if (strides) {
>         /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
>         for (ii = 0; ii < len; ii++) {
>             PyObject* val = PySequence_GetItem(strides, ii);
>             info->strides[ii] = PyLong_AsLongLong(val);
>             Py_DECREF(val);
>         }
>         Py_DECREF(strides);
>     } else {
>         /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
>         size_t size = info->itemsize;
>         PyErr_Clear();
>         for (ii = info->ndims-1; ii>=0; ii--) {
>             info->strides[ii] = size;
>             size *= info->shape[ii];
>         }
>     }
>
>     /* More code goes here */
> }

>I have no idea how expensive PyErr_Clear() is. We'd have to profile it to >see for certain. If PyErr_Clear() is not expensive, then we could make a >strong argument that *not* requiring the attributes will be more efficient. > >It could also be so close that it doesn't matter - in which case it's back >to being a matter of taste... > >Cheers, > -Scott > > > > > > >

From cookedm at physics.mcmaster.ca Fri Apr 8 00:43:08 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 00:43:08 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> <42562AC5.3040502@cox.net> Message-ID: <20050408074129.GA16479@arbutus.physics.mcmaster.ca>

On Thu, Apr 07, 2005 at 11:55:01PM -0700, Tim Hochberg wrote: > Scott Gilbert wrote: > > >--- Tim Hochberg wrote: > > > > > >>I think there is a trade off, but not the one that Chris is worried > >>about. It should be easy to hide complexity of dealing with missing > >>attributes through the various helper functions.
The cost will be in > >>speed and will probably be most noticable in C extensions using small > >>arrays where the extra code to check if an attribute is present will be > >>signifigant. > >> > >>How signifigant this will be, I'm not sure. And frankly I don't care all > >>that much since I generally only use large arrays. However, since one of > >>the big faultlines between Numarray and Numeric involves the former's > >>relatively poor small array performance, I suspect someone might care. > >> > >> > >> > > > >You must check the return value of the PyObject_GetAttr (or > >PyObject_GetAttrString) calls regardless. Otherwise the extension will die > >with an ugly segfault the first time one passes an float where an array was > >expected. > > > >If we're talking about small light-weight arrays and a C/C++ function that > >wants to work with them very efficiently, I'm not convinced that requiring > >the attributes be present will make things faster. > > > > > >As we're talking about small light weight arrays, it's unlikely the > >individual arrays will have __array_shape__ or __array_strides__ already > >stored as tuples. They'll probably store them as a C array as part of > >their PyObject structure. > > > > > >In the world where some of these attributes are optional: If an attribute > >like __array_offset__ or __array_shape__ isn't present, the C code will > >know to use zero or the default C-contiguous layout. So the check failed, > >but the failure case is probably very fast (since a temporary tuple object > >doesn't have to be built by the array on the fly). > > > >In the world where all of the attributes are required: The array object > >will have to generate the __array_offset__ int/long or __array_shape___ > >tuple from it's own internal representation. Then the C/C++ consumer code > >will bust apart the tuple to get the values. So the check succeeded, but > >the success code needs to grab the parts of the tuple. > > > >The C helper code could look like: > > I'm not convinced it's legit to assume that a failure to get the > attribute means that it's not present and call PyErrorClear. Just as a > for instance, what if the attribute in question is implemented as a > descriptor in which there is some internal error. Then your burying the > error and most likely doing the wrong thing. As far as I can tell, the > only correct way to do this is to use PyObject_HasAttrString, then > PyObject_GetAttrString if that succeeds. No point: PyObject_HasAttrString *calls* PyObject_GetAttrString, then clears the error if there is one. [Side note: hasattr() in Python works the same way, which is why using properties is a pain when you've got code that's using it] > The point about not passing around the tuples probably being faster is a > good one. Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > *PyTuple_GET_**ITEM*, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc would run at maximum possible speed, > while other sequences would still work. How about objects that use a lightweight array as the strides sequence? 
I'm thinking that if you've got a fast 1-d array object, you'd be tempted to use an instance of that as the shape or strides attribute. You'd be saving on temporary tuple creation (but you'd still be losing some in making Python ints). I haven't benchmarked it, but I'm looking at the code for PySequence_GetItem(): it does a few pointer derefences to get the sq_item() method in the tp_as_sequence struct of an object implementing the sequence protocol, which for the tuple does an array indexing of the tuple's data. You've got about two function calls more compared to using PyTuple_GET_ITEM. It really depends on how big the arrays you expect to get passed to you. If they're big, this is all amortized: you'll hardly see it. It also depends on how your routines get used. If the routine is buried below a few layers of API, you'd likely be better off doing a typecast higher up to your own representation, or something. If it's at the border, so the user will call it directly *often*, you're going to be screwed for speed anyways (giving the user the option of casting arrays to something else would probably help a lot here also). > Back to requiring attributes or not. I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up > and a fast check for None choses whether to use the defaults or not. > > It's late, so I hope that's not too incoherent. Or too wrong. > > Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort > of error checking (it can allegedly raise errors). I suppose that means > one is supposed to call PyError_Occurred after every call? That's sort > of painful! Yes! Check all C API functions that may return errors! That includes PySequence_GetItem() and PyLong_AsLongLong. > > struct PyNDArrayInfo { > > int ndims; > > int endian; > > char itemcode; > > size_t itemsize; > > Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */ > > Py_LONG_LONG offset; > > Py_LONG_LONG strides[40]; > > /* More Array Info goes here */ > > }; > > > > int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) { > > PyObject* shape; > > PyObject* offset; > > PyObject* strides; > > int ii, len; > > > > info->itemsize = too_long_for_this_example(obj); > > > > shape = PyObject_GetAttrString(obj, "__array_shape__"); > > if (!shape) return 0; > > len = PySequence_Size(shape); > > if (len < 0) return 0; > > if (len > 40) return 0; /* This needs work */ > > info->ndims = len; > > for (ii = 0; ii > PyObject* val = PySequence_GetItem(shape, ii); Like here > > info->shape[ii] = PyLong_AsLongLong(val); and here > > Py_DECREF(val); (if you don't check PySequence_GetItem -- not a good idea anyways -- this should be Py_XDECREF) [snip more code that needs checks :-)] > >I have no idea how expensive PyErr_Clear() is. We'd have to profile it to > >see for certain. If PyErr_Clear() is not expensive, then we could make a > >strong argument that *not* requiring the attributes will be more efficient. Not much; it's about three Py_XDECREF's. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Fri Apr 8 01:22:09 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 01:22:09 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? 
Message-ID: <20050408082147.GA16977@arbutus.physics.mcmaster.ca>

It seems that people are worried about speed of the attribute-based array interface when using small arrays in C. Here's an alternative: Define some attribute (for now, call it __array_c__), which returns a CObject whose value (which you get with PyCObject_GetVoidPtr) would be a pointer to a struct describing the array. It would look something like

typedef struct {
    int version;
    int nd;
    Py_LONG_LONG *shape;
    char typecode;
    Py_LONG_LONG *strides;
    Py_LONG_LONG offset;
    void *data;
} SimpleCArray;

(The order here follows that of the array interface spec; if somebody's got any comments on what mixing int's, Py_LONG_LONG, and char's in a struct does to the packing and potential alignment problems I'd like to know.)

version is there as a sanity check: I'd say for this version it's something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily a check that you've got the right thing (since CObjects are intrinsically opaque types). Then:

- the array object guarantees that the data, etc. remains alive, probably by passing itself as the desc parameter to the CObject. The array data would have to stay at the same location and the same size while the reference is held.
- typecode follows that of the __array_typestr__ attribute
- shape and strides are pointers to arrays of at least nd elements.
- this doesn't handle byteswapped as-is. Maybe a flags, or endian, attribute could be added.
- you can still have the full attribute-based array interface (__array_strides__, etc.) to fall back on. If the typecode is 'V', you'll have to look at __array_descr__.

Creating one from a Numeric PyArrayObject would go like this:

PyObject *create_SimpleCArray(PyArrayObject *a)
{
    int i;
    SimpleCArray *ca = PyMem_New(SimpleCArray, 1);
    ca->version = 0xDECAF;
    ca->nd = a->nd;
    ca->shape = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->shape[i] = a->dimensions[i];
    }
    ca->strides = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->strides[i] = a->strides[i];
    }
    ca->offset = 0;
    ca->data = a->data;

    Py_INCREF(a);
    PyObject *co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray);
    return co;
}

where

void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a)
{
    PyMem_Free(ca->shape);
    PyMem_Free(ca->strides);
    PyMem_Free(ca);
    Py_DECREF(a);
}

Some points:

- you have to keep the CObject around: destroying it will potentially destroy the array you're looking at.
- I was thinking that maybe adding a PyObject *owner could make it easier to keep track of the owner; I'm not sure, as the descr argument in CObjects can easily play that role.
- The creator of the SimpleCArray is free to add elements to the end (as long as they don't affect the padding/alignment of the previous ones: haven't thought about this). You could put the real owner of the array data there, for example (say, if it was wrapping a Blitz++ array). Or have a small _strides[30] array at the end, and strides would point to that (saving you a memory allocation).

This simple C interface would, I think, alleviate many worries about speed for small arrays, and even for large arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M.
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From curzio.basso at unibas.ch Fri Apr 8 06:30:05 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Fri Apr 8 06:30:05 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <1112896207.2437.34.camel@halloween.stsci.edu> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> Message-ID: <4256873B.2060501@unibas.ch> Todd Miller wrote: > astype() is used in a bunch of places, including the C-API, so it's > hard to guess how it's getting called with the information here. In ok, so probably C functions are somehow 'transparent' to the profiler which does not report them, but reports the python functions called by the C one... >>>>from yourmodule import newfoo # you redefined foo to accept N as a parameter >>>>import pdb >>>>pdb.run("newfoo(N=2)") > > (pdb) s # step along a little to get into newfoo() > ... step output > (pdb) import numarray.numarraycore as nc > (pdb) break nc.astype strange, what I get now is: > (Pdb) b nc.astype > *** The specified object 'nc.astype' is not a function > or was not found along sys.path. and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather than just the function) under ipython, starting it with > %run -d myprog.py maybe this could mess up things? curzio From jmiller at stsci.edu Fri Apr 8 06:45:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 06:45:13 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4256873B.2060501@unibas.ch> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> <4256873B.2060501@unibas.ch> Message-ID: <1112967803.5142.29.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 09:29, Curzio Basso wrote: > Todd Miller wrote: > > > astype() is used in a bunch of places, including the C-API, so it's > > hard to guess how it's getting called with the information here. In > > ok, so probably C functions are somehow 'transparent' to the profiler which does not report them, > but reports the python functions called by the C one... > > >>>>from yourmodule import newfoo # you redefined foo to accept N as a parameter > >>>>import pdb > >>>>pdb.run("newfoo(N=2)") > > > > (pdb) s # step along a little to get into newfoo() > > ... step output > > (pdb) import numarray.numarraycore as nc > > (pdb) break nc.astype > > strange, what I get now is: > > > (Pdb) b nc.astype > > *** The specified object 'nc.astype' is not a function > > or was not found along sys.path. > > and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather > than just the function) under ipython, starting it with > > > %run -d myprog.py > > maybe this could mess up things? No. I should have said "b nc.NumArray.astype". I just tried this out with an astype() callback from numarray.convolve's C-code and it worked OK for me. Regards, Todd From strawman at astraw.com Fri Apr 8 08:00:13 2005 From: strawman at astraw.com (Andrew Straw) Date: Fri Apr 8 08:00:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255F79D.4000501@ucsd.edu> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> <4255F79D.4000501@ucsd.edu> Message-ID: <42569C4D.2080904@astraw.com> Robert Kern wrote: > Scott Gilbert wrote: > >> --- Andrew Straw wrote: >> >>> Here's a bit of weirdness which has prevented me from using '<' or >>> '>' in the past with the struct module. 
I'm not guru enough to know >>> what's going on, but it has prevented me from being explicit rather >>> than >>> implicit. >>> >>> In [1]:import struct >>> >>> In [2]:from numarray.ieeespecial import nan >>> >>> In [3]:nan >>> Out[3]:nan >>> >>> In [4]:struct.pack('>> >> >> --------------------------------------------------------------------------- >> >> >>> exceptions.SystemError Traceback (most >>> recent call last) >>> >>> /home/astraw/ >>> >>> SystemError: frexp() result out of range >>> >>> In [5]:struct.pack('d',nan) >>> Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff' >>> >> >> >> >> No clue why that is, but it certainly looks like a bug in the struct >> module. It shouldn't make any difference about whether or not the array >> protocol reports the endian though. It's using a different notation for >> typecodes. > > > This behavior is expplained by Tim Peters: > > http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a > I feared it was something like that. (No platform independent way to represent special values like nan, inf, and so on.) So I think if we're going to require an encoding character such as '<' or '>' we should also include one that means native which CAN handle these special values... And document why it's needed and why it may get one into trouble. From jmiller at stsci.edu Fri Apr 8 10:14:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 10:14:04 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <1112980431.5142.116.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 04:21, David M. Cooke wrote: > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. I was a little worried too, but think the array protocol idea is a good one in any case. Thinking about this, I'm wondering if what we used to do in early numarray (0.2) wouldn't work here. Our "consumer interface" / helper function looked more like this: int getSimpleCArray(PyObject *o, SimpleCArray *info); It basically just fills in the caller's SimpleCArray struct using information from o and returns 0, or -1 with an exception set if there's some problem. In numarray's SimpleCArray struct, the shape and strides arrays were fully allocated (i.e. Py_LONG_LONG shape[MAXDIM];) so the struct could be placed in an auto variable with nothing to free() later. In this interface, there is no implied getattr at all, since the helper function getSimpleCArray() can be made as smart (i.e. given knowledge about specific types) as people are motivated to make it. So, for a Numeric array or a numarray or a Numeric3 array, getSimpleCArray would presumably just copy from struct to struct, but for other types, it might fall back on the many-getattr approach. Regards, Todd > Here's an alternative: Define some attribute (for now, call it > __array_c__), which returns a CObject whose value (which you get with > PyCObject_GetVoidPtr) would be a pointer to a struct describing the > array. 
It would look something like > > typedef struct { > int version; > int nd; > Py_LONG_LONG *shape; > char typecode; > Py_LONG_LONG *strides; > Py_LONG_LONG offset; > void *data; > } SimpleCArray; > > (The order here follows that of the array interface spec; if somebody's > got any comments on what mixing int's, Py_LONG_LONG, and char's in a > struct does to the packing and potential alignment problems I'd like to > know.) > > version is there as a sanity check: I'd say for this version it's > something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily > a check that you've got the right thing (since CObjects are > intrinsically opaque types). > > Then: > - the array object guarantees that the data, etc. remains alive, > probably by passing itself as the desc parameter to the CObject. > The array data would have to stay at the same location and the same > size while the reference is held. > > - typecode follows that of the __array_typestr__ attribute > > - shape and strides are pointers to arrays of at least nd elements. > > - this doesn't handle byteswapped as-is. Maybe a flags, or endian, > attribute could be added. > > - you can still have the full attribute-based array interface > (__array_strides__, etc.) to fall back on. If the typecode is 'V', > you'll have to look at __array_descr__. > > Creating one from a Numeric PyArrayObject would go like this: > > PyObject *create_SimpleCArray(PyArrayObject *a) > { > SimpleCArray *ca = PyMem_New(SimpleCArray, 1); > PyObject *co; > int i; > ca->version = 0xDECAF; > ca->nd = a->nd; > ca->shape = PyMem_New(Py_LONG_LONG, ca->nd); > for (i = 0; i < ca->nd; i++) { > ca->shape[i] = a->dimensions[i]; > } > ca->strides = PyMem_New(Py_LONG_LONG, ca->nd); > for (i = 0; i < ca->nd; i++) { > ca->strides[i] = a->strides[i]; > } > ca->offset = 0; > ca->data = a->data; /* the array's own buffer */ > > Py_INCREF(a); > co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray); > return co; > } > > where > void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a) > { > PyMem_Free(ca->shape); > PyMem_Free(ca->strides); > PyMem_Free(ca); > Py_DECREF(a); > } > > Some points: > - you have to keep the CObject around: destroying it will potentially > destroy the array you're looking at. > - I was thinking that maybe adding a PyObject *owner could make it > easier to keep track of the owner; I'm not sure, as the descr argument > in CObjects can easily play that role. > - The creator of the SimpleCArray is free to add elements to the end > (as long as they don't affect the padding/alignment of the previous > ones: haven't thought about this). You could put the real owner of the > array data there, for example (say, if it was wrapping a Blitz++ > array). Or have a small _strides[30] array at the end, and strides > would point to that (saving you a memory allocation). > > This simple C interface would, I think, alleviate many of the worries about > speed for small arrays, and even for large arrays. -- From xscottg at yahoo.com Fri Apr 8 11:06:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:06:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> Message-ID: <20050408180523.95022.qmail@web50207.mail.yahoo.com> --- Tim Hochberg wrote: > > The point about not passing around the tuples probably being faster is a > good one.
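[One way to answer David's packing question above empirically is to mirror the proposed struct with ctypes (a third-party package at the time of this thread, not yet in the standard library) and inspect the field offsets; a small probe, not part of any proposal:

    import ctypes

    class SimpleCArray(ctypes.Structure):
        # field order mirrors the proposed C struct above
        _fields_ = [("version",  ctypes.c_int),
                    ("nd",       ctypes.c_int),
                    ("shape",    ctypes.POINTER(ctypes.c_longlong)),
                    ("typecode", ctypes.c_char),
                    ("strides",  ctypes.POINTER(ctypes.c_longlong)),
                    ("offset",   ctypes.c_longlong),
                    ("data",     ctypes.c_void_p)]

    # any gap between one field's offset+size and the next field's
    # offset is padding inserted for alignment (e.g. after typecode)
    for name, _ in SimpleCArray._fields_:
        f = getattr(SimpleCArray, name)
        print name, f.offset, f.size
    print "total size:", ctypes.sizeof(SimpleCArray)
]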
Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > PyTuple_GET_ITEM, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc. would run at maximum possible speed, > while other sequences would still work. > I hadn't seen this "fast" sequence stuff before. Thanks for the pointer. > > Back to requiring attributes or not. I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up > and a fast check for None chooses whether to use the defaults or not. > How about saying that, for all the optional attributes, if they return None that's to be treated the same way as if they weren't present at all? In other words, they're still optional, but people in the know would know that returning None was probably faster... From xscottg at yahoo.com Fri Apr 8 11:14:27 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:14:27 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408074129.GA16479@arbutus.physics.mcmaster.ca> Message-ID: <20050408181314.89274.qmail@web50205.mail.yahoo.com> --- "David M. Cooke" wrote: > > > Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort > > of error checking (it can allegedly raise errors). I suppose that means > > one is supposed to call PyErr_Occurred after every call? That's sort > > of painful! > > Yes! Check all C API functions that may return errors! That includes > PySequence_GetItem() and PyLong_AsLongLong. > Sorry, I should have been clear that I was writing example code. I only put the error checking in where I thought it was demonstrating the point. I'd be surprised if it even compiled... Note that the additional error checking is required in the "success" path where the attributes are present. In other words, mandating the attributes be there when they aren't strictly required could make things slower... Cheers, -Scott From xscottg at yahoo.com Fri Apr 8 12:24:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 12:24:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: 6667 Message-ID: <20050408192312.91215.qmail@web50206.mail.yahoo.com> --- "David M. Cooke" wrote: > > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. > I'm really not worried about it... I just don't want "performance" to be used as an argument for given design decisions when the proposed change won't actually make things faster. > > Here's an alternative: Define some attribute (for now, call it > [snip] > This would definitely be faster. Faster yet would be doing a PyNumeric_Check (or PyNumarray_Check, or whatever they're called) and just casting the pointer to the underlying representation. If you must go fast, go as fast as possible... I'd rather we didn't add a lot of complexity to the array protocol to just go at a medium speed.
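[The None-means-absent convention is easy to picture from the consumer side; a minimal Python sketch, where the helper name and the default-strides logic are illustrative only, not part of any agreed spec:

    def get_array_info(obj):
        # required attributes of the proposed protocol
        shape = tuple(obj.__array_shape__)
        typestr = obj.__array_typestr__
        itemsize = int(typestr[2:])   # assumes a '<f8'-style typestr
        # optional attributes: a None return is treated exactly like
        # a missing attribute, so one getattr covers both cases
        strides = getattr(obj, '__array_strides__', None)
        if strides is None:
            # default: C-contiguous layout
            strides = [itemsize] * len(shape)
            for i in range(len(shape) - 2, -1, -1):
                strides[i] = strides[i + 1] * shape[i + 1]
        offset = getattr(obj, '__array_offset__', None)
        if offset is None:
            offset = 0
        return shape, typestr, tuple(strides), offset
]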
Cheers, -Scott From oliphant at ee.byu.edu Fri Apr 8 13:55:27 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 8 13:55:27 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <4256EF45.6070004@ee.byu.edu> David M. Cooke wrote: >It seems that people are worried about speed of the attribute-based >array interface when using small arrays in C. > > I think what we are talking about here is an *array protocol* (i.e. like the buffer protocol and sequence protocol). So far we have just described the Python level interface. I would like to see an array protocol added (perhaps to the buffer protocol table). This could be done just as David describes --- we don't even need to use the C-pointer (just return a void *pointer which has a version as the first entry). I think this is how the C-level should be handled. Yes, it does not require changes to Python to implement the __array_c__ attribute. But, ultimately, it would be better if we used the C-level protocol concept that Python already uses for other objects. -Travis From perry at stsci.edu Fri Apr 8 14:05:05 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 8 14:05:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255B7D6.9000109@ee.byu.edu> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> <4255B7D6.9000109@ee.byu.edu> Message-ID: <819eb85df29878341dd00521bbba280d@stsci.edu> On Apr 7, 2005, at 6:44 PM, Travis Oliphant wrote: > > I can't think of anything you've missed. > > I'm very supportive of this, but I have to finish scipy.base first. > I think Perry is supportive as well. I know he's been playing > catch-up in the reading. I'm not sure of Todd's opinion. I suspect > he would welcome these changes to Python. > > My preference order is > > 1) the ndarray module and ndarray.h header with these interface > definitions and methods. 2) Add array interface attributes to array > module > 3) Flesh out locked buffer API > 4) Bytes object (with Pickling support) > 5) Fix current buffer object. > I agree as well (I think). Just to be sure I'll restate. These issues are all important, and the discussion has been very useful to flesh out the proposed array protocol. Nevertheless, I'd put the priority of getting these into Python, or accepted by the Python Dev community, lower than actually implementing Numeric3 (aka scipy.base) to the point that it is acceptable to both the Numeric and numarray communities. True, subsequent changes forced by the acceptance process may require reworking in scipy.base, but I put unification far ahead of getting these various components finished and into Python. I think that's what Travis is getting at too. I've been tied up in other things, but frankly, I haven't seen that much that I have objected to so far in the array protocol discussions to warrant comments from me. I think it has been pretty well done (and I'm about to leave town so I'm going to be out of touch for a week or so, at least mostly). Perry From xscottg at yahoo.com Fri Apr 8 14:43:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 14:43:02 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: 6667 Message-ID: <20050408214214.45907.qmail@web50206.mail.yahoo.com> --- Andrew Straw wrote: > > > > This behavior is explained by Tim Peters: > > > > http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a > > > I feared it was something like that. (No platform independent way to > represent special values like nan, inf, and so on.) So I think if we're > going to require an encoding character such as '<' or '>' we should also > include one that means native which CAN handle these special values... > And document why it's needed and why it may get one into trouble. > The data is either big endian or little endian (or possibly a single byte in which case it doesn't matter). Whether or not the (hardware, operating system, C runtime library, C compiler, or Python implementation) can handle NaNs or Infs is not a property of the data. What does an additional code or two get you? Let's say we used ']' for big endian native, and '[' for little endian native? Does that just indicate the possible presence of NaNs for Infs in the data? Adding those codes doesn't have any affect on whether or not libraries can deal with them. I guess I'm not understanding something. Cheers, -Scott From cookedm at physics.mcmaster.ca Fri Apr 8 14:52:02 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 14:52:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <4256EF45.6070004@ee.byu.edu> (Travis Oliphant's message of "Fri, 08 Apr 2005 14:53:25 -0600") References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> <4256EF45.6070004@ee.byu.edu> Message-ID: Travis Oliphant writes: > David M. Cooke wrote: > >>It seems that people are worried about speed of the attribute-based >>array interface when using small arrays in C. >> >> > I think we are talking about here an *array protocol* (i.e. like the > buffer protocol and sequence > protocol). > > So far we have just described the Python level interface. I would > like to see an array protocol added (perhaps to the buffer protocol > table). This could be done just as David describes --- we don't even > need to use the C-pointer (just return a void *pointer which has a > version as the first entry). The purpose of the CObject was to make it possible to pass it through Python (through the attribute access). > I think this is how the C-level should be handled, I think. Yes, it > does not require changes to Python to implement the __array_c__ > attribute. But, ultimately, it would be better if we used the C-level > protocol concept that Python already uses for other objects. Ah, ok, so you'd have a slot in the type object (like the number, sequence, or buffer protocols), with the appropriate (C-level) functions. This would require it to be in the Python core, though, and would only work for a new version of Python. Alternatively, you have a special attribute/method that returns an object with the right C API -- much like CObjects are used for wrapping Numeric's C API. I would really like to see something working at the C level (so you're not passing dimensions back-and-forth as Python tuples with Python ints), but the Python-level array interface you've proposed will work for now. This should be revisited once people are using the new array interface, and we have an idea of how it's being used, and the performance costs. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From xscottg at yahoo.com Fri Apr 8 16:06:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 16:06:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408230455.35465.qmail@web50209.mail.yahoo.com> --- Scott Gilbert wrote: > > --- Andrew Straw wrote: > > > > I feared it was something like that. (No platform independent way to > > represent special values like nan, inf, and so on.) So I think if > > we're going to require an encoding character such as '<' or '>' we > > should also include one that means native which CAN handle these > > special values... And document why it's needed and why it may get one > > into trouble. > > > > Let's say we used ']' for big endian native, and '[' for little endian > native? Does that just indicate the possible presence of NaNs for Infs > in the data? > > Adding those codes doesn't have any affect on whether or not libraries > can deal with them. I guess I'm not understanding something. > I think I'm understanding my problem in understanding :-). There IS a platform independant way to represent NaNs and Infs. It's pretty clearly spelled out in IEEE-754: http://stevehollasch.com/cgindex/coding/ieeefloat.html I think something we've been assuming is that the array data is basically IEEE-754 compliant (maybe it needs to be byteswapped). If that's not true, then we're going to need some new typecodes. We're not supporting the ability to pass VAX floating point around (Are we????). The problem is that you can't make any safe assumptions about whether your current platform will deal with IEEE-754 data in any predictable way if it contains NaNs or Infs. So additional typecodes won't really solve anything. Tim Peter's explanation is a good representation of Python's official position regarding floating point issues, but a much simpler explanation is possible... The struct module in "standard mode" decodes the data one character at a time and builds a float from them. You can see this in the _PyFloat_Unpack8 function in the floatobject.c file. In other words, this routine probably works on a VAX too (taking a IEEE-754 double and building a VAX floating point as it goes). You can also see the comment in there that says it doesn't handle NaNs or Infs. I don't think we need another indicator for '>' big-endian or '<' for little-endian. Cheers, -Scott From konrad.hinsen at laposte.net Fri Apr 8 23:46:00 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Apr 8 23:46:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408230455.35465.qmail@web50209.mail.yahoo.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> Message-ID: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> On 09.04.2005, at 01:04, Scott Gilbert wrote: > I think something we've been assuming is that the array data is > basically > IEEE-754 compliant (maybe it needs to be byteswapped). If that's not > true, > then we're going to need some new typecodes. We're not supporting the > ability to pass VAX floating point around (Are we????). This discussion has been coming up regularly for a few years. Until now the concensus has always been that Python should make no assumptions that go beyond what a C compiler can promise. Which means no assumptions about floating-point representation. Of course the computing world is changing, and IEEE format may well be ubiquitous by now. 
Vaxes must be in the museum by now. But how about mainframes? IBM mainframes didn't use IEEE when I used them (last time 15 years ago), and they are still around, possibly compatible to their ancestors. Another detail to consider is that although most machines use the IEEE representation, hardly any respects the IEEE rules for floating point operations in all detail. In particular, trusting that Inf and NaN will be treated as IEEE postulates is a risky business. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From xscottg at yahoo.com Sat Apr 9 09:36:05 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Apr 9 09:36:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050409163525.93733.qmail@web50201.mail.yahoo.com> --- konrad.hinsen at laposte.net wrote: > > This discussion has been coming up regularly for a few years. Until now > the concensus has always been that Python should make no assumptions > that go beyond what a C compiler can promise. Which means no > assumptions about floating-point representation. > > Of course the computing world is changing, and IEEE format may well be > ubiquitous by now. Vaxes must be in the museum by now. But how about > mainframes? IBM mainframes didn't use IEEE when I used them (last time > 15 years ago), and they are still around, possibly compatible to their > ancestors. > I've been following this mailing list for a few years now, but I skip a lot of threads. I almost certainly skipped this topic in the past since it wasn't relevant to me. I'm only interested in it now since it's relevant to this data interchange business, so I'm sorry if this is a rehash... Trying to stay portable is a good goal, and I can understand why Python proper would try to adhere to the restrictions it does. Despite the claim, Python makes plenty of assumptions that a standards conformant C compiler could break. If numpy doesn't make some assumptions about floating point representation, it's going to kill the possibity of passing data across machines, and that's pretty unacceptable. I'm not comfortable saying "ubiquitous" since I don't know what the mainframe or super computing community is making use of, and I don't know what sort of little machines Python is running on. The closest thing to a mainframe that I've ever used was a Convex, and I never knew what it's floating point representation was. However, I know that x86, PPC, AMD-64, IA64, Alpha, Sparc, and whatever HPUX and SGIs are running on all use IEEE-754 format. That's probably 99.999% of all machines capable of running Python, and at least that percentage of users. It would be a shame to gum up this typecode thing for situations that don't occur in practice. If it has to be done, then I recommend we use the '@' code in place of the '<' or '>' for platforms that are out of the ordinary. It's important to specify that '@' is only to be used on floating point data that is not IEEE-754. In this case it doesn't mean "native" like it does in the struct module, it means "weird" :-). > > Another detail to consider is that although most machines use the IEEE > representation, hardly any respects the IEEE rules for floating point > operations in all detail. 
In particular, trusting that Inf and NaN will > be treated as IEEE postulates is a risky business. > See that's the thing. Why burden how you label the data with the restrictions of the current machine? You can take the data off the machine. Whether or not I can rely on what NaN*Inf will give me, I know that I can take NaN and Inf to another machine and get the same interpretation of the data. This whole thread started because Andrew Straw showed that struct.pack('<d', nan) blows up while the native struct.pack('d', nan) does not. From oliphant at ee.byu.edu Sat Apr 9 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 9 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> Message-ID: <425808B4.8070005@ee.byu.edu> konrad.hinsen at laposte.net wrote: > On 09.04.2005, at 01:04, Scott Gilbert wrote: >> I think something we've been assuming is that the array data is >> basically >> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >> not true, >> then we're going to need some new typecodes. We're not supporting the >> ability to pass VAX floating point around (Are we????). > No, in moving from the struct module's character codes we are trying to do something more platform independent because it is very likely that different platforms will want to exchange binary data. IEEE-754 is a great standard to build an interface around. Data sharing was the whole reason the standard emerged and a lot of companies got on board. > > This discussion has been coming up regularly for a few years. Until now the consensus has always been that Python should make no assumptions that go beyond what a C compiler can promise. Which means no assumptions about floating-point representation. > > Of course the computing world is changing, and IEEE format may well be ubiquitous by now. Vaxes must be in the museum by now. But how about mainframes? IBM mainframes didn't use IEEE when I used them (last time 15 years ago), and they are still around, possibly compatible to their ancestors. I found the following piece, written about 6 years ago, interesting: http://www.research.ibm.com/journal/rd/435/schwarz.html Basically, it states that chips in newer IBM mainframes support the IEEE 754 standard. > > Another detail to consider is that although most machines use the IEEE representation, hardly any respects the IEEE rules for floating point operations in all detail. In particular, trusting that Inf and NaN will be treated as IEEE postulates is a risky business. But, this can be handled with platform-dependent C-code when and if problems arise. -Travis From strawman at astraw.com Sat Apr 9 12:36:03 2005 From: strawman at astraw.com (Andrew Straw) Date: Sat Apr 9 12:36:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <425808B4.8070005@ee.byu.edu> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> Message-ID: <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> Here's an email Todd Miller sent me (I hoped he'd send it directly to the list, but I'll forward it. Todd, I hope you don't mind.) Todd Miller wrote: > On Fri, 2005-04-08 at 15:46 -0700, Andrew Straw wrote: >> Hi Todd, >> >> Could you join in on this thread? I think you wrote the ieeespecial >> stuff in numarray, so it's clear you have a much better understanding >> of >> the issues than I do... >> >> Cheers! >> Andrew > > My own understanding is limited, but I can say a few things that might > make the status of numarray clearer. My assumptions for numarray were > that: > > 1.
Floating point values are 32-bit or 64-bit entities which are stored > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > so I expect it simply won't work on a VAX. There's no checking for > this. > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > point. ieeespecial has been tested to work there. > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > unsigned > integers, and contiguous ranges on those integers are used to > represent > special values like NAN and INF. Platform byte ordering for the > IEEE-754 floating point numbers mirrors byte ordering for integers so > the ieeespecial NAN detection code works in a cross platform way *and* > values exported from one IEEE-754 platform will work with ieeespecial > when imported on another. It's important to note that special values > are not unique: there is no single NAN value; it's a bit range. > > 4. numarray leaks IEEE-754 special values out into Python floating > point > scalars. This may be bad form. I do this because (1) they repr > understandably if not in a platform independent way and (2) people need > to get at them. I noticed recently that ieeespecial.nan == > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > correct ones (False) for Python-2.4. I haven't looked at what the > array > version does yet: array(nan) == array(nan). The point to be taken > from > this is that the level at which numarray ieee special value handling > works or doesn't work is really restricted to (1) detecting certain > ieee-754 bit ranges (2) the basic behavior of C code for C89 complilers > for array code (no guarantees) (3) the behavior of Python itself > (improving). > > In the context of the array protocol (looking very nice by the way) my > thinking is that non-IEEE-754 floating point could be described with > bit > fields and that the current type codes should mean IEEE-754. > > Some minor things I noticed in the array interface: > > 1. The packing order of bit fields is not clear. In C, my experience > is that some compilers pack bit structs towards the higher order bits > of > an integer, and some towards the lower. More info to clarify that > would be helpful. > > 2. I saw no mention that we're talking about a protocol. I'm sure > that's clear to everyone following this discussion closely, but I > didn't see it in the spec. It might make sense to allude to the C > helper functions and potential for additions to the Python type struct > even if they're not spelled out. > > Regards, > Todd On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > konrad.hinsen at laposte.net wrote: > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: >> >>> I think something we've been assuming is that the array data is >>> basically >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >>> not true, >>> then we're going to need some new typecodes. We're not supporting >>> the >>> ability to pass VAX floating point around (Are we????). >> > > No, in moving from the struct modules character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build an interface around. Data sharing was the > whole reason the standard emerged and a lot of companies got on board. > >> >> This discussion has been coming up regularly for a few years. 
Until >> now the concensus has always been that Python should make no >> assumptions that go beyond what a C compiler can promise. Which >> means no assumptions about floating-point representation. >> >> Of course the computing world is changing, and IEEE format may well >> be ubiquitous by now. Vaxes must be in the museum by now. But how >> about mainframes? IBM mainframes didn't use IEEE when I used them >> (last time 15 years ago), and they are still around, possibly >> compatible to their ancestors. > > I found the following piece, written about 6 years ago interesting: > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > Basically, it states that chips in newer IBM mainframes support the > IEEE 754 standard. > >> >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business. > > But, this can be handled with platform-dependendent C-code when and if > problems arise. > -Travis > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real > users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From jmiller at stsci.edu Sat Apr 9 16:18:00 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sat Apr 9 16:18:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> Message-ID: <1113088643.5363.8.camel@jaytmiller.comcast.net> On Sat, 2005-04-09 at 12:35 -0700, Andrew Straw wrote: > Here's an email Todd Miller sent me (I hoped he'd send it directly to > the list, but I'll forward it. Todd, I hope you don't mind.) No, I don't mind. I intended to send it to the list but left in a rush this morning. Todd > > > On Fri, 2005-04-08 at 15:46 -0700, Andrew Straw wrote: > >> Hi Todd, > >> > >> Could you join in on this thread? I think you wrote the ieeespecial > >> stuff in numarray, so it's clear you have a much better understanding > >> of > >> the issues than I do... > >> > >> Cheers! > >> Andrew > > > > My own understanding is limited, but I can say a few things that might > > make the status of numarray clearer. My assumptions for numarray were > > that: > > > > 1. Floating point values are 32-bit or 64-bit entities which are stored > > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > > so I expect it simply won't work on a VAX. There's no checking for > > this. > > > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > > point. ieeespecial has been tested to work there. > > > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > > unsigned > > integers, and contiguous ranges on those integers are used to > > represent > > special values like NAN and INF. 
Platform byte ordering for the > > IEEE-754 floating point numbers mirrors byte ordering for integers so > > the ieeespecial NAN detection code works in a cross platform way *and* > > values exported from one IEEE-754 platform will work with ieeespecial > > when imported on another. It's important to note that special values > > are not unique: there is no single NAN value; it's a bit range. > > > > 4. numarray leaks IEEE-754 special values out into Python floating > > point > > scalars. This may be bad form. I do this because (1) they repr > > understandably if not in a platform independent way and (2) people need > > to get at them. I noticed recently that ieeespecial.nan == > > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > > correct ones (False) for Python-2.4. I haven't looked at what the > > array > > version does yet: array(nan) == array(nan). The point to be taken > > from > > this is that the level at which numarray ieee special value handling > > works or doesn't work is really restricted to (1) detecting certain > > ieee-754 bit ranges (2) the basic behavior of C code for C89 complilers > > for array code (no guarantees) (3) the behavior of Python itself > > (improving). > > > > In the context of the array protocol (looking very nice by the way) my > > thinking is that non-IEEE-754 floating point could be described with > > bit > > fields and that the current type codes should mean IEEE-754. > > > > Some minor things I noticed in the array interface: > > > > 1. The packing order of bit fields is not clear. In C, my experience > > is that some compilers pack bit structs towards the higher order bits > > of > > an integer, and some towards the lower. More info to clarify that > > would be helpful. > > > > 2. I saw no mention that we're talking about a protocol. I'm sure > > that's clear to everyone following this discussion closely, but I > > didn't see it in the spec. It might make sense to allude to the C > > helper functions and potential for additions to the Python type struct > > even if they're not spelled out. > > > > Regards, > > Todd > > > On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > > > konrad.hinsen at laposte.net wrote: > > > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: > >> > >>> I think something we've been assuming is that the array data is > >>> basically > >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's > >>> not true, > >>> then we're going to need some new typecodes. We're not supporting > >>> the > >>> ability to pass VAX floating point around (Are we????). > >> > > > > No, in moving from the struct modules character codes we are trying to > > do something more platform independent because it is very likely that > > different platforms will want to exchange binary data. IEEE-754 is a > > great standard to build an interface around. Data sharing was the > > whole reason the standard emerged and a lot of companies got on board. > > > >> > >> This discussion has been coming up regularly for a few years. Until > >> now the concensus has always been that Python should make no > >> assumptions that go beyond what a C compiler can promise. Which > >> means no assumptions about floating-point representation. > >> > >> Of course the computing world is changing, and IEEE format may well > >> be ubiquitous by now. Vaxes must be in the museum by now. But how > >> about mainframes? 
IBM mainframes didn't use IEEE when I used them > >> (last time 15 years ago), and they are still around, possibly > >> compatible to their ancestors. > > > > I found the following piece, written about 6 years ago interesting: > > > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > > > Basically, it states that chips in newer IBM mainframes support the > > IEEE 754 standard. > > > >> > >> Another detail to consider is that although most machines use the > >> IEEE representation, hardly any respects the IEEE rules for floating > >> point operations in all detail. In particular, trusting that Inf and > >> NaN will be treated as IEEE postulates is a risky business. > > > > But, this can be handled with platform-dependendent C-code when and if > > problems arise. > > -Travis > > > > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT Products from real > > users. > > Discover which products truly live up to the hype. Start reading now. > > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From tchur at optushome.com.au Sat Apr 9 17:25:43 2005 From: tchur at optushome.com.au (Tim Churches) Date: Sat Apr 9 17:25:43 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array Message-ID: <4258721E.1080905@optushome.com.au> I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit Linux): >>> import Numeric as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 OK, it is an elementary mistake, but the silent overflow caught me unawares. casting the array to Float64 before summing it avoids the error, but in my instance the actual data is a rank-1 array of 21 million integers with a mean value of about 140 (which adds up more than sys.maxint), and casting to Float64 will use quite a lot of memory (as well as taking some time). Any advice for catching or avoiding such overflow without always incurring a performance and memory hit by always casting to Float64? Shouldn't add.reduce() be checking for overflow and raising an error? Then it would be easy to upcast only when overflow (or underflow) occurs, rather than always. 
Tim C From jmiller at stsci.edu Sun Apr 10 07:25:08 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sun Apr 10 07:25:08 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <4258721E.1080905@optushome.com.au> References: <4258721E.1080905@optushome.com.au> Message-ID: <1113143026.5359.35.camel@jaytmiller.comcast.net> On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit > Linux): > > >>> import Numeric as N > >>> a = N.array((2000000000,1000000000),typecode=N.Int32) > >>> N.add.reduce(a) > -1294967296 > > OK, it is an elementary mistake, but the silent overflow caught me > unawares. Casting the array to Float64 before summing it avoids the > error, but in my instance the actual data is a rank-1 array of 21 > million integers with a mean value of about 140 (which adds up to more than > sys.maxint), and casting to Float64 will use quite a lot of memory (as > well as taking some time). > > Any advice for catching or avoiding such overflow without always > incurring a performance and memory hit by always casting to Float64? Here's what numarray does: >>> import numarray as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 So basic reductions in numarray have the same "careful while you're shaving" behavior as Numeric; it's fast but easy to screw up. But: >>> a.sum() 3000000000L >>> a.sum(type='d') 3000000000.0 a.sum() blockwise upcasts to the largest type of its kind on the fly, in this case, Int64. This avoids the storage overhead of typecasting the entire array. A better name for the method would have been sumall() since it sums all elements of a multi-dimensional array. The flattening process reduces along one dimension before flattening, preventing a full copy of a discontiguous array. It could be smarter about choosing the dimension of the initial reduction. Regards, Todd From pearu at cens.ioc.ee Mon Apr 11 00:59:14 2005 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Mon Apr 11 00:59:14 2005 Subject: [Numpy-discussion] scipy.base Message-ID: Hi Travis, I have committed scipy.{distutils,base} to the Numeric3 CVS repository. scipy.distutils is a reviewed version of scipy_distutils, and as one of its new features there is a Configuration class that allows one to write much simpler setup.py files for subpackages. See the setup.py files under the Numeric3/scipy directory for examples. scipy.base is a very minimal copy of scipy_base plus the ndarray modules. When using setup_scipy.py for building, the ndarray package is installed as scipy.base and from scipy.base import * should work equivalently to from ndarray import * for instance. I have used information from Numeric3/setup.py to implement Numeric3/scipy/base/setup.py and it should be updated whenever Numeric3/setup.py is changed. However, I would recommend starting to use scipy.base instead of ndarray, as using both may cause unexpected behaviour when the installed ndarray is older than the scipy.base installation (see [*]). In the Numeric3 CVS repository that would mean replacing setup.py with setup_scipy.py, and any modification to the ndarray setup scripts should be done in scipy/base/setup.py. We can apply this step whenever you feel confident with the new setup.py files. Let me know if you have any troubles with them. To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, and CodeGenerators directories should be moved under the scipy/base directory.
However, this step can be omitted if you would prefer working with files at the top directory of Numeric3. Current setup.py scripts fully support this approach as well. There are also a few open issues and questions. First, how to name the Numeric3 project when it installs scipy.base, scipy.distutils, Numeric packages, etc? This name will be used when creating source distributions and also as part of the path where header files will be installed. At the moment setup_scipy.py uses the name 'ndarray'. And so `setup_scipy.py sdist`, for example, produces an ndarray-30.0.tar.gz file; `setup_scipy.py install` installs header files under the /include/ndarray/ directory. Though this is fine with me, I am not sure that this is an ideal situation. I think we should choose the name now and stick to it forever, especially since 3rd party extension modules need to know where to look for ndarray header files. This name cannot be 'numarray', obviously, but there are options like 'ndarray', 'numpy', and maybe others. In fact, 'Numeric' (with version 3x.x) would also be an option, but that would certainly cause some problems when one wants both Numeric 2x.x and Numeric 3x.x to be installed in the system; the header files would end up in the same directory, for instance. As a workaround, we could force installing Numeric3 header files to /include/Numeric/3/ or something. I actually like this idea but I wonder what others think about this. Second, is it already possible to use the ndarray C/API as a replacement for the Numeric C/API, i.e. would simple replacement of #include "Numeric/arrayobject.h" with #include "ndarray/arrayobject.h" work? And if not, will it ever be? This would be interesting to know as an extension writer. [*] Due to keeping changes to Numeric3 sources minimal, the scipy.base multiarray and umath modules first try to import ndarray and then scipy.base whenever ndarray is missing. One should remove the ndarray installation from the system before using scipy.base. Regards, Pearu From konrad.hinsen at laposte.net Mon Apr 11 02:30:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Apr 11 02:30:28 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <425808B4.8070005@ee.byu.edu> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> Message-ID: On Apr 9, 2005, at 18:54, Travis Oliphant wrote: > No, in moving from the struct module's character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build For data exchange between platforms, i.e. through files and network connections, XDR is arguably a better choice. It actually uses IEEE for floats, but XDR libraries provide conversion code for other platforms. It also takes care of byte ordering. > an interface around. Data sharing was the whole reason the standard > emerged and a lot of companies got on board. I think the main reason was standardization of precision, range, and operations, to make floating-point code more portable. This has had moderate success, as 100% IEEE platforms are rare if they exist at all. >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business.
> But, this can be handled with platform-dependent C-code when and if > problems arise. Can it? I have faint memories about Tim Peters explaining why and how handling IEEE in C code is a pain. Anyway, it would be a good idea to get his opinion on whatever proposal about IEEE before implementing it. Konrad. From tchur at optushome.com.au Mon Apr 11 13:52:19 2005 From: tchur at optushome.com.au (Tim Churches) Date: Mon Apr 11 13:52:19 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <1113143026.5359.35.camel@jaytmiller.comcast.net> References: <4258721E.1080905@optushome.com.au> <1113143026.5359.35.camel@jaytmiller.comcast.net> Message-ID: <425AE33C.30403@optushome.com.au> Todd Miller wrote: > On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > >>I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit >>Linux): >> >> >>> import Numeric as N >> >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >> >>> N.add.reduce(a) >>-1294967296 >> >>OK, it is an elementary mistake, but the silent overflow caught me >>unawares. Casting the array to Float64 before summing it avoids the >>error, but in my instance the actual data is a rank-1 array of 21 >>million integers with a mean value of about 140 (which adds up to more than >>sys.maxint), and casting to Float64 will use quite a lot of memory (as >>well as taking some time). >> >>Any advice for catching or avoiding such overflow without always >>incurring a performance and memory hit by always casting to Float64? > > Here's what numarray does: > >>>>import numarray as N >>>>a = N.array((2000000000,1000000000),typecode=N.Int32) >>>>N.add.reduce(a) > -1294967296 > So basic reductions in numarray have the same "careful while you're > shaving" behavior as Numeric; it's fast but easy to screw up. Sure, but how is one supposed to be careful? It seems that for any array of two or more integers which could sum to more than sys.maxint or less than -sys.maxint, add.reduce() in both NumPy and Numeric will give either a) the correct answer or b) an incorrect answer, and short of adding up the array by a safer but much slower method there is no way of determining whether the answer provided (quickly) by add.reduce is right or wrong. That seems to make it fast but useless, for integer arrays at least. Is that an unfair summary? Can anyone point me towards a method for using add.reduce() on small arrays of large integers with values in the billions, or on large arrays of fairly small integer values, which will not suddenly and without warning give the wrong answer? > > But: > >>>>a.sum() > 3000000000L > >>>>a.sum(type='d') > 3000000000.0 > a.sum() blockwise upcasts to the largest type of its kind on the fly, in this case, Int64. This avoids the storage overhead of typecasting the entire array. That's on a 64-bit platform, right? The same method could be used to cast the accumulator to a Float64 on a 32-bit platform to avoid casting the entire array? > A better name for the method would have been sumall() since it sums all elements of a multi-dimensional array. The flattening process reduces along one dimension before flattening, preventing a full copy of a discontiguous array. It could be smarter about choosing the dimension of the initial reduction. OK, thanks. Unfortunately it is not possible for us to port our application to numarray at the moment. But the insight is most helpful.
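[For Numeric users in Tim's position, the blockwise-upcast idea Todd describes for numarray's sum() can be approximated in pure Python, trading one small temporary per block for the full Float64 copy; a rough sketch, where the function name and block size are made up for illustration:

    import Numeric as N

    def safe_sum(a, blocksize=65536):
        # upcast one block at a time, so only a single block is ever
        # copied to Float64; the running total is an ordinary Python number
        flat = N.ravel(a)          # note: copies if a is discontiguous
        total = 0
        for i in range(0, len(flat), blocksize):
            block = flat[i:i + blocksize].astype(N.Float64)
            total = total + N.add.reduce(block)
        return total

With the example that started the thread, safe_sum(N.array((2000000000, 1000000000), N.Int32)) gives 3000000000.0 instead of the wrapped-around result.]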
Tim C From oliphant at ee.byu.edu Mon Apr 11 17:12:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 11 17:12:25 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: References: Message-ID: <425B1182.7060102@ee.byu.edu> Pearu Peterson wrote: >Hi Travis, > >I have committed scipy.{distutils,base} to Numeric3 CVS repository. >scipy.distutils is a reviewed version of scipy_distutils and >as one of its new features there is Configuration class that allows >one to write much simpler setup.py files for subpackages. See setup.py >files under Numeric3/scipy directory for examples. scipy.base is a >very minimal copy of scipy_base plus ndarray modules. > > Thank you, thank you for your help with this. >When using setup_scipy.py for building, the ndarray package is installed >as scipy.base and > > from scipy.base import * > >should work equivalently to > > from ndarray import * > >for instance. > > I don't like from ndarray import *. It's only been a place-holder. Let's get rid of it as soon as possible. >To clean up Numeric3 CVS repository completely then Include, Src, Lib, >CodeGenerators directories should be moved under the scipy/base directory. >However, this step can be omitted if you would prefer working with files >at the top directory of Numeric3. > I have no preference here. Whatever works best. >First, how to name Numeric3 project when it installs scipy.base, >scipy.distutils, Numeric packages, etc? This name will be used when >creating source distributions and also as part of the path where header >files will be installed. At the moment setup_scipy.py uses the name >'ndarray'. > I don't like the name ndarray -- it's too limiting. Why not scipy_core? >In fact, 'Numeric' (with version 3x.x) would be also an option but that >would be certainly cause some problems when one wants both Numeric 2x.x >and Numeric 3x.x to be installed in the system, the header files would end >up in the same directory, for instance. As a workaround, we could force >installing Numeric3 header files to /include/Numeric/3/ or >something. I acctually like this idea but I wonder what other think about >this. > > How about include/scipy? >Second, is it already possible to use ndarray C/API as a replacement of >Numeric C/API, i.e. would simple replacement of > > #include "Numeric/arrayobject.h" > >with > > #include "ndarray/arrayobject.h" > >work? And if not, will it ever be? This would be interesting to know as an >extension writer. > > This should work fine. All of the old C-API is there (there are some new calls, but the old ones should still work). The only issue is that one of the calls (PyArray_Take I think now uses a standardized PyArrayObject * as one of it's arguments instead of a PyObject *). This shouldn't be a problem, since you always had to call it with an array. It's just now more explicit, but could lead to a warning. >[*] Due to keeping changes to Numeric3 sources minimal, scipy.base >multiarray and umath modules first try to import ndarray and then >scipy.base whenever ndarray is missing. One should remove ndarray >installation from the system before using scipy.base. > > I don't mind changing the package names entirely at this point. -Travis From oliphant at ee.byu.edu Tue Apr 12 16:39:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 12 16:39:23 2005 Subject: [Numpy-discussion] Subclassing and metadata Message-ID: <425C5BDF.1010802@ee.byu.edu> I think I've found a possible solution for subclasses that want to handle metadata. 
Essentially, any subclass that defines the method _update_meta(self, other) will get that method called when an array is sliced or subscripted. Anytime an array is created where a subtype is the caller, this method will be called if it is available. Here is a simple example:

    import ndarray

    class subclass(ndarray.ndarray):
        def __new__(cls, shape, *args, **kwds):
            # allocate the underlying array; 'V4' means 4-byte void elements
            self = ndarray.ndarray.__new__(cls, shape, 'V4')
            return self
        def __init__(self, shape, *args, **kwds):
            self.dict = kwds        # stash the keyword arguments as metadata
            return
        def _update_meta(self, obj):
            # called whenever a view of obj is created; carry the metadata over
            self.dict = obj.dict

Comments? -Travis From pearu at cens.ioc.ee Wed Apr 13 04:06:00 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 13 04:06:00 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: <425B1182.7060102@ee.byu.edu> Message-ID: On Mon, 11 Apr 2005, Travis Oliphant wrote: > >When using setup_scipy.py for building, the ndarray package is installed > >as scipy.base and > > > > from scipy.base import * > > > >should work equivalently to > > > > from ndarray import * > > > >for instance. > > > > > I don't like from ndarray import *. It's only been a place-holder. > Let's get rid of it as soon as possible. Done in CVS. > >To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, > >and CodeGenerators directories should be moved under the scipy/base directory. > >However, this step can be omitted if you would prefer working with files > >at the top directory of Numeric3. > > > I have no preference here. Whatever works best. Directory Include/ndarray/ is now moved to scipy/base/Include/scipy/base/. I'll move other directories as well. > >First, how to name the Numeric3 project when it installs scipy.base, > >scipy.distutils, Numeric packages, etc? This name will be used when > >creating source distributions and also as part of the path where header > >files will be installed. At the moment setup_scipy.py uses the name > >'ndarray'. > > > I don't like the name ndarray -- it's too limiting. Why not scipy_core? > > >In fact, 'Numeric' (with version 3x.x) would also be an option, but that > >would certainly cause some problems when one wants both Numeric 2x.x > >and Numeric 3x.x to be installed in the system; the header files would end > >up in the same directory, for instance. As a workaround, we could force > >installing Numeric3 header files to /include/Numeric/3/ or > >something. I actually like this idea but I wonder what others think about > >this. > > > > > How about include/scipy? Without going into details of distutils restrictions for various options, I found that the #include "scipy/base/arrayobject.h" option works best. And the name of the Numeric3 package is now scipy_core. All this is implemented in Numeric3 CVS now. > >Second, is it already possible to use the ndarray C/API as a replacement for > >the Numeric C/API, i.e. would simple replacement of > > > > #include "Numeric/arrayobject.h" > > > >with > > > > #include "ndarray/arrayobject.h" > > > >work? And if not, will it ever be? This would be interesting to know as an > >extension writer. > > > > > This should work fine. Great! Thanks, Pearu From alexandre.guimond at mirada-solutions.com Wed Apr 13 18:10:47 2005 From: alexandre.guimond at mirada-solutions.com (Alexandre Guimond) Date: Wed Apr 13 18:10:47 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images Message-ID: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Hi all. I've been looking at numarray to do some image processing.
A lot of the work I do deals with transforming images, either with affine transformations or vector fields. Numarray seems somewhat well equipped to address these issues, but I am concerned about one aspect. It seems that the transformation code (affine_transform and geometric_transform) computes input coordinates for every output coordinate in the resulting array. If I have an RGB image for which the transformation is the same for all 3 RGB channels, I would assume that this will triple the workload unnecessarily. It might have a dramatic effect for the geometric transformation, which will most often be slower than the affine one. Is there any way around this, e.g. is it possible to tell numarray to use the same interpolation coefficients for the last "n" dimensions of the array, or to tell numarray to compute interpolation coefficients only once and apply those separately for each channel? thx for any help / info. alex. From verveer at embl-heidelberg.de Thu Apr 14 02:45:45 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Thu Apr 14 02:45:45 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images In-Reply-To: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> References: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Message-ID: <14ba52860a6e1f838975c3c04a0dafc9@embl-heidelberg.de> Hi Alex, It is correct that a certain amount of work is duplicated if you do an identical interpolation operation on several arrays. There is currently no way to avoid this. This can be fixed and I will have a look to see how easy that is to do. If it is not easy to factor out that part of the code, I will most likely not be able to spend the time to do it though... You could at least use the map_coordinates function that will allow you to use a pre-calculated coordinate mapping. There will still be duplication of work, but at least you avoid the duplication of the calculation of the coordinate transformation. Peter > Hi all. > > I've been looking at numarray to do some image processing. A lot of > the work I do deals with transforming images, either with affine > transformations or vector fields. Numarray seems somewhat well equipped > to address these issues, but I am concerned about one aspect. It seems > that the transformation code (affine_transform and > geometric_transform) computes input coordinates for every output > coordinate in the resulting array. If I have an RGB image for which > the transformation is the same for all 3 RGB channels, I would assume > that this will triple the workload unnecessarily. It might have a > dramatic effect for the geometric transformation, which will most often > be slower than the affine one. Is there any way around this, e.g. is it > possible to tell numarray to use the same interpolation > coefficients for the last "n" dimensions of the array, or to tell > numarray to compute interpolation coefficients only once and apply those > separately for each channel? > > thx for any help / info. > > alex.
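[A sketch of what Peter suggests for the RGB case: compute the output-to-input coordinate array once, then hand the same coordinates to map_coordinates once per channel. The helper name is invented, the matrix/offset convention (output coordinates mapped to input coordinates) is an assumption, and Numeric-style helpers like indices/dot/reshape are assumed to be available in numarray:

    import numarray as N
    from numarray import nd_image

    def affine_rgb(image, matrix, offset, output_shape):
        # image: (rows, cols, 3); matrix: 2x2 array; offset: length-2 array
        rows, cols = output_shape
        grid = N.indices((rows, cols)).astype(N.Float64)  # (2, rows, cols)
        flat = N.reshape(grid, (2, rows * cols))
        coords = N.dot(matrix, flat) + N.reshape(offset, (2, 1))
        coords = N.reshape(coords, (2, rows, cols))
        # the coordinate computation above happens only once; just the
        # interpolation itself is repeated for each channel
        out = N.zeros((rows, cols, 3), N.Float64)
        for c in range(3):
            out[:, :, c] = nd_image.map_coordinates(image[:, :, c], coords)
        return out

The spline prefiltering inside map_coordinates is still done per channel, which matches Peter's caveat that some duplication of work remains.]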
Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files. I. ENHANCEMENTS 1. Migration of NumArray.__del__ to C (tp_dealloc). Improves overall performance. 2. Removal of dictionary update from array view creation improves performance of view/slice/subarray creation. This should e.g. improve the performance of wxPython sequence protocol access to Nx2 arrays. Subclasses now need to do a.flags |= numarray.generic._UPDATEDICT to ensure that dictionary based attributes are inherited by views. NumArrays no longer do this by default. 3. Modifications to support scipy.special. 4. Removal of an unnecessary getattr() from the ufunc calling sequence. Improves ufunc performance. II. BUGS FIXED / CLOSED 1179355 average() broken in numarray 1.2.3 1167184 Floating point exception in numarray's dot() 1151892 Bug in matrixmultiply with zero size arrays 1160184 RecArray reversal 1156172 Incorect error message for shape incompatability 1155538 Incorrect error message when multiplying arrays See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. III. CAUTIONS This release should be backward binary compatible with numarray 1.1.1 and 1.2.3. WHERE ----------- Numarray-1.3.0 windows executable installers, source code, and manual are here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS ------------------------------ numarray-1.3.0 requires Python 2.2.2 or greater. Python-2.3.4 or Python-2.4.1 is recommended. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Chuck Harris, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Kuepper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, Rory Yorke, and everyone else who has contributed with comments and feedback. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From jdhunter at ace.bsd.uchicago.edu Thu Apr 14 14:14:13 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Thu Apr 14 14:14:13 2005 Subject: [Numpy-discussion] ANN: matplotlib-0.80 Message-ID: A lot of development has gone into matplotlib since the last major release, which I'll summarize here. For details, see the notes for the incremental releases at http://matplotlib.sf.net/whats_new.html. Improvements since 0.70 -- contouring: Lots of new contour functionality with line and polygon contours provided by contour and contourf. Automatic inline contour labeling with clabel. See http://matplotlib.sourceforge.net/screenshots.html#pcolor_demo -- QT backend Sigve Tjoraand, Ted Drain and colleagues at the JPL collaborated on a QTAgg backend -- Unicode strings are rendered in the agg and postscript backends.
Currently, all the symbols in the unicode string have to be in the active font file. In later releases we'll try and support symbols from multiple ttf files in one string. See examples/unicode_demo.py -- map and projections A new release of the basemap toolkit - See http://matplotlib.sourceforge.net/screenshots.html#plotmap -- Auto-legends The automatic placement of legends is now supported with loc='best'; see examples/legend_auto.py. We did this at the matplotlib sprint at pycon -- Thanks John Gill and Phil! Note that your legend will move if you interact with your data and you force data under the legend line. If this is not what you want, use a designated location code. -- Quiver (direction fields) Ludovic Aubry contributed a patch for the matlab compatible quiver method. This makes a direction field with arrows. See examples/quiver_demo.py -- Performance optimizations Substantial optimizations in line marker drawing in agg -- Robust log plots Lots of work making log plots "just work". You can toggle log y Axes with the 'l' command -- nonpositive data are simply ignored and no longer raise exceptions. log plots should be a lot faster and more robust -- Many more plotting functions, bugfixes, and features, detailed in the 0.71, 0.72, 0.73 and 0.74 point release notes at http://matplotlib.sourceforge.net/whats_new.html http://matplotlib.sourceforge.net JDH From simon at arrowtheory.com Thu Apr 14 23:07:03 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 14 23:07:03 2005 Subject: [Numpy-discussion] numarray cholesky solver ? Message-ID: <20050415160425.42cb20a6.simon@arrowtheory.com> Hi, I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves ? Alternatively, are we able to convert to and from Numeric (scipy) array's without a memcopy ? thankyou, Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From arnd.baecker at web.de Thu Apr 14 23:58:08 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 14 23:58:08 2005 Subject: [Numpy-discussion] % and fmod Message-ID: Dear all, I encountered the following puzzling behaviour of the modulo operator %: In [1]: import Numeric In [2]: print Numeric.__version__ 23.8 In [3]: x=Numeric.arange(10.0) In [4]: print x%4 [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [5]: print 3.0%4 3.0 In [6]: print (-x)%4 [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] # <====== In [7]: print (-3.0)%4 # vs. 1.0 # <====== (OK) In [8]: print Numeric.fmod(x,4) [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [9]: print Numeric.fmod(-x,4) [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] So it seems that for arrays % behaves like fmod! This seems in contrast to what one finds in the python 2.3 documentation: "5.6. Binary arithmetic operations" """The % (modulo) operator yields the remainder from the division of the first argument by the second. [...] The arguments may be floating point numbers, e.g., 3.14%0.7 equals 0.34 (since 3.14 equals 4*0.7 + 0.34.) The modulo operator always yields a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand.""" I am presently teaching a course on computational physics with python and the students have huge difficulties with % behaving differently for arrays and scalars. 
I am aware that (according to Kernighan/Ritchie) the C standard does not define the result of % when either of the operands is negative. So can someone help me: is the different behaviour of % for scalars and arrays a bug, a feature, or what should I tell my students? ;-) Many thanks, Arnd P.S.: BTW: the documentation for fmod and remainder is pretty short on this: In [3]:fmod? Type: ufunc String Form: Namespace: Interactive Docstring: fmod(x,y) is remainder(x,y) In [4]:remainder? Type: ufunc String Form: Namespace: Interactive Docstring: returns remainder of division elementwise Are contributions of more detailed doc-strings welcome? P.P.S.: for numarray one gets even less information: In [1]: import numarray In [2]: numarray.fmod? Type: _BinaryUFunc Base Class: String Form: Namespace: Interactive Docstring: Class for ufuncs with 2 input and 1 output arguments In [3]: numarray.remainder? Type: _BinaryUFunc Base Class: String Form: Namespace: Interactive Docstring: Class for ufuncs with 2 input and 1 output arguments In [4]: print numarray.__version__ 1.1.1 P^3.S: scipy's mod seems to be an alternative: In [1]: import scipy In [2]: scipy.mod? Type: function Base Class: String Form: Namespace: Interactive File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py Definition: scipy.mod(x, y) Docstring: x - y*floor(x/y) For numeric arrays, x % y has the same sign as x while mod(x,y) has the same sign as y. In [3]: x=-scipy.arange(10) In [4]: x%4 Out[4]: array([ 0, -1, -2, -3, 0, -1, -2, -3, 0, -1]) In [5]: scipy.mod(x,4) Out[5]: array([ 0., 3., 2., 1., 0., 3., 2., 1., 0., 3.]) In [6]: scipy.mod?? Type: function Base Class: String Form: Namespace: Interactive File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py Definition: scipy.mod(x, y) Source:

def mod(x,y):
    """ x - y*floor(x/y)

        For numeric arrays, x % y has the same sign as x while
        mod(x,y) has the same sign as y.
    """
    return x - y*Numeric.floor(x*1.0/y)

From jmiller at stsci.edu Fri Apr 15 03:46:37 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 15 03:46:37 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <1113561843.5030.9.camel@jaytmiller.comcast.net> On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > Hi, > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > Is this in the pipeline, No. Most of the add-on subpackages in numarray, with the exception of convolve, image, and nd_image, are ports from Numeric. > or do we go ahead and add the dpotrs based functionality ourselves ? > > Alternatively, are we able to > convert to and from Numeric (scipy) array's without a memcopy ? Unless Numeric has been adapted to support the new array interface, I think this (converting from numarray to Numeric) has still not been properly addressed. Regards, Todd From luszczek at cs.utk.edu Fri Apr 15 07:11:20 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 07:11:20 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <425FCAFC.3010603@cs.utk.edu> Hi all, the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I apologize if everybody knows that).
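For readers who want the flavor of it: dpotrs solves A x = b given a precomputed Cholesky factor of A. A rough, illustrative sketch of the same computation with numarray's existing pieces (assuming cholesky_decomposition() returns the lower-triangular factor L with A = L L^T, and leaning on the generic solver twice instead of a dedicated triangular solve, so it is a sketch rather than an efficient implementation):

import numarray as na
import numarray.linear_algebra as la

A = na.array([[4.0, 2.0],
              [2.0, 3.0]])        # symmetric positive definite
b = na.array([1.0, 2.0])
L = la.cholesky_decomposition(A)  # A = L * transpose(L)
y = la.solve_linear_equations(L, b)                # forward solve: L y = b
x = la.solve_linear_equations(na.transpose(L), y)  # back solve: L^T x = y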
I'm on the LAPACK team right now and we were wondering if we should provide bindings for Python. It is almost trivial to do with Pyrex. But Numeric and numarray already have some of this functionality in them. Also, I don't know about the popularity of PyLapack. So my question is whether there is a need for the specialized LAPACK routines. And if so, which API it should use (Numeric, numarray, Numeric3, scipy_core, standard array, minimum standard array implementation or array protocol meta info). Any comments are appreciated, Piotr Luszczek Simon Burton wrote: > Hi, > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves ? Alternatively, are we able to > convert to and from Numeric (scipy) array's without a memcopy ? > > thankyou, > > Simon. From perry at stsci.edu Fri Apr 15 07:21:23 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 15 07:21:23 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: On Apr 15, 2005, at 10:09 AM, Piotr Luszczek wrote: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some of this functionality in them. > Also, I don't know about the popularity of PyLapack. > > So my question is whether there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info). > > Any comments are appreciated, > > Piotr Luszczek > If you don't need anything unusual, using the Numeric C-API should be safe. There is the intent to preserve backward compatibility for that in numarray and Numeric3 for the most part (numarray's ufunc api is different however, but it isn't clear you need to use that). Numeric3 and numarray will/do have other capabilities not part of the Numeric api, but again, I suspect that for a first version, one can probably avoid needing those. I'd also like to hear what Travis thinks about this. Perry Greenfield From pjssilva at ime.usp.br Fri Apr 15 08:00:44 2005 From: pjssilva at ime.usp.br (Paulo J. S. Silva) Date: Fri Apr 15 08:00:44 2005 Subject: [Numpy-discussion] Pycoin - Python interface to COIN/CLP Linear Programming solver Message-ID: <1113577115.9013.9.camel@localhost.localdomain> Hello, I am finally releasing the code I have to interface the COIN/CLP linear programming solver with Python/Numarray. You can download the code at: http://www.ime.usp.br/~pjssilva/pycoin/index.html On the page you can see sample client code. The interface is very simple, consisting mostly of swig interface files, but it is very useful to me. It can also be used as an example of how to interface C++ and Python/Numarray using swig. I plan to make this interface grow to something much better, with an interface to the full Clp, another to OsiClp (only this one is available right now) and maybe other COIN optimization libraries like IPOPT. Please, download, use, test, comment. Best, Paulo -- Paulo José da Silva e Silva Professor Assistente do Dep.
de Ciência da Computação (Assistant Professor of the Computer Science Dept.) Universidade de São Paulo - Brazil e-mail: pjssilva at ime.usp.br Web: http://www.ime.usp.br/~pjssilva Teoria é o que não entendemos o (Theory is something we don't) suficiente para chamar de prática. (understand well enough to call) (practice) From cookedm at physics.mcmaster.ca Fri Apr 15 10:48:55 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 15 10:48:55 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> (Piotr Luszczek's message of "Fri, 15 Apr 2005 10:09:00 -0400") References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: Piotr Luszczek writes: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some of this functionality in them. > Also, I don't know about the popularity of PyLapack. > > So my question is whether there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array implementation > or array protocol meta info). You'll probably first want to look at scipy, which already wraps (all? most?) of LAPACK in its scipy.linalg package (including dpotrs :-) It uses f2py to make the process much easier. Since you mention you're on the LAPACK team ... I've been working on redoing the f2c'd LAPACK wrappers in Numeric, updating them to the current version...except: what *is* the current version? The patches on netlib are 2-3 years old, and you have to grab them separately, file-by-file (can I say how insanely stupid that is?). Also ... they break: with some test cases (derived from ones posted to our bug tracker) some routines segfault. Is it the LAPACK 3e? If that's the case, we can't use it unless there are C versions (Numeric only requires Python and a C compiler; throwing a F90 compiler in there is *not* an option -- we don't even require a F77 compiler). I ended up using the source from Debian unstable from the lapack3 package, and those work fine. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From haase at msg.ucsf.edu Fri Apr 15 12:38:51 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Apr 15 12:38:51 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? Message-ID: <200504151235.48573.haase@msg.ucsf.edu> Hi, I'm using memmap to read my MRC-imagedata files. I just thought this might be a case of general interest - see below: >>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", cval=0.0, origin=0, output_type=None) Traceback (most recent call last): File "", line 1, in ? File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in boxcar_filter cval = cval, output_type = output_type) File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in boxcar_filter1d cval, origin, _ni_support._type_to_num[output_type]) TypeError: NA_IoArray: I/O numarray must be writable NumArrays.
>>> na.__version__ '1.2.3' >>> Thanks, Sebastian Haase From verveer at embl.de Fri Apr 15 12:55:33 2005 From: verveer at embl.de (Peter Verveer) Date: Fri Apr 15 12:55:33 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? In-Reply-To: <200504151235.48573.haase@msg.ucsf.edu> References: <200504151235.48573.haase@msg.ucsf.edu> Message-ID: <9396f2dea14c14fb7a6bd04f6077c448@embl.de> You may have run in an older bug which I fixed. Please try upgrading to the new numarray 1.3 and see if the problem disappears. If not let me know. Note: the function you are using (boxcar_filter) has been renamed in 1.3 to uniform_filter (to be more in line with common image processing terminology.) Cheers, Peter On Apr 15, 2005, at 9:35 PM, Sebastian Haase wrote: > Hi, > I'm using memmap to read my MRC-imagedata files. > I just thought this might be a case of general interest - see below: > >>>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", > cval=0.0, origin=0, output_type=None) > Traceback (most recent call last): > File "", line 1, in ? > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in > boxcar_filter > cval = cval, output_type = output_type) > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in > boxcar_filter1d > cval, origin, _ni_support._type_to_num[output_type]) > TypeError: NA_IoArray: I/O numarray must be writable NumArrays. >>>> na.__version__ > '1.2.3' >>>> > > > Thanks, > Sebastian Haase From luszczek at cs.utk.edu Fri Apr 15 20:41:05 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 20:41:05 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <426088F5.90602@cs.utk.edu> David M. Cooke wrote: > Piotr Luszczek writes: > > >>Hi all, >> >>the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I >>apologize if every body knows that). >> >>I'm on the LAPACK team right now and we were wondering if we should >>provide bindings for Python. It is almost trivial to do with Pyrex. >>But Numeric and numarray already have some functionality in it. >>Also, I don't know about popularity of PyLapack. >> >>So my question is if there is a need for the specialized LAPACK >>routines. And if so, which API it should use (Numeric, numarray, >>Numeric3, scipy_core, standard array, minimum standard array implementation >>or array protocol meta info). > > > You'll probably first want to look at scipy, which already wraps (all? > most?) of LAPACK in its scipy.linalg package (including dpotrs :-) It seems to have almost all routines. > It uses f2py to make the process much easier. > > > Since you mention you're on the LAPACK team ... > > I've been working on redoing the f2c'd LAPACK wrappers in Numeric, > updating them to the current version...except: what *is* the current Current version is 3.0. > version? The patches on netlib are 2-3 years old, and you have to grab After funding ran out there were only volunteers left. It's hard to get free open-source developers these days. > them separately, file-by-file (can I say how insanely stupid that Frankly, I had the same comment when I first saw it. Hopefully, next update will straighten things out. > is?). Also ... they break: with some test cases (derived from ones > posted to our bug tracker) some routines segfault. Yes I know. We have postings about it on the mailing list almost weekly. > Is it the LAPACK 3e? 
> If that's the case, we can't use it unless there

LAPACK 3E is only somewhat related to LAPACK. But it's not the "current version". > are C versions (Numeric only requires Python and a C compiler; > throwing a F90 compiler in there is *not* an option -- we don't even > require a F77 compiler). We've been thinking about languages for a while. The CLAPACK user base is too strong to ignore. So we think of keeping F77 as the base language. Or maybe we should do f90toC. f2c and f2j are on Netlib already and f2py has some F90 support. > I ended up using the source from Debian unstable from the lapack3 > package, and those work fine. Again, it's hard to get grant money for support. Thanks for the comments. Piotr From pearu at cens.ioc.ee Fri Apr 15 23:09:01 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Fri Apr 15 23:09:01 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <426088F5.90602@cs.utk.edu> Message-ID: On Fri, 15 Apr 2005, Piotr Luszczek wrote: > > You'll probably first want to look at scipy, which already wraps (all? > > most?) of LAPACK in its scipy.linalg package (including dpotrs :-) > > It seems to have almost all routines. You should look at the scipy.lib.lapack package, which has more wrappers than scipy.linalg and will be used in scipy.linalg in the future. scipy.lib.lapack certainly does not wrap all of LAPACK, but adding new wrappers is easy and is done on a demand basis. What's wrapped and what's not in scipy.lib.lapack is well documented in the headers of the .pyf.src files. My current plan is to add the CLAPACK sources to scipy.lib.lapack so that it can be included in the Numeric3 project, because that has a requirement that everything should compile with only a C compiler available. > We've been thinking about languages for a while. The CLAPACK user base > is too strong to ignore. So we think of keeping F77 as the base language. > Or maybe we should do f90toC. f2c and f2j are on Netlib already and > f2py has some F90 support. f2py will have limited support for F90 derived types as soon as I get a chance to review Jeffrey Hagelberg's patches on this. However, keeping F77 as the base language is a good idea, imho; free F90 compilers are still rare these days. Pearu From florian.proff.schulze at gmx.net Sat Apr 16 03:25:37 2005 From: florian.proff.schulze at gmx.net (Florian Schulze) Date: Sat Apr 16 03:25:37 2005 Subject: [Numpy-discussion] bytes object info Message-ID: Hi! I just discovered this: http://members.dsl-only.net/~daniels/Block.html I didn't try it out, but maybe it's helpful to you. Regards, Florian Schulze From cjw at sympatico.ca Sat Apr 16 11:29:01 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Apr 16 11:29:01 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <426158FD.8060507@sympatico.ca> Florian Schulze wrote: > Hi! > > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html Ugh! Letter codes to identify data types - I thought that we had moved beyond that. ;-) Colin W. > > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze > > From oliphant at ee.byu.edu Sat Apr 16 21:16:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 16 21:16:07 2005 Subject: [Numpy-discussion] numarray cholesky solver ?
In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <4261E2A5.1060109@ee.byu.edu> Piotr Luszczek wrote: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some of this functionality in them. > Also, I don't know about the popularity of PyLapack. Scipy already has extensive bindings for LAPACK. There is even a lot of development that has been done for c-compiled bindings. Right now, scipy_core is being developed to be a single replacement for Numeric/numarray. Lapack bindings are a huge part of that effort. But, as I said, the work has been done (using f2py). The biggest issue is supporting f2c'd versions of Lapack so that folks without Fortran compilers can still install it. scipy_core will allow this. Again, most of the effort is accomplished through f2py and scipy_distutils, which are really good tools. Pyrex is nice, but f2py is really, really nice (it even supports wrapping basic c-code). > > So my question is whether there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info). I think if LAPACK were going to go through the trouble, it would be best for LAPACK to provide "array protocol" style wrappers. That way any Python array user could take advantage of them. While current scipy users and future scipy_core users do not need LAPACK-provided Python wrappers, we would welcome any native support by the LAPACK team. Again, though, I think this should be done through the array_protocol API. A C-API is likely in the near future as well (which will provide a little speed-up for many small arrays). -Travis From simon at arrowtheory.com Sun Apr 17 20:44:16 2005 From: simon at arrowtheory.com (Simon Burton) Date: Sun Apr 17 20:44:16 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <1113561843.5030.9.camel@jaytmiller.comcast.net> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> Message-ID: <20050418134337.1b3f8ae8.simon@arrowtheory.com> On Fri, 15 Apr 2005 06:44:02 -0400 Todd Miller wrote: > On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > Hi, > > > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > > Is this in the pipeline, > > No. Most of the add-on subpackages in numarray, with the exception of > convolve, image, and nd_image, are ports from Numeric. > Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this that would be much appreciated. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From arnd.baecker at web.de Mon Apr 18 00:30:10 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 18 00:30:10 2005 Subject: [Numpy-discussion] scipy.base - % and fmod segfault Message-ID: Hi (in particular Travis), concerning my recent question on % and fmod for Numeric and numarray, I was curious to see how scipy.base behaves.
With a CVS check-out this morning I get: In [1]: from scipy.base import * In [2]: x=arange(10) In [3]: print x%4 array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1], 'l') In [4]: print (-x)%4 zsh: 12391 segmentation fault ipython (The same holds for fmod, and also for x=arange(10.0).) Personally I would prefer if in the end % behaves the same way for arrays as for scalars. Do you think that this is possible with scipy.base? Best, Arnd P.S.: I haven't tested much more of scipy.base this time (but the few things concerning array operations I looked at seem fine. Ah, there is one: doing import scipy.base scipy.base.fmod? in ipython gives a segmentation fault (the same with .sin, .exp etc. ...) ) From jmiller at stsci.edu Mon Apr 18 06:38:21 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Apr 18 06:38:21 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050418134337.1b3f8ae8.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> <20050418134337.1b3f8ae8.simon@arrowtheory.com> Message-ID: <1113831328.29165.30.camel@halloween.stsci.edu> On Sun, 2005-04-17 at 23:43, Simon Burton wrote: > On Fri, 15 Apr 2005 06:44:02 -0400 > Todd Miller wrote: > > > On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > > Hi, > > > > > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > > > Is this in the pipeline, > > > > No. Most of the add-on subpackages in numarray, with the exception of > > convolve, image, and nd_image, are ports from Numeric. > > > > Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this > that would be much appreciated. If you're doing a port of something that already works for Numeric, chances are good that numarray's Numeric compatibility API will make things "just work." In any case, be sure to use the compatibility API since it's the easiest path forward to Numeric3 should that effort prove successful (which I think it will). Usually what's involved in porting from Numeric to numarray is just making sure that the numarray header files can be used rather than the Numeric header files. I think the style we used for matplotlib, while not fully general, is the simplest and best compromise:

#ifdef NUMARRAY
#include "numarray/arrayobject.h"
#else
#include "Numeric/arrayobject.h"
#endif

In setup.py, you have to pass extra_compile_args=["-DNUMARRAY=1"] or similar to the Extension() constructions to build for numarray. There are more details we could discuss if you want to build for both Numeric and numarray simultaneously. Two limitations of the numarray Numeric compatible C-API are: (1) a partially compatible array descriptor structure (PyArray_Descr) and (2) the UFunc C-API. Generally, neither of those is an issue, but for large projects (e.g. scipy) they matter. Good luck porting. Feel free to ask questions either on the list or privately if you run into trouble. Regards, Todd From haase at msg.ucsf.edu Mon Apr 18 09:16:15 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Mon Apr 18 09:16:15 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <200504180914.33383.haase@msg.ucsf.edu> Hey, this _really_ is no SPAM ... ;-) (Maybe different wording next time) Thanks, Sebastian Haase On Saturday 16 April 2005 03:22, Florian Schulze wrote: > Hi!
> > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html > > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze From oliphant at ee.byu.edu Mon Apr 18 17:09:49 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 18 17:09:49 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42644B7C.9030907@ee.byu.edu> I am going to release Numeric 24.0 today or tomorrow unless I hear from anybody about some changes that need to get made. -Travis From faltet at carabos.com Tue Apr 19 03:05:27 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 19 03:05:27 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42644B7C.9030907@ee.byu.edu> References: <42644B7C.9030907@ee.byu.edu> Message-ID: <200504191202.52097.faltet@carabos.com> Hi, I was curious about the newly introduced array protocol in Numeric 24.0 (as well as in current numarray CVS), and wanted to check if there is better speed during Numeric <-> numarray object conversion. The answer is "partially" affirmative: >>> import numarray >>> import Numeric >>> print numarray.__version__ 1.4.0 >>> print Numeric.__version__ 24.0 >>> from time import time >>> a = numarray.arange(100*1000) >>> t1=time();b=Numeric.array(a);time()-t1 # numarray --> Numeric 0.0021419525146484375 # It was 1.58109998703 with Numeric 23.8 ! So, numarray --> Numeric speed has been improved quite a lot. The other way round, Numeric to numarray, is not as efficient: >>> Na = Numeric.arange(100*1000) >>> t1=time();c=numarray.array(Na);time()-t1 # Numeric --> numarray 0.15217900276184082 # It is much slower than numarray --> Numeric I guess that the Numeric --> numarray conversion can be sped up because: >>> t1=time();Nb=numarray.array(buffer(Na),typecode=Na.typecode(),shape=Na.shape);time()-t1 0.00017499923706054688 # Numeric --> numarray using the buffer protocol So, I guess CVS numarray is still refining the array protocol. But the thing that mostly shocks me is that the array protocol still allows conversions with memory copies because, as you can see in the last example, which uses the buffer protocol, a non-copy memory conversion is indeed possible for Numeric --> numarray. So the question is: would the array protocol bring numarray <-> Numeric <-> Numeric3 conversions without memory copies, or is this more a wish on my part than an actual possibility? Thanks, and keep up the nice work! -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From eric at enthought.com Tue Apr 19 22:48:17 2005 From: eric at enthought.com (eric jones) Date: Tue Apr 19 22:48:17 2005 Subject: [Numpy-discussion] job openings at Enthought Message-ID: <4265ECEF.6050004@enthought.com> Hey group, We have a number of scientific/python related jobs open. If you have any interest, please see: http://www.enthought.com/careers.htm thanks, eric From cjw at sympatico.ca Wed Apr 20 00:45:21 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Wed Apr 20 00:45:21 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler Message-ID: <42660855.4090600@sympatico.ca> I have tried: python setup.py install build_ext --compiler=bcpp It seems that the distutils call uses scipy.distutils, rather than the standard, and that the scipy version is based on an older version of distutils. Is there some way to work around this? Colin W. From pearu at cens.ioc.ee Wed Apr 20 12:00:34 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 20 12:00:34 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler In-Reply-To: <42660855.4090600@sympatico.ca> Message-ID: On Wed, 20 Apr 2005, Colin J. Williams wrote: > I have tried: > > python setup.py install build_ext --compiler=bcpp > > It seems that the distutils call uses scipy.distutils, rather than the > standard, and that the scipy version is based on an older version of > distutils. > > Is there some way to work around this? So, what problems exactly do you experience with the above command? Using scipy.distutils should not be much different compared to std distutils when building std extension modules. Pearu From oliphant at ee.byu.edu Wed Apr 20 12:05:30 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 20 12:05:30 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <4266A7AD.5090600@ee.byu.edu> I've released Numeric 24.0 as a beta (2nd version) release. Right now it's just a tar file. Please find any bugs. I'll wait a week or two and release a final version unless I hear reports of problems. Thanks to those who have found bugs already. David Cooke has been especially active in helping fix problems. Many kudos to him. -Travis From jmiller at stsci.edu Thu Apr 21 08:12:30 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 21 08:12:30 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.1 Message-ID: <1114096238.4446.18.camel@jaytmiller.comcast.net> Release Notes for numarray-1.3.1 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files. I. ENHANCEMENTS None. 1.3.1 fixes the problem with gcc-3.4.3 II. BUGS FIXED / CLOSED 1152323 /usr/include/fenv.h:96: error: conflicting types for 'fegete 1185024 numarray-1.2.3 fails to compile with gcc-3.4.3 1187162 Numarray 1.3.0 installation failure See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. From oliphant at ee.byu.edu Fri Apr 22 03:51:14 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 22 03:51:14 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: <4266A7AD.5090600@ee.byu.edu> Message-ID: <4268D6BD.9000100@ee.byu.edu> Alexander Schmolck wrote: >Travis Oliphant writes: > > > >>I've released Numeric 24.0 as a beta (2nd version) release. Right now it's >>just a tar file. >> >>Please find any bugs. I'll wait a week or two and release a final version >>unless I hear reports of problems. >> >> > > >I suspect some other problems I haven't tried to track down yet are due to >this: > > >>> a = num.array([[1],[2],[3]]) > >>> ~(a==a) > array([[-2], > [-2], > [-2]]) > > What is wrong with this? ~ is bit-wise not and gives the correct answer here.
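To spell out why -2 is the bitwise-correct answer (a worked example, not from the original mail; standard two's-complement arithmetic assumed): ~x equals -x - 1, so the comparison a == a yields an array of ones and ~ turns each 1 into -2. The elementwise logical negation is spelled logical_not:

>>> import Numeric as num
>>> a = num.array([[1],[2],[3]])
>>> ~(a == a)                # bitwise: ~1 == -(1)-1 == -2
array([[-2],
       [-2],
       [-2]])
>>> num.logical_not(a == a)  # logical, not bitwise, negation
array([[0],
       [0],
       [0]])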
> >Object array comparisons still produce haphazard behaviour: > > >>> a = num.array(["ab", "cd", "efg"], 'O') > >>> a == 'ab' > 0 > > You are mixing Object arrays and character arrays here and expecting too much. String arrays in Numeric and their relationship with object arrays have never been too useful. You need to be explicit about how 'ab' is going to be interpreted and do a == array('ab','O') to get what you were probably expecting. >Finally -- not necessarily a bug, but a change of behaviour that seems undocumented (I'm >pretty sure this used to give a float array as return value): > > >>> num.zeros((2.0,)) > *** TypeError: an integer is required > > > >'as > > I don't think this worked as you think it did (I looked at Numeric 21.3). num.zeros(2.0) works but it shouldn't. This is a bug that I'll fix. Shapes should be integers, not floats. If this was not checked before then that was a bug. It looks like it's always been checked differently for single-element tuples and scalars. So, in short, I see only one small bug here. Thanks for testing things out. -Travis From stephen.walton at csun.edu Mon Apr 25 11:50:28 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Apr 25 11:50:28 2005 Subject: [Numpy-discussion] Value selections? Message-ID: <426D3BA8.6020500@csun.edu> I'm trying out Numeric 24b2. In numarray, the following code will plot the values of an array which are not equal to 'flag': f = array!=flag plot(array[f]) What is the equivalent in Numeric 24b2? From rkern at ucsd.edu Mon Apr 25 11:59:03 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Apr 25 11:59:03 2005 Subject: [Numpy-discussion] Value selections? In-Reply-To: <426D3BA8.6020500@csun.edu> References: <426D3BA8.6020500@csun.edu> Message-ID: <426D3D4C.5070302@ucsd.edu> Stephen Walton wrote: > I'm trying out Numeric 24b2. In numarray, the following code will plot > the values of an array which are not equal to 'flag': > > f = array!=flag > plot(array[f]) > > What is the equivalent in Numeric 24b2? compress(f, array) is the lowest common denominator. I'm not sure if Numeric 24 gets fancier like numarray. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
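A concrete rendering of the compress() idiom Robert describes, sketched with made-up data (the flag value and array below are illustrative, not from the thread):

import Numeric

flag = -999.0
data = Numeric.array([1.0, flag, 2.0, 3.0, flag])
mask = Numeric.not_equal(data, flag)   # 1 where the element is valid, else 0
good = Numeric.compress(mask, data)    # array([ 1., 2., 3.])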
From jswhit at fastmail.fm Tue Apr 26 07:58:36 2005 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Tue Apr 26 07:58:36 2005 Subject: [Numpy-discussion] numarray problems on AIX Message-ID: <426E5637.1080305@fastmail.fm> Hi: I'm having problems with numarray 1.3.1/Python 2.4.1 on AIX 5.2: Python 2.4.1 (#3, Apr 26 2005, 10:34:56) [C] on aix5 Type "help", "copyright", "credits" or "license" for more information. >>> import numarray Traceback (most recent call last): File "", line 1, in ? File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/__init__.py", line 42, in ? from numarrayall import * File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarrayall.py", line 2, in ? from generic import * File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/generic.py", line 1116, in ? import numarraycore as _nc File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarraycore.py", line 1751, in ? import ufunc File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/ufunc.py", line 13, in ? import _converter ImportError: dynamic module does not define init function (init_converter) it works with AIX 4 - anyone seen this before? -Jeff -- Jeffrey S. Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/CDC R/CDC1 Email : Jeffrey.S.Whitaker at noaa.gov 325 Broadway Office : Skaggs Research Cntr 1D-124 Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg From faltet at carabos.com Tue Apr 26 10:45:02 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 26 10:45:02 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms Message-ID: <200504261942.46011.faltet@carabos.com> Hi, I'm having problems converting numarray objects into Numeric on 64-bit platforms, and I think this is numarray's fault, but I'm not completely sure. The problem can be easily visualized in an example (I'm using numarray 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux): >>> Num=Numeric.array((3,),typecode='l') >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) array([3],'i') # The conversion has finished correctly On 64-bit platforms (AMD64, Linux): >>> Num=Numeric.array((3,),typecode='l') >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) Traceback (most recent call last): File "", line 1, in ? TypeError: typecode argument must be a valid type. The problem is that, for 32-bit platforms, na.typecode() == 'i' as it should be, but for 64-bit platforms na.typecode() == 'N', which is not a valid type in Numeric. I guess that na.typecode() should be mapped to 'l' on 64-bit platforms so that Numeric can recognize the Int64 correctly. Any suggestion? -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From jmiller at stsci.edu Tue Apr 26 13:57:14 2005 From: jmiller at stsci.edu (Todd Miller) Date: Tue Apr 26 13:57:14 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504261942.46011.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> Message-ID: <1114548937.24120.97.camel@halloween.stsci.edu> On Tue, 2005-04-26 at 13:42, Francesc Altet wrote: > Hi, > > I'm having problems converting numarray objects into Numeric on 64-bit > platforms, and I think this is numarray's fault, but I'm not completely > sure.
> > The problem can be easily visualized in an example (I'm using numarray > 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3],'i') # The conversion has finished correctly > > On 64-bit platforms (AMD64, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: typecode argument must be a valid type. > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > valid type in Numeric. I guess that na.typecode() should be mapped to > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > correctly. > > Any suggestion? I agree that since the typecode() method exists for backward compatibility, returning 'N' rather than 'l' on an LP64 platform can be considered a bug. However, there are two problems I see: 1. Returning 'l' doesn't handle the case of converting a numarray Int64 array on a 32-bit platform. AFAIK, there is no typecode that will work for that case. So, we're only getting a partial solution. 2. numarray uses typecodes internally to encode type signatures. There, platform-independent typecodes are useful and making this change will add confusion. I think we may be butting up against the absolute/relative type definition problem. Comments? Todd From faltet at carabos.com Wed Apr 27 05:40:35 2005 From: faltet at carabos.com (Francesc Altet) Date: Wed Apr 27 05:40:35 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <1114548937.24120.97.camel@halloween.stsci.edu> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> Message-ID: <200504271432.46852.faltet@carabos.com> A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > valid type in Numeric. I guess that na.typecode() should be mapped to > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > correctly. > > I agree that since the typecode() method exists for backward > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > considered a bug. However, there are two problems I see: > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > array on a 32-bit platform. AFAIK, there is no typecode that will work > for that case. So, we're only getting a partial solution. One can always do a separate case for 64-bit platforms. This solution is already used in Lib/numerictypes.py. > 2. numarray uses typecodes internally to encode type signatures. There, > platform-independent typecodes are useful and making this change will > add confusion. Well, this is the root of the problem for 'l' (long int) types: their meaning depends on the platform. Anyway, I've tried the following patch, and everything seems to work well (i.e. it does what is intended):

--------------------------------------------------------------
--- Lib/numerictypes.py Wed Apr 27 07:13:08 2005
+++ Lib/numerictypes.py.modif Wed Apr 27 07:21:48 2005
@@ -389,7 +389,11 @@
 # at code generation / installation time.
 from codegenerator.ufunccode import typecode
 for tname, tcode in typecode.items():
-    typecode[ eval(tname)] = tcode
+    if tname == "Int64" and numinclude.LP64:
+        typecode[ eval(tname)] = 'l'
+    else:
+        typecode[ eval(tname)] = tcode
+
 if numinclude.hasUInt64:
     _MaximumType = {
---------------------------------------------------------------

With that, we have on 64-bit platforms: >>> import Numeric >>> Num=Numeric.array((3,),typecode='l') >>> import numarray >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) array([3]) >>> Numeric.array(na,typecode=na.typecode()).typecode() 'l' and on 32-bit: >>> Num=Numeric.array((3,),typecode='l') >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) array([3],'i') >>> Numeric.array(na,typecode=na.typecode()).typecode() 'i' Which should be the correct behaviour. > I think we may be butting up against the absolute/relative type > definition problem. Comments? That may add some confusion, but if we want to be consistent with the 'l' (long int) meaning for different platforms, I think the suggested patch (or another, more elegant one) is the way to go, IMHO. Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From jmiller at stsci.edu Wed Apr 27 08:36:09 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Apr 27 08:36:09 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504271432.46852.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> <200504271432.46852.faltet@carabos.com> Message-ID: <1114615773.28309.95.camel@halloween.stsci.edu> On Wed, 2005-04-27 at 08:32, Francesc Altet wrote: > A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > > valid type in Numeric. I guess that na.typecode() should be mapped to > > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > > correctly. > > > > I agree that since the typecode() method exists for backward > > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > > considered a bug. However, there are two problems I see: > > > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > > array on a 32-bit platform. AFAIK, there is no typecode that will work > > for that case. So, we're only getting a partial solution. > > One can always do a separate case for 64-bit platforms. This solution > is already used in Lib/numerictypes.py. True. I'm just pointing out that doing this is still "half broken". On the other hand, it is also "half fixed". > if numinclude.hasUInt64: > _MaximumType = { > --------------------------------------------------------------- > > With that, we have on 64-bit platforms: > > >>> import Numeric > >>> Num=Numeric.array((3,),typecode='l') > >>> import numarray > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3]) > >>> Numeric.array(na,typecode=na.typecode()).typecode() > 'l' > > and on 32-bit: > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3],'i') > >>> Numeric.array(na,typecode=na.typecode()).typecode() > 'i' > > Which should be the correct behaviour.
My point was that if you have a numarray Int64 array, there's nothing in 32-bit Numeric to convert it to. Round tripping from Numeric-to-numarray works, but not from numarray-to-Numeric. In this case, I think "half-fixed" still has some merit; I just wanted it to be clear what we're not doing. > > I think we may be butting up against the absolute/relative type > > definition problem. Comments? > > That may add some confusion, but if we want to be consistent with the > 'l' (long int) meaning for different platforms, I think the suggested > patch (or another, more elegant one) is the way to go, IMHO. I logged this on Source Forge and will get something in for numarray-1.4 so that the typecode() method gives a workable answer on LP64. Interested parties should stick to using the typecode() method rather than any of numarray's typecode-related mappings. Cheers, Todd From simon at arrowtheory.com Thu Apr 28 17:38:08 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 28 17:38:08 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX Message-ID: <20050429103116.092907a7.simon@arrowtheory.com> Hi, I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink), who has managed to bomb on this little code example: >>> import numarray as na >>> import numarray.random_array as ra >>> a = ra.random(shape=(257,256)) >>> b = ra.random(shape=(1,256)) >>> na.innerproduct(a, b) He gets a blas error: ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine cblas_dgemm was incorrect Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0 Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From rkern at ucsd.edu Thu Apr 28 18:05:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:05:30 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <20050429103116.092907a7.simon@arrowtheory.com> References: <20050429103116.092907a7.simon@arrowtheory.com> Message-ID: <42718719.1010206@ucsd.edu> Simon Burton wrote: > Hi, > > I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink), > who has managed to bomb on this little code example: > > >>>>import numarray as na >>>>import numarray.random_array as ra >>>>a = ra.random(shape=(257,256)) >>>>b = ra.random(shape=(1,256)) >>>>na.innerproduct(a, b) > > > He gets a blas error: > > ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine cblas_dgemm was incorrect > Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0 On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed Python with vecLib as the BLAS, I don't get an error. I don't get a result that's sensible to me, either; I get a (257,1)-shape array with only the first and last entries non-zero. Your colleague might want to reconsider whether he wants innerproduct() or dot(), with the appropriate change of shape for b. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die."
-- Richard Harter From rkern at ucsd.edu Thu Apr 28 18:09:53 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:09:53 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <42718719.1010206@ucsd.edu> References: <20050429103116.092907a7.simon@arrowtheory.com> <42718719.1010206@ucsd.edu> Message-ID: <427188D1.201@ucsd.edu> Robert Kern wrote: > Simon Burton wrote: > >> Hi, >> >> I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from >> fink) >> who has managed to bomb on this little code example: >> >> >>>>> import numarray as na >>>>> import numarray.random_array as ra >>>>> a = ra.random(shape=(257,256)) >>>>> b = ra.random(shape=(1,256)) >>>>> na.innerproduct(a, b) >> >> >> >> He gets a blas error: >> >> ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine >> cblas_dgemm was incorrect >> Mac OS BLAS parameter error in cblas_dgemm, parameter #0, >> (unavailable), is 0 > > > On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed > Python with vecLib as the BLAS, I don't get an error. > > I don't get a result that's sensible to me, either; I get a > (257,1)-shape array with only the first and last entries non-zero. Oh yes, and apparently a segfault on exit, too. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From edcjones at comcast.net Fri Apr 29 11:26:05 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:26:05 2005 Subject: [Numpy-discussion] numarray: problem with numarray.records Message-ID: <42727B35.9050401@comcast.net>

#! /usr/bin/env python
import numarray, numarray.strings, numarray.records

doubles = numarray.array([1.0], 'Float64')
strings = numarray.strings.array('abcdefgh', itemsize=8,
                                 kind=numarray.strings.RawCharArray)
print numarray.records.array(buffer=[strings, strings])
print
print numarray.records.array(buffer=[doubles, doubles])
print
print numarray.records.array(buffer=[strings, doubles])

"""
The output is:

RecArray[ ('abcdefgh'), ('abcdefgh') ]

RecArray[ (1.0, 1.0) ]

Traceback (most recent call last):
  File "./mess.py", line 12, in ?
    print numarray.records.array(buffer=[strings, doubles])
  File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 397, in array
    byteorder=byteorder, aligned=aligned)
  File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 106, in fromrecords
    raise ValueError, "inconsistent data at row %d,field %d" % (row, col)
ValueError: inconsistent data at row 1,field 0

The numarray docs (11.2) say: The first argument, buffer, may be any one
of the following: ... (5) a list of numarrays. There must be one such
numarray for each field.

What is going on here?
"""

From edcjones at comcast.net Fri Apr 29 11:32:07 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:32:07 2005 Subject: [Numpy-discussion] numarray: lexicographical sort Message-ID: <42727D37.8070700@comcast.net> Suppose arr is a two dimensional numarray. Can the following be done entirely within numarray?

alist = arr.tolist()
alist.sort()
arr = numarray.array(alist, arr.type())

From jmiller at stsci.edu Fri Apr 29 12:42:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 29 12:42:22 2005 Subject: [Numpy-discussion] numarray: lexicographical sort In-Reply-To: <42727D37.8070700@comcast.net> References: <42727D37.8070700@comcast.net> Message-ID: <1114803546.21036.30.camel@halloween.stsci.edu> On Fri, 2005-04-29 at 14:30, Edward C.
Jones wrote: > Suppose arr is a two dimensional numarray. Can the following be done > entirely within numarray? > > alist = arr.tolist() > alist.sort() > arr = numarray.array(alist, arr.type()) > I'm pretty sure the answer is no. The comparisons in numarray's sort() functions are all single element numerical comparisons. The list sort() is using a polymorphic comparison which in this case is the comparison of two lists. There's nothing like that in numarray so I don't think it's possible. Todd
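In practice, then, the workaround is the round trip through Python lists that the question itself uses, since list.sort() compares whole rows lexicographically. Packaged as a helper it might look like this (a sketch only; sortrows is not a numarray function):

import numarray

def sortrows(arr):
    # Sort the rows of a 2-D numarray lexicographically by going
    # through a Python list, whose sort() compares row against row.
    alist = arr.tolist()
    alist.sort()
    return numarray.array(alist, arr.type())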
From faltet at carabos.com Fri Apr 1 02:01:11 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:01:11 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <20050401041204.18335.qmail@web50208.mail.yahoo.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> Message-ID: <200504011146.44549.faltet@carabos.com> I'm very much with the opinions of Scott. Just some remarks. A Divendres 01 Abril 2005 06:12, Scott Gilbert va escriure: > > __array_names__ (optional comma-separated names for record fields) > > I really like this idea. Although I agree with David M. Cooke that it > should be a tuple of names. Unless there is a use case I'm not > considering, it would be preferrable if the names were restricted to valid > Python identifiers. Ok. I was thinking on easing the life of C extension writers, but I agree that a tuple of names should be relatively easily dealed in C as well. However, as the __array_typestr__ would be a plain string, then an __array_names__ being a plain string would be consistent with that. Also, it would be worth to know how to express a record of different shaped fields. I mean, how to represent a record like: [array(Int32,shape=(2,3)), array(Float64,shape=(3,))] The possibilities are: __array_shapes__ = ((2,3),(3,)) __array_typestr__ = (i,d) Other possibility could be an extension of the current struct format: __array_typestr__ = "(2,3)i(3,)d" more on that later on. > The struct module has a portable set of typecodes. They call it > "standard", but it's the same thing. The struct module let's you specify > either standard or native.
For instance, the typecode for "standard long" > ("=l") is always 4 bytes while a "native long" ("@l") is likely to be 4 or > 8 bytes depending on the platform. The __array_typestr__ codes should > require the "standard" sizes. There is a table at the bottom of the > documentation that goes into detail: > > http://docs.python.org/lib/module-struct.html I fully agree with Scott here. Struct typecodes offer a way to approach the Python standards, and this is a good thing for many developers who know nothing of array packages and their different typecodes. IMO, the portable set of typecodes in the struct module should only be abandoned if they cannot fulfil all the requirements of Numeric3/numarray. But I'm pretty confident that they eventually will. > The only problem with the struct module is that it's missing a few types... > (long double, PyObject, unicode, bit). Well, bit is not used either in Numeric/numarray and I think few people would complain about this (they can always pack bits into bytes). PyObject and unicode can be reduced to a sequence of bytes, and some other metadata can be added to the array protocol to complement its meaning (say, __array_str_encoding__ = "UTF-8" or similar). long double is the only type that should be added to struct typecodes, but convincing the Python crew to do that should not be difficult, I guess. > > I also think that rather than attach < or > to the start of the > > string it would be easier to have another protocol for endianness. > > Perhaps something like: > > > > __array_endian__ (optional Python integer with the value 1 in it). > > If it is not 1, then a byteswap must be necessary. > > A limitation of this approach is that it can't adequately represent > struct/record arrays where some fields are big endian and others are little > endian. Having a mix of different endianness data values in the same data record would be a bit ill-advised. In fact, numarray does not support this: a recarray should be all little or big endian. I think that '<' and '>' would be more than enough to represent this. > > Bool -- "b%d" % sizeof(bool) > > Signed Integer -- "i%d" % sizeof() > > Unsigned Integer -- "u%d" % sizeof() > > Float -- "f%d" % sizeof() > > Complex -- "c%d" % sizeof() > > Object -- "O%d" % sizeof(PyObject *) > > --- this would only be useful on shared memory > > String -- "S%d" % itemsize > > Unicode -- "U%d" % itemsize > > Void -- "V%d" % itemsize > > The above is a nice start at reinventing the struct module typecodes. If > you and Perry agree to it, that would be great. A few additions though: Again, I think it would be better to not get away from the struct typecodes. But if you end up doing it, well, I would like to propose a couple of additions to the new protocol: 1.- Support shapes for record specification. I'm listing two possibilities: A) __array_typestr__ = "(2,3)i(3,)d" This would be an easy extension of the struct string type definition. B) __array_typestr__ = ("i4","f8") __array_shapes__ = ((2,3),(3,)) This is more 'à la numarray'. 2.- Allow nested datatypes. Although numarray does not support this yet, I think it could be very advantageous to be able to express: [array(Int32,shape=(5,)),[array(Int16,shape=(2,)),array(Float32,shape=(3,4))]] i.e., the first field would be an array of ints with 5 elements, while the second field would actually be another record made of 2 fields: one array of short ints, and another array of single precision floats.
I'm not sure how exactly to implement this, but, what about: A) __array_typestr__ = "(5,)i[(2,)h(3,4)f]" B) __array_typestr__ = ("i4",("i2","f8")) __array_shapes__ = ((5,),((2,),(3,4))) Because I'm suggesting we adhere to the struct specification, I prefer option A), although I guess option B would be easier to use for developers (even for extension developers). > > So, what if we proposed for the Python core not something like > > Numeric3 (which would still exist in scipy.base and be everybody's > > favorite array :-) ), but a very minimal array object (scaled back > > even from Numeric) that followed the array protocol and had some > > C-API associated with it. > > > > This minimal array object would support 5 basic types ('bool', > > 'integer', 'float', 'complex', 'Object'). (Maybe a void type > > could be defined and a void "scalar" introduced (which would be > > the bytes object)). These types correspond to scalars already > > available in Python and so the whole 0-dim array Python scalar > > arguments could be ignored. > > I really like this idea. It could easily be implemented in C or Python > script. Since half it's purpose is for documentation, the Python script > implementation might make more sense. Yeah, I fully agree with this also. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" From faltet at carabos.com Fri Apr 1 02:17:36 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:17:36 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <200504011215.52914.faltet@carabos.com> A Divendres 01 Abril 2005 11:31, Travis Oliphant va escriure: > * I'm wondering about including multiple types in the typestr. On the > one hand we could describe complicated structures by packing all the > information into the typestr. On the other hand, it may be better if > we just use 'V8' to describe an 8-byte memory buffer with an additional > attribute that contains both the names and the typestr: > > __array_recinfo__ = (('real','f4'),('imag','f4')) > > or for a "rational type" > > __array_recinfo__ = (('numer','i4'),('denom','i4')) > > so that the detail of the typecode for a "record" type is handled by > another special method using tuples. On this level, we could add the > possibility of specifying a shape for a small array inside (just like > the record array of numarray does). Like: __array_recinfo__ = (('numer','i4', (3,4)),('denom','i4', (2,))) ? Also, this can be easily extended to nested types: __array_recinfo__ = (('a','i4',(3,4)),(('b','i4',(2,)),('c','f4',(10,2)))) Well, this looks pretty good to me. It has nothing to do with struct format, but is much more usable, of course. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" From cjw at sympatico.ca Fri Apr 1 04:57:57 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 04:57:57 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com> Message-ID: <424D4504.4030606@sympatico.ca> David M. Cooke wrote: >Francesc Altet writes: > > > >>A Dimarts 29 Març 2005 01:59, Travis Oliphant va escriure: >> >> >>>My proposal: >>> >>>__array_data__ (optional object that exposes the PyBuffer protocol or a >>>sequence object, if not present, the object itself is used).
>>>__array_shape__ (required tuple of int/longs that gives the shape of the >>>array) >>>__array_strides__ (optional provides how to step through the memory in >>>bytes (or bits if a bit-array), default is C-contiguous) >>>__array_typestr__ (optional struct-like string showing the type --- >>>optional endianness indicater + Numeric3 typechars, default is 'V') >>>__array_itemsize__ (required if above is 'S', 'U', or 'V') >>>__array_offset__ (optional offset to start of buffer, defaults to 0) >>> >>> >>Considering that heterogenous data is to be suported as well, and >>there is some tradition of assigning names to the different fields, I >>wonder if it would not be good to add something like: >> >>__array_names__ (optional comma-separated names for record fields) >> >> > >A sequence (list or tuple) of strings would be preferable. That >removes all worrying about using commas in the names. > > > As I understand it, record arrays can be heterogeneous. If so, wouldn't it make sense for this to be a sequence of tuples? For example: [('Name', charStringType), ('Age', _nt.Int8), ...] Where _nt is defined by something like: import numarray.numerictypes as _nt Colin W. From cjw at sympatico.ca Fri Apr 1 05:49:53 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 05:49:53 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <424D5136.8060703@sympatico.ca> Travis Oliphant wrote: > > For the most part, it seems the array protocol is easy to agree on. > The one difficulty is typestr. > > For what it's worth, here are my opinions on what has been said > regarding the typestr. > > * Endian-ness should be included in the typestr --- it is how the data > is viewed and an intrinsic part of the type as much as int, or float. In most cases, endian-ness is associated with the machine being used, rather than the data element. It seems to me that numarray's numeric types provide a good model, which may need enhancing for records, strings etc. numarray has: Numeric type objects: Bool Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64 Float32 Float64 Complex32 Complex64 Numeric type classes: NumericType BooleanType SignedType UnsignedType IntegralType SignedIntegralType UnsignedIntegralType FloatingType ComplexType > > * I like the fact that struct character codes are documented, but it > is hard to remember. This is the problem. numerictypes provides mnemonic names and, if one uses an editor with autocompletion, a prompt from the editor. For those interfacing with existing code, there could be a helper function: def toType(eltType='i'): # => an instance of NumericType It should also be possible to derive the typeCode from the eltType; numarray doesn't seem to provide this. Colin W. From cjw at sympatico.ca Fri Apr 1 06:07:38 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 06:07:38 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424C8D05.7030006@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> Message-ID: <424D5557.5010806@sympatico.ca> Travis Oliphant wrote: > > To all interested in the future of arrays... > > I'm still very committed to Numeric3 as I want to bring the numarray > and Numeric people together behind a single array object for > scientific computing. > Good. > But, I've been thinking about the array protocol and thinking that it > would be a good thing if this became universal.
One of the ways to > make it universal is by having something that follows it in the Python > core. > > > So, what if we proposed for the Python core not something like > Numeric3 (which would still exist in scipy.base and be everybody's > favorite array :-) ), but a very minimal array object (scaled back > even from Numeric) that followed the array protocol and had some C-API > associated with it. > I thought that your original Numeric3 proposal was in this direction - a simple multidimensional array class/type which could eventually replace Python's array module. In addition, and separately, there were to be a collection of ufuncs. Later, discussion seemed to drift from the basic Numeric3 towards SciPy. > > This minimal array object would support 5 basic types ('bool', > 'integer', 'float', 'complex', 'Object'). (Maybe a void type could > be defined and a void "scalar" introduced (which would be the bytes > object)). These types correspond to scalars already available in > Python and so the whole 0-dim array Python scalar arguments could be > ignored. Could this be subclassed so that provision could be made for Int8 (or even Int1)? How would an array of records be handled? > > Math could be done without ufuncs initially (people really needing > speed would use scipy.base anyway). But, more people in the Python > community would be able to use arrays and get used to them. And we > would have a reference array_protocol object so that extension writers > could write to it. It would be good if the user could write his/her ufunc in Python. > > > I would not try a project like this until after scipy_core is out, but > it's an interesting thing to think about. I mainly wanted feedback on > the basic concept. > The concept looks good. Regarding timing, it seems better to build the foundation before building the house. Colin W. > > An alternative would be to "add" multidimensionality to the array > object already part of Python, fix it's reallocating with an exposed > buffer problem, and add the array protocol. > > > > -Travis From oliphant at ee.byu.edu Fri Apr 1 12:10:00 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:10:00 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef050401104875650ddd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> Message-ID: <424DAA16.10007@ee.byu.edu> >>I'm still very committed to Numeric3 as I want to bring the numarray and >>Numeric people together behind a single array object for scientific >>computing. >> >> Notice that regardless of what I said about what goes into standard Python, something like Numeric3 will always exist for use by scientific users. It may just be a useful add on package like Numeric has always been. There is no way I'm going to abandon use of a more capable Numeric. >Right. I believe that, among all libraries related with numeric array, >eventually only one library in the Python core will survive no matter >how much advanced functions are available, because of the strong >compatibility with other packages. > > I don't think this is true. Things will survive based on utility. What we are trying to do with the Python core is define a standard protocol that is flexible enough to handle anybody's concept of an advanced array (in particular the advanced array that will be in scipy.base). >Totally agree. I doubt that Guido will accept a large and complex >library into the standard Python core. 
I think Numeric is already too >complex, and numarray is far more complex to be a standard lib in the >Python core. Numeric3 must shift its focus from better Numeric to >scale-downed Numeric. > > I disagree about "shifting focus." Personally, I'm not going to work on something like that until we have a single array package that fulfills the needs of all Numeric and most numarray users. I'm just pointing out that what goes in to the Python core should probably be a scaled down object with a souped-up "protocol" so that the array object in scipy.base can be used through the array protocol by any other package without worrying about having scipy_core at compile time. >For example, how many Python users care about masked arrays? How many >Python users want the advanced type from the Python core? I think the >advanced array type should in some extension lib, not in core array >lib. > Perhaps you do see my point of view. Not all Python users care about an advanced array object but nearly all technical (scientific and engineering users) will. We just need interoperability. >If we make clear our target ? becoming a standard library in the >Python core, we may have no problem in determining what functions >should be in the core array lib and what functions should be in >extension libraries using the core array type. > > >Today, the array type in the Python core is almost useless. >If Numeric3 offers just much faster performance on numeric types, many >Python users will start to use new array type in their applications. >Once it happens, we can create a bunch of extension libraries for more >advanced operations on the new array type. > > The "bunch of extension libraries" is already happening and is already in progress. I think we've overshot the mark for the Python core, however. No need to wait "til something happens" >With all my heart I hope that Numeric3 gears to this direction before > > >we get the tragedy to have Numeric4, Numeric5, and so on. > > I'm coming to see that what is most important for the Python core is "protocols". Then, there can be a "million" different array types that can all share each other's memory without hassle and much overhead. I'm still personally interested in a better Numeric, however, and so won't be abandoning the concept of Numeric3 (notice I now call it scipy.base --- not a change of focus just a change of name). I just wanted to encourage some discussion on the array protocol. -Travis From oliphant at ee.byu.edu Fri Apr 1 12:23:19 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:23:19 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424D5557.5010806@sympatico.ca> References: <424C8D05.7030006@ee.byu.edu> <424D5557.5010806@sympatico.ca> Message-ID: <424DAD00.1050203@ee.byu.edu> > I thought that your original Numeric3 proposal was in this direction - > a simple multidimensional array class/type which could > eventually replace Python's array module. In addition, and > separately, there were to be a collection of ufuncs. No, that's a misunderstanding. Original Numeric3 was never about "simplyifying." Because, we can't "simplify" and still support the uses that Numeric and numarray have enjoyed. I'm more interested in using something like Numeric and will always install it should it exist. I was iunterested in getting it into the Python core for standardization. I now believe that "universal" standardization should occur around a "protocol" and perhaps a simple implementation. 
I'm still interested in a more "local standardization" for numarray and Numeric users (not all Python users) which is the focus of scipy.base (used to call it Numeric3). In the process we are generating good ideas that can be used for "global standardization" among all Python users. But, I can't do it all. I have to keep focused on what I'm doing with the current Numeric arrayobject (and that has never been about "getting rid of functionality"). > > Later, discussion seemed to drift from the basic Numeric3 towards SciPy. The context of the problem as I see it intimately involves scipy and the collection of packages surrounding numarray. The small community we have built up was diverging in the creation of external packages. This is what troubled me most deeply. So, there is no Numeric3 separate from the larger issue of "a collection of standard scientific packages" that scipy has tried to be. That is why reference to scipy is made. I see no "drifting" occurring. There is a separate issue of a good array module for Python. I now see the solution there as being more of a "good array protocol" for Python with a default very simple implementation that is improved by extension modules. > > Could this be subclassed so that provision could be made for Int8 (or > even Int1)? I suppose, but this is kind of missing the point, because Numeric3 will support those types. If you need a more advanced array you install scipy.base. > > How would an array of records be handled? By installing a more advanced array. > The concept looks good. Regarding timing, it seems better to build > the foundation before building the house. The problem with your analogy is that the "sprawling mansion in the suburbs" is already built (Numeric has been around for a long time). The question is what kind of housing to build for the city dwellers and what kind of transportation system do we establish so people can move back and forth easily. -Travis From sdhyok at gmail.com Fri Apr 1 12:59:07 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Fri Apr 1 12:59:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <371840ef05040112574b6a86bd@mail.gmail.com> On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: snip > I disagree about "shifting focus." Personally, I'm not going to work on > something like that until we have a single array package that fulfills > the needs of all Numeric and most numarray users. I'm just pointing > out that what goes into the Python core should probably be a scaled > down object with a souped-up "protocol" so that the array object in > scipy.base can be used through the array protocol by any other package > without worrying about having scipy_core at compile time. Would you tell me what exactly you mean by "protocol"? Do you mean a standard definition of a series of "interfaces" for array type in Python?
-- Daehyok Shin Geography Department University of North Carolina-Chapel Hill USA From oliphant at ee.byu.edu Fri Apr 1 15:14:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 15:14:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef05040112574b6a86bd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> <371840ef05040112574b6a86bd@mail.gmail.com> Message-ID: <424DD56E.6070801@ee.byu.edu> Daehyok Shin wrote: >On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: > >snip > > > >>I disagree about "shifting focus." Personally, I'm not going to work on >>something like that until we have a single array package that fulfills >>the needs of all Numeric and most numarray users. I'm just pointing >>out that what goes into the Python core should probably be a scaled >>down object with a souped-up "protocol" so that the array object in >>scipy.base can be used through the array protocol by any other package >>without worrying about having scipy_core at compile time. >> >> > >Would you tell me what exactly you mean by "protocol"? >Do you mean a standard definition of a series of "interfaces" for array >type in Python? > > Yes, pretty much. I would even go so far as to say a set of hooks in the typeobject (like the sequence, mapping, and buffer protocols). -Travis From steve at shrogers.com Sat Apr 2 06:50:58 2005 From: steve at shrogers.com (Steven H. Rogers) Date: Sat Apr 2 06:50:58 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <424EB08F.90909@shrogers.com> First, thanks for doing this Travis. Travis Oliphant wrote: > > I'm coming to see that what is most important for the Python core is > "protocols". Then, there can be a "million" different array types that > can all share each other's memory without hassle and much overhead. > I'm still personally interested in a better Numeric, however, and so > won't be abandoning the concept of Numeric3 (notice I now call it > scipy.base --- not a change of focus just a change of name). I just > wanted to encourage some discussion on the array protocol. > Your array protocol idea sounds good. It should not only make it easier to interoperate with other Python packages, but also with foreign systems like APL/J, Matlab, and LabVIEW. Regards, Steve -- Steven H. Rogers, Ph.D., steve at shrogers.com Weblog: http://shrogers.com/weblog "Reach low orbit and you're half way to anywhere in the Solar System." -- Robert A. Heinlein From oliphant at ee.byu.edu Sat Apr 2 21:30:03 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 2 21:30:03 2005 Subject: [Numpy-discussion] scipy.base (Numeric3) now has math Message-ID: <424F7F06.4090200@ee.byu.edu> I've updated scipy.base (Numeric3) so math is now supported (it uses the old ufunc apparatus with support for the newly added types). There is still some work to be done so this is still very alpha (but at least math operations work): - update the ufunc apparatus to use buffers to avoid copying an entire array just for type casting (and to support unaligned and non-byteswapped arrays; a rough sketch of the buffering idea follows this list) - update the way error handling is done. - update the coercion strategy like numarray does - fix all the bugs.
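The first bullet is the key performance point: instead of casting a whole operand up front (a full temporary copy), the ufunc machinery can cast one fixed-size chunk at a time. A rough sketch of that idea in pure Python (the real work happens in C; add_with_buffering and BUFSIZE are made up for illustration):

import numarray

BUFSIZE = 8192   # elements per buffer, not bytes

def add_with_buffering(x, y, result_type):
    # Walk both 1-D operands in small chunks, casting each chunk as we
    # go rather than materializing full casted copies of x and y.
    n = len(x)
    result = numarray.zeros(n, result_type)
    start = 0
    while start < n:
        stop = min(start + BUFSIZE, n)
        result[start:stop] = x[start:stop].astype(result_type) + \
                             y[start:stop].astype(result_type)
        start = stop
    return result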
I've also fixed things so Numeric extension modules should compile --- Please report warnings and bugs with this as well. Thanks for all your help, -Travis From oliphant at ee.byu.edu Sun Apr 3 01:06:16 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:06:16 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <200504011215.52914.faltet@carabos.com> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> Message-ID: <424FB19B.4060800@ee.byu.edu> Hello all, I've updated the numeric web site and given special prominence to the array interface which I believe should be pushed. Numeric 24.0 will support it as will scipy.base (Numeric3). I hope that numarray will also support it in an upcoming release. Please read through the interface and feel free to comment. However, unless there is a glaring problem, I'm more interested that you feel free to start using the interface than that we debate it further. Scott has expressed interest in implementing a very basic Python-only implementation of an object exporting the interface. I suggest he and anyone else interested look at numarray for a starting point for a Python implementation, and Numeric for a C implementation. -Travis From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 01:24:07 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 01:24:07 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB19B.4060800@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> Message-ID: <424FB72F.4020201@ims.u-tokyo.ac.jp> There are two questions that I have about the array interface: 1) To what degree will the new array interface look different to users of the existing Numerical Python? If I were to install the new array interface on the computer of a current Numerical Python user and I didn't tell them, would they notice a difference? 2) To what degree is the new array interface compatible with Numerical Python for the purpose of C extension modules? Do C extension modules need to be modified in order to use the new array interface? --Michiel. Travis Oliphant wrote: > > Hello all, > > I've updated the numeric web site and given special prominence to the > array interface which I believe should be pushed. Numeric 24.0 will > support it as will scipy.base (Numeric3). I hope that numarray will > also support it in an upcoming release. > > Please read through the interface and feel free to comment. However, > unless there is a glaring problem, I'm more interested that you feel > free to start using the interface than that we debate it further. > > Scott has expressed interest in implementing a very basic Python-only > implementation of an object exporting the interface. I suggest he and > anyone else interested look at numarray for a starting point for a > Python implementation, and Numeric for a C implementation. > > -Travis
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Sun Apr 3 01:41:09 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:41:09 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB72F.4020201@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> Message-ID: <424FB9FA.1090109@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > There are two questions that I have about the array interface: > > 1) To what degree will the new array interface look different to users > of the existing Numerical Python? If I were to install the new array > interface on the computer of a current Numerical Python user and I > didn't tell them, would they notice a difference? Nothing will look different. For now there is nothing to "install" so the array interface is just something to expect from other objects. The only thing that would be different is in Numeric 24.0 (if a user were to call array() on an object that supported the array interface, then Numeric could return an array without copying data). Older versions of Numeric won't benefit from the interface but won't be harmed either. > 2) To what degree is the new array interface compatible with Numerical > Python for the purpose of C extension modules? Do C extension modules > need to be modified in order to use the new array interface? It is completely compatible. C-extensions don't need to be modified at all to make use of the interface (of course they should be re-compiled if using Numeric 24.0). Only two things will be modified in Numeric 24.0. 1) PyArray_FromObject and friends will be expanded so that if an object exposes the array interface the right thing will be done to use its memory. 2) Attributes will be added so that Numeric arrays expose the array interface so other objects can use their memory intelligently. -Travis From cjw at sympatico.ca Sun Apr 3 05:23:12 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Apr 3 05:23:12 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem Message-ID: <424FE002.6010800@sympatico.ca> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install running install running build running config error: The .NET Framework SDK needs to be installed before building extensions for Python. Is there any chance that a Windows binary could be made available for testing? Colin W. From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:35:05 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:35:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE002.6010800@sympatico.ca> References: <424FE002.6010800@sympatico.ca> Message-ID: <424FE3D8.7040200@ims.u-tokyo.ac.jp> You can use Cygwin's MinGW compiler by adding --compiler=mingw after the setup command. --Michiel. Colin J.
Williams wrote: > C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install > running install > running build > running config > error: The .NET Framework SDK needs to be installed before building > extensions for Python. > > Is there any chance that a Windows binary could be made available for > testing? > > Colin W. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:46:04 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:46:04 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE3D8.7040200@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE3D8.7040200@ims.u-tokyo.ac.jp> Message-ID: <424FE64F.7030706@ims.u-tokyo.ac.jp> Sorry, that should be --compiler=mingw32. Michiel Jan Laurens de Hoon wrote: > You can use Cygwin's MinGW compiler by adding --compiler=mingw after the > setup command. > > --Michiel. > > Colin J. Williams wrote: >> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install >> running install >> running build >> running config >> error: The .NET Framework SDK needs to be installed before building >> extensions for Python. >> >> Is there any chance that a Windows binary could be made available for >> testing? >> >> Colin W. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From gruben at bigpond.net.au Sun Apr 3 06:32:09 2005 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun Apr 3 06:32:09 2005 Subject: [Numpy-discussion] array slicing question Message-ID: <424FF03A.4060107@bigpond.net.au> This may be relevant to Numeric 3, but is possibly just a general question about array slicing which will either reveal a deficiency in specifying slices or in my knowledge of slicing with numpy. A while ago I was trying to reimplement some Matlab image processing code in Numeric and revealed a deficiency in the way slices are defined. Suppose I have an n x m array and want to slice off the first and last p rows and columns where p can range from 0 to some number. Matlab provides a clean way of doing this, but in numpy it's a bit of a mess.
You might think you could do >>> p=1 >>> b = a[p:-p] but if p=0, this fails. My final solution involved getting the array shape and explicitly calculating start and stop columns, but is there a better way? Gary R. From oliphant at ee.byu.edu Sun Apr 3 08:36:35 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 08:36:35 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> Message-ID: <42500D03.3030809@ee.byu.edu> I don't know if you have followed the array interface discussion. It is defined at http://numeric.scipy.org I have implemented consumer and exporter interfaces for Numeric and an exporter interface for numarray. The consumer interface needs a little help but shouldn't take too long for someone who understands numarray better. Now Numeric arrays can share data with numarray (no data copy). scipy.base arrays will also implement the array interface. I think the array interface is a good direction to go. -Travis From konrad.hinsen at laposte.net Sun Apr 3 13:03:19 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Sun Apr 3 13:03:19 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <424FF03A.4060107@bigpond.net.au> References: <424FF03A.4060107@bigpond.net.au> Message-ID: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> On 03.04.2005, at 15:31, Gary Ruben wrote: > You might think you could do > >>> p=1 > >>> b = a[p:-p] > > but if p=0, this fails. b = a[p:len(a)-p] works even for p=0. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From oliphant at ee.byu.edu Sun Apr 3 21:21:15 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:21:15 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050403165914.GC10730@idi.ntnu.no> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> Message-ID: <4250C0A4.9070707@ee.byu.edu> Magnus Lie Hetland wrote: >Travis Oliphant : > > >>I don't know if you have followed the array interface discussion. It >>is defined at http://numeric.scipy.org >> >> > >This very, very good! The numeric future of Python is looking very >bright, IMO :) > >Some tiny points: > > - Shouldn't the regexp for __array_typestr__ be > '[<>]?[tbiufcOSUV][0-9]+'? > > Probably. Since, I guess you can only have one of < or > . Thanks.. > - What are the semantics when __array_typestr__ isn't V[0-9]+ and > __array_descr__ is set? Is __array_typestr__ ignored? Or... What > would it be used for? > > I would say that the __array_descr__ always gives more information but not every array implementation will support looking at it. For example, current Numeric (24.0 in CVS) ignores __array_descr__ and just looks at the typestr (and doesn't support 'V'). So, I suspect that another array package that knows this may choose something else besides 'V' if it really wants Numeric to still understand it. 
Suppose you have a complex short int array with __array_descr__ = 'V8 > - Does the description of __array_data__ mean that the discussed > bytes type is no longer needed? (If we can use buffers, that > sounds very good to me.) > > Bytes is still needed because the buffer object is not very good and we need a good buffer object in Python for lots of other reasons. It would be very useful, for example, to be able to allocate memory using the Python bytes object. But, it does mean less pressure to get it to work. > - Why the parentheses around "buffer protocol-satisfying object" in > the description of __array_mask__? And why must it be 'b1'? What > if I happen to have mask data from a non-array-protocol source, > which happens to be, say, b8 (not unreasonable, I think)? Wouldn't > it be good to allow any size of these, and just use zero/non-zero > as the criterion? Some of the point of this protocol is to avoid > copying and using the original data, after all...? (Same goes for > the requirement that it be C-contiguous. I guess I'm basically > saying that perhaps __array_mask__ should be an array itself. Or, > at least, that it could be *allowed* to be...) > > I added the mask late last night. It is probably the least thought out portion. Everything else has been through the wringer a couple more times. My whole thinking is that I just didn't want to explode the protocol with another special name for the mask type. But, saying that the mask object itself can support the array interface doesn't do that, so I think that is a good call. Last night, using the numarray exporter interface and the Numeric consumer interface I was able to share data between a Numeric array and numarray array with no copying of the data buffers. It was very nice. -Travis From oliphant at ee.byu.edu Sun Apr 3 21:29:12 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:29:12 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <4250C276.5090300@ee.byu.edu> >> > Probably. Since, I guess you can only have one of < or > . Thanks.. > >> - What are the semantics when __array_typestr__ isn't V[0-9]+ and >> __array_descr__ is set? Is __array_typestr__ ignored? Or... What >> would it be used for? >> >> > I would say that the __array_descr__ always gives more information but > not every array implementation will support looking at it. For > example, current Numeric (24.0 in CVS) ignores __array_descr__ and > just looks at the typestr (and doesn't support 'V'). So, I suspect > that another array package that knows this may choose something else > besides 'V' if it really wants Numeric to still understand it. > Suppose you have a complex short int array with > > __array_descr__ = 'V8 Let me finish this example: Suppose you have a complex short int array with __array_descr__ = [('real','i2'),('imag','i2')] you could describe this as __array_typestr__ = 'V4' or think of it as a 4 byte integer if you want to make sure that another array package that may not support void pointers can still manipulate the data, and so the creator of the complex short int array may decide that __array_typestr__ = 'i4' is the right thing to do for packages that ignore the __array_descr__ attribute.
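Spelled out as a tiny exporter, the trade-off might look like this (a sketch only; ComplexInt16Exporter is hypothetical and simply publishes the attribute names from the published interface):

class ComplexInt16Exporter:
    def __init__(self, buf, n):
        self.__array_data__ = buf      # n elements, 4 bytes each
        self.__array_shape__ = (n,)
        self.__array_descr__ = [('real', 'i2'), ('imag', 'i2')]
        # Descr-aware consumers get the full structure above; for
        # everyone else 'V4' keeps each element opaque, while
        # advertising 'i4' instead would let packages without a void
        # type (like current Numeric) still manipulate the memory.
        self.__array_typestr__ = 'V4'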
-Travis From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 01:17:15 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 01:17:15 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB9FA.1090109@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> Message-ID: <4250F8E5.9020701@ims.u-tokyo.ac.jp> Travis Oliphant wrote: >> 1) To what degree will the new array interface look different to users >> of the existing Numerical Python? > > Nothing will look different. For now there is nothing to "install" so > the array interface is just something to expect from other objects. > The only thing that would be different is in Numeric 24.0 (if a user > were to call array() on an object that supported the array > interface, then Numeric could return an array without copying data). > Older versions of Numeric won't benefit from the interface but won't be > harmed either. Very nice. Thanks, Travis. I'm not sure what you mean by "the array interface could become part of the Python standard as early as Python 2.5", since there is nothing to install. Or does this mean that Python's array will conform to the array interface? Some comments on the array interface: 1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method (and rename the other methods for consistency, if desired). 2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method. 3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't? If so, I have a slight preference to make all methods required, since it's not a big effort to return the defaults, and there will be more extension modules than array packages (or so I hope). Whereas the array interface certainly helps extension writers to create an extension module that works with all array implementations, it also enables and perhaps encourages the creation of different array modules, while our original goal was to create a single array module that satisfies the needs of both Numerical Python and numarray users. I still think such a solution would be preferable. Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another, even though they both conform to the array interface. We may end up with several array packages (we already have Numerical Python, numarray, and scipy), and extension modules that work with one package and not with another. So in a sense, the array interface is letting the genie out of the bottle. But maybe such a single array package is not attainable given the different needs of the different communities. --Michiel.
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From magnus at hetland.org Mon Apr 4 02:05:28 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:05:28 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <20050404090356.GB21527@idi.ntnu.no> Travis Oliphant : > [snip] > Last night, using the numarray exporter interface and the Numeric > consumer interface I was able to share data between a Numeric array and > numarray array with no copying of the data buffers. It was very nice. Wow -- a historic moment :) Now, if we can only get the stdlib's array module to support this protocol (and sprout some more dimensions), as you mentioned... That would really be cool. -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Mon Apr 4 02:15:10 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:15:10 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C276.5090300@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> <4250C276.5090300@ee.byu.edu> Message-ID: <20050404091311.GC21527@idi.ntnu.no> Travis Oliphant : > [snip] > > Let me finish this example: > > Suppose you have a complex short int array with > > __array_descr__ = [('real','i2'),('imag','i2')] > > you could describe this as > > __array_typestr__ = 'V4' Sure -- I can see how using 'V' makes sense... You're just telling the host program how many bytes you've got, and that's it. That makes sense to me. What I wondered about was what happened when you use a more specific (and conflicting) type for the typestr... > or think of it as a 4 byte integer if you want to make sure that another > array package that may not support void pointers can still manipulate > the data, and so the creator of the complex short int array may decide that > > __array_typestr__ = 'i4' This is basically what I'm wondering about. It would make sense (to me) to say that the data type was 'V4', because that's simply less specific, in a way. But saying 'i4' is just as specific as the complex example, above -- but it means something else! You're basically giving the program permission to interpret a four-byte complex number as a four-byte integer, aren't you? Sounds almost like a recipe for disaster to me :} On the other hand -- there is no complex integer type in the interface, and using 'c4' probably would be completely wrong as well. I would almost be tempted to say that if __array_descr__ is in use, __array_typestr__ *has* to use the 'V' type. (Or, one could make some more complicated rules, perhaps, in order to allow other types.) As for not supporting the 'V' type -- would that really be considered a conforming implementation? 
According to the spec, "Objects wishing to support an N-dimensional array in application code should look for these attributes and use the information provided appropriately". The typestr is required, so... Perhaps the spec should be explicit about the shoulds/musts/mays of the specific typecodes? What must be supported, what may be supported etc.? Or perhaps that doesn't make sense? It just seems almost too bad that one package would have to know what another package supports in order to formulate its own typestr... It sort of throws part of the interoperability out the window. > is the right thing to do for packages that ignore the __array_descr__ > attribute. > > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Mon Apr 4 02:25:17 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:25:17 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> Message-ID: <20050404092421.GD21527@idi.ntnu.no> Michiel Jan Laurens de Hoon : > [snip] > 1) The "__array_shape__" method is identical to the existing "shape" method > in Numerical Python and numarray (except that "shape" does a little bit > better checking, but it can be added easily to "__array_shape__"). To avoid > code duplication, it might be better to keep that method (and rename the > other methods for consistency, if desired). Why not just use 'shape' as an alias for '__array_shape__' (or vice versa)? > 2) The __array_datalen__ is introduced to get around the 32-bit int > limitation of len(). Another option is to fix len() in Python > itself, so that it can return integers larger than 32 bits. So we > can avoid adding a new method. That would be good, IMO. But how realistic is it? (I have no idea -- this is not a rhetorical question :) > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements e.g. > __array_strides__, and substitute the default values if it doesn't? If the support of these attributes is optional, that would have to be the case. > If so, I have a slight preference to make all methods required, > since it's not a big effort to return the defaults, and there will > be more extension modules than array packages (or so I hope). But isn't the point that you should be able to export other things (such as images or sounds or what-have-you) *as* arrays? As for implementing the defaults: How about having some utility functions (or a wrapper object or whatever) that does just this -- so neither array nor client code need think about it? This could, perhaps, be put in the stdlib array module or something... > Whereas the array interface certainly helps extension writers to > create an extension module that works with all array > implementations, it also enables and perhaps encourages the creation > of different array modules, while our original goal was to create a > single array module that satisfies the needs of both Numerical > Python and numarray users. I still think such a solution would be > preferable. I agree.
But what I think would be cool is if such a standardized package could take any object conforming to this protocol and use it (possibly as the argument to the array() constructor) -- with all the ufuncs and operators it has. Because then I could implement specialized arrays where the specialization lies just in the data itself, not the behaviour. For example, I might want to create a thin array wrapper around a memory-mapped, compressed video file, and treat it as a three-dimensional array of rgb triples... (And so forth.) > Inconsistencies other than the array interface (e.g. one implements > argmax(x) while another implements x.argmax()) may mean that an > extension module can work with one array implementation but not with > another, This does *not* sound like a good thing -- I agree. Certainly not what I would hope this protocol is used for. > even though they both conform to the array interface. We may end up > with several array packages (we already have Numerical Python, > numarray, and scipy), and extension modules that work with one > package and not with another. So in a sense, the array interface is > letting the genie out of the bottle. Well, perhaps -- but the current APIs of e.g., Numeric or numarray could be used in the same way (i.e., writing your own array implementations with the same interface). As (I think) Travis has said, there is still a goal (somewhat separate from the protocol) of getting one standard heavy-duty numerical array package. I think that would be very beneficial. The point (as I see it) is just to make it easier for various array implementations (i.e., the data, not the ufuncs/operators etc.) to interoperate with it. > But maybe such a single array package is not attainable given the > different needs of the different communities. I would certainly hope it is. > --Michiel. -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From gruben at bigpond.net.au Mon Apr 4 05:14:09 2005 From: gruben at bigpond.net.au (Gary Ruben) Date: Mon Apr 4 05:14:09 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> References: <424FF03A.4060107@bigpond.net.au> <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> Message-ID: <42512F57.2050007@bigpond.net.au> Thanks Konrad, Sorry, my example was too simple. The actual example representing an image should have been 2-D and not necessarily square. Therefore I used shape instead of len and it seemed messy doing it this way. Gary konrad.hinsen at laposte.net wrote: > On 03.04.2005, at 15:31, Gary Ruben wrote: >> You might think you could do >> >>> p=1 >> >>> b = a[p:-p] >> >> but if p=0, this fails. > > b = a[p:len(a)-p] works even for p=0. > > Konrad.
> -- > ------------------------------------------------------------------------ > ------- > Konrad Hinsen > Laboratoire Leon Brillouin, CEA Saclay, > 91191 Gif-sur-Yvette Cedex, France > Tel.: +33-1 69 08 79 25 > Fax: +33-1 69 08 82 61 > E-Mail: khinsen at cea.fr > ------------------------------------------------------------------------ > ------- > > From oliphant at ee.byu.edu Mon Apr 4 12:16:09 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 4 12:16:09 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> Message-ID: <4251920B.6060708@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >>> 1) To what degree will the new array interface look different to >>> users of the existing Numerical Python? >> >> >> Nothing will look different. For now there is nothing to "install" >> so the array interface is just something to expect from other >> objects. The only thing that would be different is in Numeric 24.0 >> (if a user were to call array() on an object that supported the >> array interface, then Numeric could return an array without copying >> data). Older versions of Numeric won't benefit from the interface but >> won't be harmed either. > > > Very nice. Thanks, Travis. > I'm not sure what you mean by "the array interface could become part > of the Python standard as early as Python 2.5", since there is nothing > to install. Or does this mean that Python's array will conform to the > array interface? The latter is what I mean... I think it is important to have something in Python itself that "conforms to the interface." I wonder if it would also be nice to have some protocol slots in the object type so that extension writers can avoid converting some objects. There is also the possibility that a very simple N-d array type could be included in Python 2.5 that conforms to the interface, if somebody wants to champion that. I think it is important to realize what the array interface is trying to accomplish. From my perspective, I still think it is better for the scientific community to build off of a single array object that is "best of breed." The purpose of the array interface is to allow us scientific users to share information with other Python extension writers who may be wary to require scipy.base for their users but who really should be able to interoperate with scipy.base arrays. I'm thinking of extensions like wxPython, PIL, and so forth. There are also lots of uses for arrays that don't necessarily need the complexity of the scipy.base array (or uses that need even more types). At some point we may be able to accommodate dynamic type additions to the scipy.base array. But, right now it requires enough work that others may want to design their own simple arrays. It's very useful if all such arrays could speak together with a common basic language. The fact that numarray and Numeric arrays can talk to each other more seamlessly was not the main goal of the array interface but it is a nice side benefit. I'd still like to see the scientific community use a single array. But, others may not see it that way. The array interface lets us share more easily.
> > Some comments on the array interface: > > 1) The "__array_shape__" method is identical to the existing "shape" > method in Numerical Python and numarray (except that "shape" does a > little bit better checking, but it can be added easily to > "__array_shape__"). To avoid code duplication, it might be better to > keep that method. (and rename the other methods for consistency, if > desired). There is no code duplication. In these cases it is just another name for .shape. What "better checking" are you referring to? > > 2) The __array_datalen__ is introduced to get around the 32-bit int > limitation of len(). Another option is to fix len() in Python itself, > so that it can return integers larger than 32 bits. So we can avoid > adding a new method. Python len() will never return a 64-bit number on a 32-bit platform. > > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements > e.g. __array_strides__, and substitute the default values if it > doesn't? If so, I have a slight preference to make all methods > required, since it's not a big effort to return the defaults, and > there will be more extension modules than array packages (or so I hope). Optional attributes let modules that care talk to each other on a "higher level" without creating noise for simpler extensions. Both the consumer and the exporter have to use it for it to matter. The defaults are just clarifying what is being assumed if it isn't there. > > Whereas the array interface certainly helps extension writers to > create an extension module that works with all array implementations, > it also enables and perhaps encourages the creation of different array > modules, while our original goal was to create a single array module > that satisfies the needs of both Numerical Python and numarray users. > I still think such a solution would be preferable. I agree with you. I would like a single array module for scientific users. But, satisfying everybody is probably impossible with a single array object. Yes, there could be a proliferation of array objects but sometimes we need multiple array objects to learn from each other. It's nice to have actual code that implements some idea rather than just words in a mailing list. The interface allows us to talk to each other while we learn from each other's actual working implementations. In a way this is like the old argument between the 1920-era communists and the free-marketers. The communists say that we should have only one company that produces some product because having multiple companies is "wasteful" of resources, while the free-marketers point out that satisfying consumers is tricky business, and there is not only "one right way to do it." Therefore, having multiple companies each trying to satisfy consumers actually creates wealth as new and better ideas are tried by the different companies. The successful ideas are emulated by the rest. In mature markets there tend to be a reduction in the number of producers while in developing markets there are all kinds of companies producing basically the same thing. Of course software creates its own issues that aren't addressed by that simple analogy, but I think it's been shown repeatedly that good interfaces (http, smtp anyone?) create a lot of utility. > Inconsistencies other than the array interface (e.g. 
one implements > argmax(x) while another implements x.argmax()) may mean that an > extension module can work with one array implementation but not with > another, even though they both conform to the array interface. We may > end up with several array packages (we already have Numerical Python, > numarray, and scipy), and extension modules that work with one package > and not with another. So in a sense, the array interface is letting > the genie out of the bottle. I think this genie is out of the bottle already. We need to try and get our wishes from it now. -Travis From xscottg at yahoo.com Mon Apr 4 19:09:30 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 4 19:09:30 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: 6667 Message-ID: <20050404233322.61350.qmail@web50208.mail.yahoo.com> --- Michiel Jan Laurens de Hoon wrote: > > I'm not sure what you mean by "the array interface could become > part of the Python standard as early as Python 2.5", since there > is nothing to install. Or does this mean that Python's array will > conform to the array interface? > It would be nice to have the Python array module support the protocol for the 1-Dimensional arrays that it implements. It would also be nice to add a *simple* ndarray object in the core that supports multi-dimensional arrays. I think breaking backward compatibility of the existing Python array module to support multiple dimensions would be a mistake and unlikely to get accepted. A PEP would likely be required to make the changes to the array module, and the PEP to add an ndarray module would likely document the interface. In that regard, it could "make it into the core" for Python 2.5. But you're right that external packages could support this interface today. There is nothing to install... > > 1) The "__array_shape__" method is identical to the existing "shape" > method in Numerical Python and numarray (except that "shape" does a > little bit better checking, but it can be added easily > to "__array_shape__"). To avoid code duplication, it might be better > to keep that method. (and rename the other methods for consistency, > if desired). > The intent is that all array packages would have the required/optional protocol attributes. Of course at a higher level, this information will probably be presented to the users, but they might choose a different mechanism. So while A.__array_shape__ always returns a tuple of longs, A.shape is free to return a ShapeObject or be an assignable attribute that changes the shape of the object. With the property mechanism, there is no need to store duplicated data (__array_shape__ can be a property method that returns a dynamically generated tuple). Separating the low level description of the array data in memory from the high level interface that particular packages like scipy.base or numarray present to their users is a good thing. > > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements e.g. > __array_strides__, and substitute the default values if it doesn't? If > so, I have a slight preference to make all methods required, since it's > not a big effort to return the defaults, and there will be more extension > modules than array packages (or so I hope). > If we can get a *simple* package into the core, in addition to implementing an ndarray object, this module could have helper functions that do this sort of thing.
For instance:

    def get_strides(A):
        if hasattr(A, "__array_strides__"):
            return A.__array_strides__
        # default: C-contiguous strides computed from the shape and item size
        strides = []
        shape = A.__array_shape__
        size = get_itemsize(A)
        for i in range(len(shape)-1, -1, -1):
            strides.append(size)
            size *= shape[i]
        # the loop built them last-dimension first, so restore C order
        strides.reverse()
        return tuple(strides)

    def get_itemsize(A):
        typestr = A.__array_typestr__
        # skip the endian
        if typestr[0] in '<>':
            typestr = typestr[1:]
        # skip the char code
        typestr = typestr[1:]
        return long(typestr)

    def is_contiguous(A):
        # etc....

Those are probably buggy and need work, but you get the idea... A C implementation of the above would be easy to do and useful, and it could be done inline in a single include file (no linking headaches). Cheers, -Scott From xscottg at yahoo.com Mon Apr 4 19:09:34 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 4 19:09:34 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: 6667 Message-ID: <20050404233447.26327.qmail@web50204.mail.yahoo.com> --- Magnus Lie Hetland wrote: > > I would almost be tempted to say that if __array_descr__ is in use, > __array_typestr__ *has* to use the 'V' type. (Or, one could make some > more complicated rules, perhaps, in order to allow other types.) > Yup, having multiple ways to spell the same information will likely cause problems. Wouldn't be bad for the protocol to say "thou shalt use the specific typestr when possible". Or to say that the __array_descr__ is only for 'V' typestrs. > > As for not supporting the 'V' type -- would that really be considered > a conforming implementation? According to the spec, "Objects wishing > to support an N-dimensional array in application code should look for > these attributes and use the information provided appropriately". The > typestr is required, so... > I think the intent is that libraries like wxPython or PIL can recognize data that they *want* to work with. They can raise an exception when passed anything that is more complicated than they're willing to deal with. I think many packages will simply punt when they see a 'V' typestr and not look at the more complicated description at all. Nothing wrong with that... The packages that produce more complicated data structures have a way to express it and pass it to the packages that are capable of consuming it. Easy things are easy, and hard things are possible. > > Perhaps the spec should be explicit about the shoulds/musts/mays of > the specific typecodes? What must be supported, what may be supported > etc.? Or perhaps that doesn't make sense? It just seems almost too bad > that one package would have to know what another package supports in > order to formulate its own typestr... It sort of throws part of the > interoperability out the window. > Being very precise in the language describing the protocol is probably a good thing, but I don't see anything that requires packages to formulate their typestrs differently. The little bit of ambiguity that is in the __array_typestr__ and __array_descr__ attributes can be easily clarified. Cheers, -Scott From xscottg at yahoo.com Mon Apr 4 19:09:38 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 4 19:09:38 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404092421.GD21527@idi.ntnu.no> Message-ID: <20050404233620.70070.qmail@web50209.mail.yahoo.com> --- Magnus Lie Hetland wrote: > > Why not just use 'shape' as an alias for '__array_shape__' (or vice > versa)? > The protocol just describes the layout and format of the data in memory.
As such, most users won't use it directly just as most users don't call obj.__add__ directly... If an array implementation has a .shape attribute, it can be whatever the implementor wants. Perhaps it's assignable. Maybe it's a method that returns a ShapeObject with methods and attributes of its own. Features like these are the things that make the high level array packages like Numeric and Numarray enjoyable to use. The low level __array_*metadata__ interface should be simple and precisely defined and just for data interchange. > > > 3) Where do default values come from? Is it the responsibility of the > > extension module writer to find out if the array module implements e.g. > > __array_strides__, and substitute the default values if it doesn't? > > If the support of these attributes is optional, that would have to be > the case. > > As for implementing the defaults: How about having some utility > functions (or a wrapper object or whatever) that does just this -- so > neither array nor client code need think about it? This could, > perhaps, be put in the stdlib array module or something... > There will be a simple Python module or C include file for such things. Hopefully it will eventually be included in the Python standard distribution, but even if that doesn't happen, it will be easier than requiring and linking against the Numeric/Numarray/scipy.base libraries directly. > > But what I think would be cool is if such a standardized package could > take any object conforming to this protocol and use it (possibly as > the argument to the array() constructor) -- with all the ufuncs and > operators it has. Because then I could implement specialized arrays > where the specialization lies just in the data itself, not the > behaviour. For example, I might want to create a thin array wrapper > around a memory-mapped, compressed video file, and treat it as a > three-dimensional array of rgb triples... (And so forth.) > If you want the ufuncs, you probably want one of the full featured library packages like scipy.base or numarray. It looks like Travis is able to promote any "array protocol object" to a full blown scipy.base.array already. > > > Inconsistencies other than the array interface (e.g. one implements > > argmax(x) while another implements x.argmax()) may mean that an > > extension module can work with one array implementation but not with > > another, > > This does *not* sound like a good thing -- I agree. Certainly not what > I would hope this protocol is used for. > Things like argmax(x) are not part of this protocol. The high level array packages and libraries will have all sorts of crazy and useful features. The protocol only describes the layout and format of the data. It enables higher level packages to work seamlessly with all the different array objects. That said, this protocol would allow a version of argmax(x) to be written in such a way as to handle *any* array object.
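Here, in fact, is one way such a generic argmax(x) might look. This is only a sketch under strong assumptions -- a contiguous single-segment exporter, an explicit byte-order character, and a handful of simple typestrs -- and the helper table is made up for illustration; it is not part of the protocol:

    import struct

    # Illustrative map from a few typestrs (minus the byte-order character)
    # to struct format codes; deliberately not exhaustive.
    _STRUCT_CODE = {'i2': 'h', 'i4': 'i', 'f4': 'f', 'f8': 'd'}

    def argmax(a):
        typestr = a.__array_typestr__
        shape = a.__array_shape__
        data = a.__array_data__          # assumed to support the buffer interface

        byteorder = typestr[0]           # '<' or '>'
        code = _STRUCT_CODE[typestr[1:]] # punt (KeyError) on anything fancier
        n = 1
        for dim in shape:
            n = n * dim
        values = struct.unpack(byteorder + str(n) + code, str(buffer(data)))
        best = 0
        for i in range(1, n):
            if values[i] > values[best]:
                best = i
        return best                      # index into the flattened data

A real version would also honour __array_strides__ and the optional defaults, but even this crude one works with any conforming exporter of the simple types.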
Cheers, -Scott From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:13:33 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 19:13:33 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404092421.GD21527@idi.ntnu.no> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> Message-ID: <4251F40C.6000402@ims.u-tokyo.ac.jp> Magnus Lie Hetland wrote: > Michiel Jan Laurens de Hoon : >>2) The __array_datalen__ is introduced to get around the 32-bit int >>limitation of len(). Another option is to fix len() in Python >>itself, so that it can return integers larger than 32 bits. So we >>can avoid adding a new method. > > > That would be good, IMO. But how realistic is it? (I have no idea -- > this is not a rhetorical question :) Actually, why is __array_datalen__ needed at all? Can't it be calculated trivially from __array_shape__? --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:56:23 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 19:56:23 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4251920B.6060708@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu> Message-ID: <4251F384.7080506@ims.u-tokyo.ac.jp> Travis Oliphant wrote: >> Some comments on the array interface: >> >> 1) The "__array_shape__" method is identical to the existing "shape" >> method in Numerical Python and numarray (except that "shape" does a >> little bit better checking, but it can be added easily to >> "__array_shape__"). To avoid code duplication, it might be better to >> keep that method. (and rename the other methods for consistency, if >> desired). > > > > There is no code duplication. In these cases it is just another name > for .shape. What "better checking" are you referring to? The method __array_shape__ is

    if (strcmp(name, "__array_shape__") == 0) {
        PyObject *res;
        int i;
        res = PyTuple_New(self->nd);
        for (i=0; i<self->nd; i++) {
            PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
        }
        return res;
    }

while the method shape is

    if (strcmp(name, "shape") == 0) {
        PyObject *s, *o;
        int i;

        if ((s=PyTuple_New(self->nd)) == NULL) return NULL;

        for(i=self->nd; --i >= 0;) {
            if ((o=PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
            if (PyTuple_SetItem(s,i,o) == -1) return NULL;
        }
        return s;
    }

so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be fine. --Michiel.
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Mon Apr 4 20:37:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 4 20:37:07 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4251F40C.6000402@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp> Message-ID: <4252078C.3050300@ee.byu.edu> > Actually, why is __array_datalen__ needed at all? Can't it be > calculated trivially from __array_shape__? Lovely point. I've taken away the __array_datalen__ from the interface description. -Travis From cookedm at physics.mcmaster.ca Mon Apr 4 21:17:19 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Mon Apr 4 21:17:19 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4251F384.7080506@ims.u-tokyo.ac.jp> (Michiel Jan Laurens de Hoon's message of "Tue, 05 Apr 2005 11:10:12 +0900") References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu> <4251F384.7080506@ims.u-tokyo.ac.jp> Message-ID: Michiel Jan Laurens de Hoon writes: > Travis Oliphant wrote: >>> Some comments on the array interface: >>> >>> 1) The "__array_shape__" method is identical to the existing >>> "shape" method in Numerical Python and numarray (except that >>> "shape" does a little bit better checking, but it can be added >>> easily to "__array_shape__"). To avoid code duplication, it might >>> be better to keep that method. (and rename the other methods for >>> consistency, if desired). >> There is no code duplication. In these cases it is just another >> name for .shape. What "better checking" are you referring to? >
> The method __array_shape__ is
>
>     if (strcmp(name, "__array_shape__") == 0) {
>         PyObject *res;
>         int i;
>         res = PyTuple_New(self->nd);
>         for (i=0; i<self->nd; i++) {
>             PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
>         }
>         return res;
>     }
>
> while the method shape is
>
>     if (strcmp(name, "shape") == 0) {
>         PyObject *s, *o;
>         int i;
>
>         if ((s=PyTuple_New(self->nd)) == NULL) return NULL;
>
>         for(i=self->nd; --i >= 0;) {
>             if ((o=PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
>             if (PyTuple_SetItem(s,i,o) == -1) return NULL;
>         }
>         return s;
>     }
>
> so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I
> don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be
> fine.

The #1 rule of thumb when using the Python C API: _always_ check your returned results (this usually means checking for NULL). In this case, PyInt_FromLong _can_ fail (if there's an error creating the int free list). I've fixed this in CVS. You're right on PyTuple_SET_ITEM: the space for it is guaranteed to exist after the PyTuple_New. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From oliphant at ee.byu.edu Mon Apr 4 22:18:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 4 22:18:23 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050403165914.GC10730@idi.ntnu.no> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> Message-ID: <42521F76.5080309@ee.byu.edu> Magnus Lie Hetland wrote: > - Does the description of __array_data__ mean that the discussed > bytes type is no longer needed? (If we can use buffers, that > sounds very good to me.) > > We can use the buffer object now, and it works as far as it goes. But, there are very important reasons for the creation of a good bytes object. Probably, THE most important reason for the bytes object is Pickle support without always making an intermediate string (and the accompanying copy that is involved). Right now, a string is the only way to Pickle array data. A bytes object would allow a way to Pickle without making a copy. -Travis From Chris.Barker at noaa.gov Tue Apr 5 00:32:17 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Apr 5 00:32:17 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <42521F76.5080309@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> Message-ID: <42523EC0.5000303@noaa.gov> Travis Oliphant wrote: > Right now, a string is the only > way to Pickle array data. A bytes object would allow a way to Pickle > without making a copy. So could the new array protocol allow us to make a Python String from an array without copying? That could be pretty handy. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From magnus at hetland.org Tue Apr 5 01:49:25 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:49:25 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404233620.70070.qmail@web50209.mail.yahoo.com> References: <20050404092421.GD21527@idi.ntnu.no> <20050404233620.70070.qmail@web50209.mail.yahoo.com> Message-ID: <20050405084839.GD29671@idi.ntnu.no> Scott Gilbert : > [snip] > > > Inconsistencies other than the array interface (e.g. one implements > > > argmax(x) while another implements x.argmax()) may mean that an > > > extension module can work with one array implementation but not with > > > another, > > > > This does *not* sound like a good thing -- I agree. Certainly not what > > I would hope this protocol is used for. > > > > Things like argmax(x) are not part of this protocol. The high level array > packages and libraries will have all sorts of crazy and useful features. Sure -- I realise that. I just mean that I hope there won't be several scientific array modules that implement similar concepts with different APIs, just because they can (because of the new array API). > The protocol only describes the layout and format of the data. It enables > higher level packages to work seamlessly with all the different array > objects. Exactly. 
> That said, this protocol would allow a version of argmax(x) to be > written in such a way as to handle *any* array object. ... given that you can compare the values in the array, of course. But, yes. This would be (IMO) the ideal situation. Instead of spawning several equivalent-but-different scientific array modules (i.e. the ones implementing such functionality as argmax()) we would have *one* main, standard such module, whose operations would work with almost any conceivable array object (e.g. from wxPython or PIL). That seems like a very, very good situation, IMO. > Cheers, > -Scott -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:51:35 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:51:35 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <42521F76.5080309@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> Message-ID: <20050405085041.GE29671@idi.ntnu.no> Travis Oliphant : > > Magnus Lie Hetland wrote: > > > - Does the description of __array_data__ mean that the discussed > > bytes type is no longer needed? (If we can use buffers, that > > sounds very good to me.) > > > > > > We can use the buffer object now, and it works as far as it goes. But, > there are very important reasons for the creation of a good bytes object. > > Probably, THE most important reason for the bytes object is Pickle > support without always making an intermediate string (and the > accompanying copy that is involved). Right now, a string is the only > way to Pickle array data. A bytes object would allow a way to Pickle > without making a copy. Ah. Very good argument, of course. But, as I understand it, the protocol as it stands could work with buffers until we get bytes objects? > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:52:09 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:52:09 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <42523EC0.5000303@noaa.gov> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> <42523EC0.5000303@noaa.gov> Message-ID: <20050405085108.GF29671@idi.ntnu.no> Chris Barker : > > Travis Oliphant wrote: > >Right now, a string is the only > >way to Pickle array data. A bytes object would allow a way to Pickle > >without making a copy. > > So could the new array protocol allow us to make a Python String from an > array without copying? That could be pretty handy. Or treat a string as an array... Yay! 
:) > -Chris -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:52:25 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:52:25 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4252078C.3050300@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp> <4252078C.3050300@ee.byu.edu> Message-ID: <20050405085138.GG29671@idi.ntnu.no> Travis Oliphant : > > > >Actually, why is __array_datalen__ needed at all? Can't it be > >calculated trivially from __array_shape__? > > Lovely point. I've taken away the __array_datalen__ from the > interface description. This is only getting prettier and prettier :) > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:57:12 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:57:12 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050404233447.26327.qmail@web50204.mail.yahoo.com> References: <20050404233447.26327.qmail@web50204.mail.yahoo.com> Message-ID: <20050405085642.GH29671@idi.ntnu.no> Scott Gilbert : > [snip] > I think the intent is that libraries like wxPython or PIL can > recognize data that they *want* to work with. They can raise an > exception when passed anything that is more complicated than they're > willing to deal with. Sure. I'm just saying that it would be good to have a baseline -- a basic, mandatory level of conformance, so that if I expose an array using only that part of the API (or, with the rest being optional information) I know that any conforming array consumer will understand me. As long as we don't have this, I have to know the capabilities of my consumer before I can write an appropriate typestr, for example. E.g., one application may only accept b1, while another would only accept i1 etc. Who knows -- there may well be sets of consumer applications that have mutually exclusive sets of accepted typestrings unless a minimum is mandated. That's really what I was after here. In addition to saying that typestr *must* be supported, one might say something about what typestrs must be supported. On the other hand -- perhaps such requirements should only be made on the array side? What requirements can/should one really make on the consumer side? I mean -- even though we have a strict sequence protocol, there is nothing wrong with creating something sequence-like (e.g., supporting floats as indices) and having consumer functions that aren't as strict as the official protocol... I just think it's something it might be worth being explicit about. 
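The consumer-side "punt" behaviour Scott describes is at least cheap to state in code. A sketch -- the set of supported codes is of course whatever a given package actually handles, and the helper name is made up:

    # Hypothetical consumer-side guard: accept only the typestrs this
    # package knows how to handle, and refuse everything else loudly.
    SUPPORTED_TYPESTRS = ('i1', 'i2', 'i4', 'f4', 'f8')

    def check_typestr(typestr):
        if typestr[1:] not in SUPPORTED_TYPESTRS:
            raise TypeError("unsupported array typestr: %r" % typestr)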
-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 02:00:24 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 02:00:24 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404233322.61350.qmail@web50208.mail.yahoo.com> References: <20050404233322.61350.qmail@web50208.mail.yahoo.com> Message-ID: <20050405085905.GI29671@idi.ntnu.no> Scott Gilbert : > > --- Michiel Jan Laurens de Hoon wrote: > > > > I'm not sure what you mean by "the array interface could become > > part of the Python standard as early as Python 2.5", since there > > is nothing to install. Or does this mean that Python's array will > > conform to the array interface? > > > > It would be nice to have the Python array module support the protocol for > the 1-Dimensional arrays that it implements. It would also be nice to add > a *simple* ndarray object in the core that supports multi-dimensional > arrays. I think breaking backward compatibility of the existing Python > array module to support multiple dimensions would be a mistake and unlikely > to get accepted. Do we really have to break backward compatibility in order to add more dimensions to the array module? There may be some issues with, e.g., typecode, but still... -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From a.schmolck at gmx.net Tue Apr 5 05:28:13 2005 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Tue Apr 5 05:28:13 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <424FF03A.4060107@bigpond.net.au> (Gary Ruben's message of "Sun, 03 Apr 2005 23:31:38 +1000") References: <424FF03A.4060107@bigpond.net.au> Message-ID: Gary Ruben writes: > This may be relevant to Numeric 3, but is possibly just a general question > about array slicing which will either reveal a deficiency in specifying slices > or in my knowledge of slicing with numpy. > A while ago I was trying to reimplement some Matlab image processing code in > Numeric and revealed a deficiency in the way slices are defined. Suppose I > have an n x m array and want to slice off the first and last p rows and > columns where p can range from 0 to some number. Matlab provides a clean way > of doing this, but in numpy it's a bit of a mess. > > You might think you could do > >>> p=1 > >>> b = a[p:-p]

b = a[p:-p or None]

'as

From werner.bruhin at free.fr Tue Apr 5 11:26:36 2005 From: werner.bruhin at free.fr (Werner F. Bruhin) Date: Tue Apr 5 11:26:36 2005 Subject: [Numpy-discussion] AttributeError: _NumErrorMode instance has no attribute 'dividebyzero' Message-ID: <4252D77F.10600@free.fr> If I use "Numeric.Error.setMode(all='Raise')" I get the above AttributeError. I found this on 1.1.1 but just downloaded "numarray-1.2.3.win32-py2.4.exe" and I still find the same problem. I use numarray with wx.lib.plot.py to generate some simple charts. I would like to catch the exceptions and display an appropriate message to the user. Is the above the right approach or am I going about this the wrong way round? Any hints are appreciated. 
Werner From xscottg at yahoo.com Tue Apr 5 13:35:37 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Tue Apr 5 13:35:37 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: 6667 Message-ID: <20050405203434.38638.qmail@web50204.mail.yahoo.com> --- Magnus Lie Hetland wrote: > > Do we really have to break backward compatibility in order to add more > dimensions to the array module? > You're right. The Python array module could change in a backwards compatible way. Possibly using keyword arguments to specify parameters that have never been there before. We could probably make sense out of array.insert(), array.append(), array.extend(), array.pop(), and array.reverse() by giving those an "axis" keyword. Even array.remove() could be made to work for more dimensions, but it probably wouldn't get used often. Maybe some of these would just raise an exception for ndims > 1. Then we'd have to add some additional typecodes for complex and a few others. Under the hood, it would basically be a complete reimplementation, but maybe that is the way to go... It does keep the number of array modules down. I wonder which way would meet less resistance in getting accepted in the core. I think creating a new ndarray object would pose less risk of breaking existing applications. > > There may be some issues with, e.g., typecode, but still... > The .typecode attribute could return the same values it always has. The .__array_typestr__ attribute would return the new style values. That's confusing, but probably unavoidable. It would be nice if there was only one set of typecodes for all of Python, but I think we're stuck with many (array module typecodes, struct module typecodes, array protocol typecodes). Cheers, -Scott From oliphant at ee.byu.edu Tue Apr 5 14:28:39 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 14:28:39 2005 Subject: [Numpy-discussion] Questions about ufuncs now. Message-ID: <4253028D.4090407@ee.byu.edu> The arrayobject for scipy.base seems to be working. Currently the Numeric3 CVS tree is using the "old-style" ufuncs modified with new code for the newly added types. It should be quite functional now for the brave at heart. I'm now working on modifying the ufunc object for scipy.base. These are the changes I'm working on:

1) a thread-specific? context that allows "buffer-size" level trapping of errors and retrieving of flags set. Similar to the decimal.context specification, but it uses the floating point sticky bits to implement.

2) implementation of buffers so that type-conversions (and byteswapping and alignment if necessary) never create temporaries larger than the buffer-size (the buffer-size is user settable).

3) a reworking of the general N-dimensional loop to use array iterators with optimizations applied for contiguous arrays.

4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) do not dictate coercion rules. Also, change so that certain mixed-type operations are computed in the larger type for both.

Most of this is pretty straightforward. But, I do have one additional question. Do the new array scalars count as "non-coercing" scalars (i.e. like the Python scalars), or do they cause coercion? My preference is that ALL scalars (anything that becomes 0-dimensional arrays internally) cause only "kind-casting" (i.e. int to float, float to complex, etc.) 
but not "type-casting" -Travis From oliphant at ee.byu.edu Tue Apr 5 16:02:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 16:02:34 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42531880.3060600@ee.byu.edu> I'd like to release a Numeric 24.0 to get the array interface out there. There are also some other bug fixes in Numeric 24.0 Here is the list so far from Numeric 23.7 [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d of Int16 [unreported] Added array interface [unreported] Allow Long Integers to be used in slices [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. [unreported] Return error info in ranlib instead of printing it to stderr [1151892] dot() would quit python with zero-sized arrays when using dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. [unreported] Fixed empty for Object arrays Version 23.8 March 2005 [Cooke] Fixed more 64-bit issues (patch 117603) [unreported] Changed arrayfnsmodule back to PyArray_INT where the code typecasts to (int *). Changed CanCastSafely to check if sizeof(long) == sizeof(int) I'll wait a little bit to allow last minute bug fixes to go in, but I'd realy like to see this release get out there. For users of Numeric >23.7 try Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is fixed in CVS. -Travis From cookedm at physics.mcmaster.ca Tue Apr 5 16:13:31 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Apr 5 16:13:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42531880.3060600@ee.byu.edu> (Travis Oliphant's message of "Tue, 05 Apr 2005 17:00:16 -0600") References: <42531880.3060600@ee.byu.edu> Message-ID: Travis Oliphant writes: > I'd like to release a Numeric 24.0 to get the array interface out > there. There are also some other bug fixes in Numeric 24.0 > > Here is the list so far from Numeric 23.7 > > [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a > is 2-d of Int16 > [unreported] Added array interface > [unreported] Allow Long Integers to be used in slices > [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. > [unreported] Return error info in ranlib instead of printing it to stderr > [1151892] dot() would quit python with zero-sized arrays when using > dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. > [unreported] Fixed empty for Object arrays > > Version 23.8 March 2005 > [Cooke] Fixed more 64-bit issues (patch 117603) > [unreported] Changed arrayfnsmodule back to PyArray_INT where the code > typecasts to (int *). Changed CanCastSafely to check > if sizeof(long) == sizeof(int) > > > I'll wait a little bit to allow last minute bug fixes to go in, but > I'd realy like to see this release get out there. For users of > Numeric >23.7 try > Numeric.empty((10,20),'O') if you want to see an *interesting* bug > that is fixed in CVS. Can you hold on? I've got some bugs I'm working on. There's some 64-bit things I'm working (various places that a long is cast to an int). For instance, a = Numeric.array((3,)) a.resize((2**32,)) gives a.shape == (1,) instead of an error. Stuff like this happens in the new array interface too :-) I'd suggest, before releasing with a bumped version number to 24.0, we release a beta version first. Shake out bugs in the array interface, and potentially allow for some changes if necessary. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From mdehoon at ims.u-tokyo.ac.jp Tue Apr 5 20:34:03 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Apr 5 20:34:03 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42531880.3060600@ee.byu.edu> References: <42531880.3060600@ee.byu.edu> Message-ID: <4253597F.1090501@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > I'd like to release a Numeric 24.0 to get the array interface out > there. There are also some other bug fixes in Numeric 24.0 Thanks for the notification, Travis. I have committed patch #732520 (Eigenvalues on cygwin bug fix), which fixes bug #706716 (eigenvalues is broken). It's great to be a Numerical Python developer, I get to accept my own patches :-). The same patch was previously accepted by numarray. About the array interface, my feeling is that while it may be helpful in the short run, it is likely to damage SciPy in the long run. The array interface allows different array implementations to move in different directions. These different implementations will be compatible with respect to the array interface, but incompatible otherwise (depending on the level of self-restraint of the developers of the different array implementations). So in the end, extension modules will be written for a specific array implementation anyway. At this point, Numerical Python is the most established and has the most users. Numarray, as far as I can tell, keeps closer to the Numerical Python tradition, so maybe extension modules can work with either one without further modification (e.g., pygist seems to work with both Numerical Python and numarray). But SciPy has been moving away (e.g. by replacing functions by methods). As extension module writers are usually busy people, they may not be willing to modify their code so that it works with SciPy, and even less to maintain two versions of their code, one for Numerical Python/numarray and one for SciPy. Users who could previously choose to install SciPy as an addition to Numerical Python, now find that they have to choose between SciPy and Numerical Python. As Numerical Python has many more extension packages, I expect that SciPy will end up losing users. Personally I use Numerical Python, and I plan to continue to use it for years to come, so it doesn't matter much to me. I'm just warning that the array interface may be a Trojan horse for the SciPy project. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Tue Apr 5 22:26:38 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 22:26:38 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <4253597F.1090501@ims.u-tokyo.ac.jp> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> Message-ID: <425372A4.7020900@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> I'd like to release a Numeric 24.0 to get the array interface out >> there. There are also some other bug fixes in Numeric 24.0 > > > > About the array interface, my feeling is that while it may be helpful > in the short run, it is likely to damage SciPy in the long run. Well, I guess we'll just have to see. Again, I see the array interface as important for talking to other modules that may not need or want the "full power" of a packed array module like scipy.base is. 
> The array interface allows different array implementations to move in > different directions. These different implementations will be > compatible with respect to the array interface, but incompatible > otherwise (depending on the level of self-restraint of the developers > of the different array implementations). So in the end, extension > modules will be written for a specific array implementation anyway. At > this point, Numerical Python is the most established and has the most > users. Numarray, as far as I can tell, keeps closer to the Numerical > Python tradition, so maybe extension modules can work with either one > without further modification (e.g., pygist seems to work with both > Numerical Python and numarray). > But SciPy has been moving away (e.g. by replacing functions by methods). Michiel, you seem to want to create this impression that "SciPy" is "moving away." I'm not sure of your motivations. But, since this is a public forum, I have to restate emphatically that "SciPy" is not "moving away from Numeric." It is all about bringing together the communities. For the 5 years that scipy has been in development, it has always been about establishing a library of common routines that we could all share. It has built on Numeric from the beginning. Now, there is another "library" of routines that is developing around numarray. It is this very real break that I'm trying to help fix. I have no other "desire" to "move away" or "create a break" or any other such notions that you seem to want to spread. That is precisely why I have publicly discussed practically every step of my work. You seem to be the only vocal one who thinks that scipy.base is not just a replacement for Numeric, but something else entirely. So, I repeat: **scipy.base is just a new version of Numeric with a few minor compatibility issues and a lot of added functionality and features.** For example, despite your claims, I have not "replaced" functions by methods. The functions are still all there just like before. I've simply noticed that numarray has a lot of methods and so I've added similar methods to the Numeric object to help numarray users make the transition back. Everything else that I've changed, I've done to bring Numeric up-to-date with modern Python versions, and to fix old warts that have sat around for years. If there are problems with my changes, speak up. Tell me what to do to make the new Numeric better. > As extension module writers are usually busy people, they may not be > willing to modify their code so that it works with SciPy, and even > less to maintain two versions of their code, one for Numerical > Python/numarray and one for SciPy. It's comments like this that make me wonder what you are thinking. It seems to me that you are the only one I've talked to that wants to maintain the notion of a "split". Everybody else I'm in contact with is in full support of merging the two communities behind a single scientific array object. Every extension module that compiles for Numeric should compile for scipy.base. Notice that full scipy already has a huge number of extension modules that need to compile for scipy.base. So, I have every motivation to make that a painless process. > Users who could previously choose to install SciPy as an addition to > Numerical Python, now find that they have to choose between SciPy and > Numerical Python. As Numerical Python has many more extension > packages, I expect that SciPy will end up losing users. 
Again, scipy.base should *replace* Numerical Python for all users (except the most adamant who don't seem to want to go with the rest of the community). scipy.base is a new version of Numeric. On the C-level I don't know of any incompatibilities; on the Python level there are a very few (most of them rarely-used typecode character issues which a simple search and replace will fix). I should emphasize this next point, since I don't seem to be coming across very clearly to some people. As head Numeric developer, I'm stating that **Numeric 24 is the last release that will be called Numeric**. New releases of Numeric will be called scipy.base. Of course, I realize that people can do whatever they want with the old Numeric code base, but then they will be the ones responsible for continuing a "split," because the Numerical Python project at sourceforge will point people to install scipy.base. Help me make the transition as painless as possible, that's all I'm asking. People transitioning from Numeric should have no trouble at all as I repeatedly point out. People transitioning from numarray will have a *little* harder time which is why the array interface should help out during that process. It is helping people transition back from numarray that is 90% of the reason I've made any changes to the internals of Numeric. I've been a happy and quiet Numeric user and developer for years, but I respect the problems that Perry, Rick, Paul, and Todd have pointed out with their numarray implementation, and I saw a way to support their needs inside of Numeric. That is the whole reason for my efforts. I wish people would stop trying to make it seem to casual readers of this forum that I'm trying to create a "whole new" incompatible system. Help me fix the obviously unnecessary incompatibilities where they may exist, and help me make automatic transition scripts to help people upgrade painlessly to the newer Numeric. I very much appreciate all who voice your concerns. Michiel, you are particularly appreciated because you are voice from a solid Numeric user. I just think that such concerns would be more productive in the context of accepting the fact that an upgrade from Numeric to scipy.base is going to happen, rather than trying to make it look like some new "split" is occurring. I've received a lot of offline support for the Numeric/numarray unification effort that scipy.base is. It would help if more people could provide public support on this forum so that others can see that I'm not just some outsider pushing some random ideas, but I am simply someone who decided to sacrifice some time for what I think is a very important effort. It would also help if other people who have concerns would voice them (I'm very grateful for those who have expressed their concerns) so that we can all address them and get on the same page for future development. Right now, the CVS version of Numeric3 works reasonably. It compiles and uses the old ufunc objects (which have only been extended to support the new types). I could use a lot of help in finding bugs. You can also try out the new array scalars to see how they work (math works on them now) and also see what may still be missing in their implementation. > > Personally I use Numerical Python, and I plan to continue to use it > for years to come, so it doesn't matter much to me. I'm just warning > that the array interface may be a Trojan horse for the SciPy project. 
As long as you realize that as far as I know the other developers of Numerical Python are going to be moving to scipy.base, and so you will be using obsolete technology, you are free to do as you wish. But, I really hope we can persuade you to join us. It is much better if we work together. -Travis From Fernando.Perez at colorado.edu Tue Apr 5 22:43:33 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Tue Apr 5 22:43:33 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <425372A4.7020900@ee.byu.edu> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> Message-ID: <42537690.5040400@colorado.edu> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >>But SciPy has been moving away (e.g. by replacing functions by methods). > > > > Michiel, you seem to want to create this impression that "SciPy" is > "moving away." I'm not sure of your motivations. But, since this is a > public forum, I have to restate emphatically, that "SciPy" is not > "moving away from Numeric." It is all about bringing together the > communities. For the 5 years that scipy has been in development, it has > always been about establishing a library of common routines that we > could all share. It has built on Numeric from the beginning. Now, > there is another "library" of routines that is developing around > numarray. It is this very real break that I'm trying to help fix. I > have no other "desire" to "move away" or "create a break" or any other > such notions that you seem to want to spread. FWIW, I think you (Travis) have been exceedingly clear in explaining this process, and in pointing out how this is: a) NOT a further split, but rather the EXACT OPPOSITE (numarray users will have a transition path back into a project which will provide the best of the old Numeric, along with all the critical enhancements which Perry, Todd et al. added to numarray). b) a way, via the array protocol, to provide third-party low-level libraries an easy way to, AT THE C LEVEL, interact easily and efficiently (without unnecessary copies) with numeri* arrays. I fail to see where Michiel gets his split/Trojan horse arguments, or what line of reasoning can connect your detailed explanations with such a conclusion. In particular, the comments on the whole 'trojan' issue seem to me absolutely unfounded. Nobody in their sane mind will use this protocol to invent a scipy.base competitor, which most likely would end up (if done right) being simply a copy. What it provides is a minimal, compact, low-level API which will be a huge boon for interoperability with things like PIL, WX or other similar libraries. This protocol has been extensively debated, and Scott's extensive comments have made this discussion a very productive one (along with the help of others, of course). I can only see this as a GREAT step forward for numerical python support and reliability 'in the wild'. I hesitated to send this message, but since you (Travis) have sunk an enormous amount of your time into this effort, which I can only applaud and rejoice in, I figure the least I can do is contribute a little to dispel some unnecessary confusion. Users with less knowledge of the details may become afraid of using Python for scientific computing by reading Michiel's comments, which I think would be a shame. Michiel, please note that none of what I said is meant to be a personal attack. 
I simply feel it is necessary to clarify, in no uncertain terms, how your recent comments of impending doom are unfounded.

Best to all, and again thanks to Travis for this much needed hard work,

f

From Chris.Barker at noaa.gov Tue Apr 5 23:59:31 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Tue Apr 5 23:59:31 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <42538880.7010301@noaa.gov>

Travis Oliphant wrote:
> It would help
> if more people could provide public support on this forum

Easy enough. I, for one, am very happy about what Travis is doing. It seems to be exactly what is needed to mend the Numeric-numarray split, which has been an annoyance for a couple of years now.

I'm also VERY happy about the proposed array protocol. While I suppose it could facilitate the creation of other array packages, that is only speculation, and unlikely, in my judgment. What I'm quite sure is going to happen is that other packages that do not provide an array implementation will be able to efficiently take arrays as input without creating a dependence on any particular package. I intend to make sure wxPython can efficiently take Numeric24 arrays, for instance. (Now that I think about it, it would be great if we could get this into wxPython2.6, which will be out pretty darn soon. I'm very pressed for time right now... can anyone help?)

> It would also help if other
> people who have concerns would voice them (I'm very grateful for those
> who have expressed their concerns) so that we can all address them and
> get on the same page for future development.

My only concern is versioning. Particularly when under rapid development (but really this applies anytime), I'd really love to be able to have more than one version of Numeric (or SciPy.base, or whatever) installed at once, and be able to select which one is used at runtime, in code (before importing the first time, of course). This would facilitate testing, but also allow me to have a working environment for older apps that will continue to work, without modification or re-compiling, after installing a newer version. Something like wxPython's wxversion is what I have in mind.

http://wiki.wxpython.org/index.cgi/MultiVersionInstalls

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From magnus at hetland.org Wed Apr 6 00:30:48 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 00:30:48 2005
Subject: [Numpy-discussion] Possible example application of the array interface
Message-ID: <20050406072854.GA12700@idi.ntnu.no>

I was just thinking about some experimental designs, and whether I could, perhaps, do the statistics in Python. I remembered having used RPy [1] briefly at some time (there may be other similar bindings out there -- I don't remember) and started thinking about whether I could, perhaps, combine it with numpy in some way. My first thought was to reimplement the relevant statistical functions; then I thought about how to convert data back and forth -- but then it occurred to me that R also uses arrays extensively, and that it could, perhaps, be possible to expose those (through something like RPy) through the array interface/protocol!
This would be (IMO) a good example of the benefits of the array protocol; it's not a matter of "getting yet another array module". RPy is an external library/language with *lots* of features that might be useful to numpy users, many of which aren't likely to be implemented in Python for quite a while, I'd guess (unless, perhaps, someone writes a translator from R, which I'm sure is doable).

I don't know enough (at least yet ;) about the implementation of RPy and the R library to say for sure whether this would even be possible, but it does seem like it could be really useful...

[1] rpy.sf.net

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 00:36:39 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 00:36:39 2005
Subject: [Numpy-discussion] Numeric 24.0
Message-ID:

Hi Travis,

Could you look at bug
[ 635104 ] segfault unpickling Numeric 'O' array
[ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of previous one)

I proposed a (rather simple) solution that I put in the comment of bug [ 635104 ]. But apparently, nobody is looking at those bugs...

>
> I'd like to release a Numeric 24.0 to get the array interface out there.
> There are also some other bug fixes in Numeric 24.0
>
> Here is the list so far from Numeric 23.7
>
> [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d
> of Int16

This is quite disturbing. In fact, for all types that are not exactly equivalent to a Python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So

type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
type(zeros((5,2), Float32 )[0,0]) => <type 'array'>

But

type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>

Notice too the weird difference between Int <> Int32 and Float == Float64.

However, when indexing a one-dimensional array (rank == 1), we get back scalars for indexing operations on all types. So, when you say "return the same type", do you think scalar or array (it smells like a recent discussion on Numeric3 ...)?

> [unreported] Added array interface
> [unreported] Allow Long Integers to be used in slices
> [1123145] Handle mu==0.0 appropriately in ranlib/ignpoi.
> [unreported] Return error info in ranlib instead of printing it to stderr
> [1151892] dot() would quit python with zero-sized arrays when using
> dotblas. The BLAS routines *gemv and *gemm need LDA >= 1.
> [unreported] Fixed empty for Object arrays
>
> Version 23.8 March 2005
> [Cooke] Fixed more 64-bit issues (patch 117603)
> [unreported] Changed arrayfnsmodule back to PyArray_INT where the code
> typecasts to (int *). Changed CanCastSafely to check
> if sizeof(long) == sizeof(int)
>
>
> I'll wait a little bit to allow last minute bug fixes to go in, but I'd
> really like to see this release get out there. For users of Numeric
> 23.7 try
> Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is
> fixed in CVS.
> >-Travis > > From nwagner at mecha.uni-stuttgart.de Wed Apr 6 01:01:42 2005 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Wed Apr 6 01:01:42 2005 Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical Message-ID: <42539706.3000503@mecha.uni-stuttgart.de> Hi all, Using Numeric 24.0 >>> scipy.__version__ '0.3.3_303.4599' scipy.test() results in ====================================================================== ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py", line 152, in check_simple_todense b = mmread(fn).todense() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense csc = self.tocsc() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1]) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_add (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_elmul (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_getelement (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matmat (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File 
"/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matvec (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_setelement (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocoo (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsc (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsr (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_todense (scipy.sparse.Sparse.test_Sparse.test_csc) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor1 (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor2 (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor3 (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_add (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_elmul (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File 
"/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_getelement (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matmat (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matvec (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_setelement (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocoo (scipy.sparse.Sparse.test_Sparse.test_csr) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsc (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsr (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_todense (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor1 (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and 
(max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor2 (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor3 (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_elmul (scipy.sparse.Sparse.test_Sparse.test_dok) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 60, in check_elmul c = a ** b File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 186, in __pow__ return csc ** other File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 485, in __pow__ return csc_matrix(c,(rowc,ptrc),M=M,N=N) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matmat (scipy.sparse.Sparse.test_Sparse.test_dok) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 71, in check_matmat assert_array_almost_equal((asp*bsp).todense(),dot(asp.todense(),bsp.todense())) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1184, in __mul__ return self.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 239, in matmat res = csc.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 568, in matmat return csc_matrix(c, (rowc, ptrc), M=M, N=N) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocoo (scipy.sparse.Sparse.test_Sparse.test_dok) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 75, in check_tocoo assert_array_almost_equal(a.todense(),self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense csc = self.tocsc() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1]) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_mult (scipy.sparse.Sparse.test_Sparse.test_dok) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 155, in check_mult D = A*A.T File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1184, in __mul__ return self.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 239, in matmat res = csc.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 568, in matmat return csc_matrix(c, (rowc, ptrc), M=M, N=N) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ---------------------------------------------------------------------- Ran 1173 tests in 3.113s FAILED (errors=31) >>> From cookedm at physics.mcmaster.ca Wed Apr 6 02:23:11 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 02:23:11 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: Message-ID: <20050406092143.GA31688@arbutus.physics.mcmaster.ca> On Wed, Apr 06, 2005 at 07:33:56AM +0000, S?bastien de Menten wrote: > > Hi Travis, > > Could you look at bug > [ 635104 ] segfault unpickling Numeric 'O' array > [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of > previous one) > > I proposed a (rather simple) solution that I put in the comment of bug [ > 635104 ]. But apparently, nobody is looking at those bugs... This is too true. Travis added myself and Michiel de Hoon recently to the developers, so there's some new blood, and we've been banging on things, though. I'll have a look at it if I've got time. I personally really hate bugs that crash my interpreter :-) > >I'd like to release a Numeric 24.0 to get the array interface out there. > >There are also some other bug fixes in Numeric 24.0 > > > >Here is the list so far from Numeric 23.7 > > > >[Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is > >2-d of Int16 > > This is quite disturbing. In fact for all types that are not exactly > equivalent to python type, indexing a multidimensional array (rank > 1) > return arrays even if the final shape is (). 
> So
> type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
> type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
> type(zeros((5,2), Float32 )[0,0]) => <type 'array'>
> But
> type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
> type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>
> Notice too the weird difference between Int <> Int32 and Float == Float64.

That's because Int is *not* Int32. Int32 is the first typecode of '1sil' that has 32 bits. For (all?) platforms I've seen, that'll be 'i'. Int corresponds to a Python integer, and Float corresponds to a Python float. Now, a Python integer is actually a C long, and a Python float is actually a C double. I've made a table:

Numeric type   typecode   Python type   C type   Array type
Int            'l'        int           long     PyArray_LONG
Int32          'i' [1]    N/A           int      PyArray_INT
Float          'd'        float         double   PyArray_DOUBLE
Float32        'f'        N/A           float    PyArray_FLOAT
Float64        'd'        float         double   PyArray_DOUBLE

[1] assuming sizeof(int)==4, which is true on most platforms. There are some 64-bit platforms where this won't be true, I think.

On (all? most?) 32-bit platforms, sizeof(int) == sizeof(long) == 4, so both Int and Int32 will be 32-bit quantities. Not so on some 64-bit platforms (Linux on an Athlon 64, like the one I'm typing at now), where sizeof(long) == 8. I've been fixing oodles of assumptions in Numeric where ints and longs have been used interchangeably, hence the extended discussion :-)

[I haven't addressed here why you get an array sometimes and a Python type the others. This is the standard, old, behaviour -- it's likely not going to change in Numeric. Whether it's a *good* thing is another question. scipy.base and numarray do it differently.]

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Wed Apr 6 02:46:55 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 02:46:55 2005
Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical
In-Reply-To: <42539706.3000503@mecha.uni-stuttgart.de>
References: <42539706.3000503@mecha.uni-stuttgart.de>
Message-ID: <20050406094438.GA32297@arbutus.physics.mcmaster.ca>

On Wed, Apr 06, 2005 at 10:00:06AM +0200, Nils Wagner wrote:
> Hi all,
>
> Using Numeric 24.0
> >>> scipy.__version__
> '0.3.3_303.4599'
>
> scipy.test() results in
>
> ======================================================================
> ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py",
> line 152, in check_simple_todense
> b = mmread(fn).todense()
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 254, in todense
> csc = self.tocsc()
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 1437, in tocsc
> return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1])
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 357, in __init__
> self._check()
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 375, in _check
> if (nnz>0) and (max(self.rowind[:nnz]) >= M):
> IndexError: invalid slice

(etc. -- note to self: use scipy for regression testing :-)

nnz is coming from

nnz = self.indptr[-1]

where self.indptr is an array of Int32.
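To see the problem in isolation (a sketch -- the exact repr may differ, but under the new Numeric 24.0 behaviour the indexing result is a rank-0 array rather than a Python int):

    >>> import Numeric
    >>> indptr = Numeric.zeros(4, 'i')   # an Int32 array, like self.indptr
    >>> nnz = indptr[-1]
    >>> type(nnz)                        # a rank-0 array, not a Python int
    <type 'array'>
    >>> Numeric.arange(10)[:nnz]         # raises "IndexError: invalid slice"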
Hmm, this corresponds to the behaviour I just responded to Sébastien de Menten about. The problem is that nnz is *not* a Python integer; it's an array, so the slice fails. I think I was wrong in that email about saying this was expected behaviour :-)

This comes from the recent fix of a[0,0] and a[0][0] returning the same type. Either change that back, or else we need to spruce up the slicing logic to consider 0-dimensional integer arrays as scalars.

A minimal test case:

a = Numeric.array([5,6,7,8])
b = Numeric.array([0,1,2,3], 'i')
n = b[-1]          # n is now a rank-0 array, not a Python int
assert a[n:] == 8  # fails with "IndexError: invalid slice"

(I'm not tackling this right now)

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From magnus at hetland.org Wed Apr 6 02:59:18 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 02:59:18 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
Message-ID: <20050406095639.GA16810@idi.ntnu.no>

Scott Gilbert:
>
> --- Magnus Lie Hetland wrote:
> >
> > Do we really have to break backward compatibility in order to add more
> > dimensions to the array module?
> >
>
> You're right. The Python array module could change in a backwards
> compatible way. Possibly using keyword arguments to specify parameters
> that have never been there before.
>
> We could probably make sense out of array.insert(), array.append(),
> array.extend(), array.pop(), and array.reverse() by giving those an "axis"
> keyword. Even array.remove() could be made to work for more dimensions,
> but it probably wouldn't get used often. Maybe some of these would just
> raise an exception for ndims > 1.

Sure. I guess basically the extend/pop/reverse/etc. methods and the ndim-functionality would sort of be two quite different ways of using arrays, so keeping them mutually exclusive doesn't seem like a problem to me. This might speak in favour of separating the functionality into two different classes, but I think there's merit to keeping it gathered, because this is partly for basic use(rs) who just want to get an array and do things to it that make sense. Appending to a multidimensional array (as long as we don't tempt them with an axis keyword) just doesn't make sense -- so people (hopefully) won't do it.

> Then we'd have to add some additional typecodes for complex and a
> few others.

Yeah; the question is how compatible the typecode system is with the new array protocol -- some overlap and some differences, I believe (without checking right now)? So -- this might look a bit like patchwork. But I think we might get that if we have two modules (or classes) too -- one, called array, with the existing functionality, and one, called (e.g.) ndarray, with a similar but incompatible interface... It *may* be better, but I'm not quite sure I think so.

In my experience (which may be very biased and selective here ;) the array module isn't exactly among the "hottest" features of Python or the standard libs. In fact, it seems almost a bit pointless to me. It claims to have "efficient arrays of numeric values" but is the efficiency really that great, if you write your code in Python? (Using lists and psyco would, quite possibly, be just as good, for example.)
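(For reference, all the module gives you today is a flat, typed sequence -- a tiny sketch:)

    import array

    a = array.array('d', [1.0, 2.0, 3.0])   # typecode 'd' = C double; 1-D only
    a.append(4.0)                            # list-style growth is its main feature
    print a.typecode, len(a)                 # no shape, no extra dimensions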
So -- at *least* adding the array protocol to it would be doing it a favour, i.e., making it a useful module, and sort of a prototypical example of the protocol and such. Adding more dimensions might simply make it more useful. (I've many times been asked by people how to create e.g. two-dimensional arrays in Python. It would be nice if there was actually some basic support for it.)

> Under the hood, it would basically be a complete reimplementation,

Sure; except for the (possibly minor?) work involved, I don't see that this is a problem? (Well... The inherent instability of new code, perhaps... But still.)

> but maybe that is the way to go... It does keep the number of array
> modules down.

Yes.

> I wonder which way would meet less resistance in getting accepted in
> the core. I think creating a new ndarray object would be less risk
> of breaking existing applications.

I guess that's true.

> > There may be some issues with, e.g., typecode, but still...
>
> The .typecode attribute could return the same values it always has.

Sure. But we might end up with, e.g., a constructor that looks almost exactly like the numpy array() constructor -- but whose typecodes are different... :/

> The .__array_typestr__ attribute would return the new style values.
> That's confusing, but probably unavoidable.

Yes, if we do use this approach. If we only allow one-dimensional arrays here (i.e., only add the protocol to the existing functionality) there might be less confusion? Oh, I don't know. Having a separate module or class/type might be just as good an idea. Perhaps I'm just being silly :->

> It would be nice if there was only one set of typecodes for all of
> Python,

Yeah -- or some similar system (using type objects).

> but I think we're stuck with many (array module typecodes, struct
> module typecodes, array protocol typecodes). :(

Yes, lots of history here. Oh, well. Not the greatest of problems, I guess. But using different typecodes in the explicit user-part of the ND-array interface in the stdlibs from those in scipy, for example, seems like a decidedly Bad Idea(tm). So ... that might be a good enough reason for using a separate ndarray entity, unless there can be some upward compatibility somehow.

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 00:36:39 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 00:36:39 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
Message-ID:

Hi,

I follow with great interest the threads around Numeric3/scipy.base. As Travis suggested ("It would also help if other people who have concerns would voice them (I'm very grateful for those who have expressed their concerns) so that we can all address them and get on the same page for future development."), I voice my concern :-)

Sometimes it is quite useful to treat data at a higher level than just an "array of numbers of some type". Adding metadata to an array (I call them "augmented arrays") is a simple way to add sense to an array. I see different use cases like:

1) attaching a physical unit to array data (see for instance Unum http://home.tiscali.be/be052320/Unum.html )
2) description of axes (see http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very useful to manipulate easily time series.
3) masked arrays as in the MA module of Numeric
4) arrays for interval arithmetic, where one keeps another array with the precision of the data
5) record arrays (currently being integrated in scipy.base as a base type)

The current solution for those situations is nicely summarized by quoting Konrad: "but rather a class written using arrays than a variety of the basic array type. It's actually pretty straightforward to implement, the most difficult choice being the form of the constructor that gives most flexibility in use."

However, I disagree with the "pretty straightforward to implement". In fact, if one wants to inherit most of the functionalities of Numeric, it becomes quite cumbersome. Looking at the MA module, I see that it needs to:

1) redefine all methods (__add__, ...)
2) redefine all ufuncs
3) redefine all array functions (like reshape, sort, argmax, ...)

For other purposes, the same burden may apply. A general solution to this problem is not straightforward and may be out of reach (computationally and/or conceptually). However, a quite-general-enough elegant solution could solve most practical problems. Looking at threads in this list, I think that there is enough brain power to get to something usable in the medium term.

An embryo of an idea would be to add hooks in the machinery to allow an object to interact with a ufunc. Currently, this is done by calling __array__ to extract a "naked array" (== Numeric.array vs "augmented array") but the result is then always a "naked array". In pseudocode, this looks like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array = augmented_array.__array__()
    return ufunc.apply(augmented_array)

where I would prefer something like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array, constructor = augmented_array.__array_constructor__()
    else:
        constructor = lambda x: x
    return constructor(ufunc.apply(augmented_array))

For array functions and methods, I have even fewer clues about a solution :-). But calling hooks specified by some protocol would be a path:

a) __array_constructor__
b) __array_binary_op__ (would be called for __add__, __sub__, ...)
c) __array_rbinary_op__ (would be called for __radd__, __rsub__, ...)

If I miss a point and there is an easy way to do this, I'll be pleased to know it. Otherwise, any feedback on this ability to easily increase array functionalities by appending metadata and related behavior would be welcome.

Sebastien

From cjw at sympatico.ca Wed Apr 6 03:15:13 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Wed Apr 6 03:15:13 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <424FE8E7.4040904@ee.byu.edu>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu>
Message-ID: <4253B691.5030902@sympatico.ca>

Travis Oliphant wrote:
> Colin J. Williams wrote:
>
>> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install
>> running install
>> running build
>> running config
>> error: The .NET Framework SDK needs to be installed before building
>> extensions for Python.
>>
>> Is there any chance that a Windows binary could be made available for
>> testing?
>
> Probably not in the near term (but you could ask Michiel).
>
> I'm assuming you have mingw32 installed which would allow you to build
> it provided you have created an exports file for python2.4 (look on
> the net for how to compile extensions with mingw32 using a MSVC
> compiled python).
> You have to tell distutils what compiler to use:
>
> python setup.py config --compiler=mingw32
> python setup.py build --compiler=mingw32
> python setup.py install
>
> -Travis

Thanks to Michiel and Travis for their suggestions. I am using Windows XP and get the following result:

C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py config --compiler=minw32
running config
error: don't know how to compile C/C++ code on platform 'nt' with 'minw32' compiler

C:\Python24\Lib\site-packages\Numeric3\Download>

I would welcome any comments.

Colin W.

From cookedm at physics.mcmaster.ca Wed Apr 6 03:31:40 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 03:31:40 2005
Subject: [Numpy-discussion] array interface nitpicks
Message-ID:

Just some small nitpicks in the array interface document (http://numeric.scipy.org/array_interface.html):

As written:

"""
__array_shape__ (required)
Tuple showing size in each dimension. Each entry in the tuple must be a Python (long) integer. Note that these integers could be larger than the platform "int" or "long" could hold. Use Py_LONG_LONG if accessing the entries of this tuple in C.
"""

Since this is supposed to be an interface, not an implementation (duck-typing and all that), I think this is too strict: __array_shape__ should just be a sequence of integers, not necessarily a tuple. I'd suggest something like this:

'''
__array_shape__ (required)
Sequence whose elements are the size in each dimension. Each entry is an integer (a Python int or long). Note that these integers could be larger than the platform "int" or "long" could hold (a Python int is a C long). It is up to the calling code to handle this appropriately; either by raising an error when overflow is possible, or by using Py_LONG_LONG as the C type for the shapes.
'''

This is clearer about the user's responsibility -- note that Numeric is taking the first approach (error), as the dimensions in PyArrayObject are ints.

Similar comments about __array_strides__. I'd reword it along the lines of:

'''
__array_strides__ (optional)
Sequence of strides which provides the number of bytes needed to jump to the next array element in the corresponding dimension. Each entry must be an integer (a Python int or long). As with __array_shape__, the values may be larger than can be represented by a C "int" or "long"; the calling code should handle this appropriately, either by raising an error, or by using Py_LONG_LONG in C. Default is a strides tuple which implies a C-style contiguous memory buffer. In this model, the last dimension of the array varies the fastest. For example, the default __array_strides__ tuple for an object whose array entries are 8 bytes long and whose __array_shape__ is (10,20,30) would be (4800, 240, 8).

Default: C-style contiguous
'''

I'm mostly worried about the use of Python longs; it shouldn't be necessary in almost all cases, and adds extra complications (in normal usage, you don't see Python longs all that much).

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cjw at sympatico.ca Wed Apr 6 03:33:05 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Wed Apr 6 03:33:05 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To:
References:
Message-ID: <4253BAA1.7010403@sympatico.ca>

Sébastien de Menten wrote:
> Hi,
>
> I follow with great interest the threads around Numeric3/scipy.base.
> As Travis suggested ("It would also help if other people who have
> concerns would voice them (I'm very grateful for those who have
> expressed their concerns) so that we can all address them and get on
> the same page for future development."), I voice my concern :-)
>
> Sometimes it is quite useful to treat data at a higher level than just
> an "array of numbers of some type". Adding metadata to an array (I call
> them "augmented arrays") is a simple way to add sense to an array. I
> see different use cases like:
> 1) attaching a physical unit to array data (see for instance Unum
> http://home.tiscali.be/be052320/Unum.html )
> 2) description of axes (see
> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
> useful to manipulate easily time series.

Does the record array provide a means of addressing this need?

> 3) masked arrays as in the MA module of Numeric
> 4) arrays for interval arithmetic, where one keeps another array with
> the precision of the data
> 5) record arrays (currently being integrated in scipy.base as a base
> type)

Yes, and there is numarray's array of objects.

> The current solution for those situations is nicely summarized by
> quoting Konrad: "but rather a class written using arrays than a variety
> of the basic array type. It's actually pretty straightforward to
> implement, the most difficult choice being the form of the constructor
> that gives most flexibility in use."
> [snip]

Colin W.

From rkern at ucsd.edu Wed Apr 6 03:36:51 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Wed Apr 6 03:36:51 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050406095639.GA16810@idi.ntnu.no>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com> <20050406095639.GA16810@idi.ntnu.no>
Message-ID: <4253BB73.5000605@ucsd.edu>

Magnus Lie Hetland wrote:
> So -- at *least* adding the array protocol to it would be doing it a
> favour, i.e., making it a useful module, and sort of a prototypical
> example of the protocol and such. Adding more dimensions might simply
> make it more useful. (I've many times been asked by people how to
> create e.g. two-dimensional arrays in Python. It would be nice if
> there was actually some basic support for it.)

Re-implementing the stdlib-array module to support multiple dimensions is almost certainly a non-starter. You can't easily do it without breaking its pre-allocation strategy. It preallocates memory for elements using the same algorithm that lists do, so .append() has reasonable amortized time behaviour. python-dev will not appreciate changing the algorithmic complexity of a long-existing component to accommodate a half-arsed implementation of N-D arrays.

OTOH, it is the one reason for stdlib-array's use in a Numeric world: sometimes, you just need to append values; you can't pre-allocate with Numeric.empty() and index in values. Using stdlib-array to collect the values, then using the buffer interface (soon-to-be __array__ interface) to convert to a Numeric array is faster than the alternatives.

--
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From sdementen at hotmail.com Wed Apr 6 03:59:35 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 03:59:35 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: <4253BAA1.7010403@sympatico.ca>
Message-ID:

>> 1) attaching a physical unit to array data (see for instance Unum
>> http://home.tiscali.be/be052320/Unum.html )
>> 2) description of axes (see
>> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
>> useful to manipulate easily time series.
>
> Does the record array provide a means of addressing this need?

Not really; when I say axis, I am speaking about indexing. For an array (named a) with shape (10, 5, 33), I would like to attach 3 arrays (or lists or tuples), named axis_information[0], axis_information[1] and axis_information[2], of size (10,), (5,) and (33,), which give meaning to the first, second and third index. For instance,

A[i,j,k] => means the element of A at (axis_information[0][i], axis_information[1][j], axis_information[2][k])

instead of

A[i,j,k] => means the element of A at index position [i,j,k]

which makes less sense (you always need to track the meaning of i,j,k in parallel).

>> 3) masked arrays as in the MA module of Numeric

Maybe this one could be implemented using a record array with a record like (data, mask). However, it would be cumbersome to use. E.g.

a.field("data")[:] = cos( a.field("data")[:] )

instead of

a[:] = cos(a[:])

with the current MA module.

>> 4) arrays for interval arithmetic, where one keeps another array with
>> the precision of the data
>> 5) record arrays (currently being integrated in scipy.base as a base type)
>
> Yes, and there is numarray's array of objects.

This is overkill as it eats way too much memory. E.g. your data represents instantaneous speeds and so is tagged with "m/s" information (a complex object) valid for the full array. Distributing this information to each component of an array via an object array is not practical.

From mdehoon at ims.u-tokyo.ac.jp Wed Apr 6 04:22:52 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Apr 6 04:22:52 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <4253B691.5030902@sympatico.ca>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca>
Message-ID: <4253C73E.4030703@ims.u-tokyo.ac.jp>

Colin J. Williams wrote:
> Thanks to Michiel and Travis for their suggestions. I am using Windows
> XP and get the following result:
>
> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py
> config --compiler=minw32
> running config
> error: don't know how to compile C/C++ code on platform 'nt' with
> 'minw32' compiler
>
> C:\Python24\Lib\site-packages\Numeric3\Download>
>
> I would welcome any comments.

--mingw32 contains a 'g'. Also, make sure you have Cygwin installed, with all the necessary packages.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From steve at shrogers.com Wed Apr 6 05:12:39 2005
From: steve at shrogers.com (Steven H. Rogers)
Date: Wed Apr 6 05:12:39 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <4253D1B9.90709@shrogers.com>

Travis Oliphant wrote:
>
> Again, scipy.base should *replace* Numerical Python for all users
> (except the most adamant who don't seem to want to go with the rest of
> the community). scipy.base is a new version of Numeric. On the
> C-level I don't know of any incompatibilities, on the Python level
> there are a very few (most of them rarely-used typecode character issues
> which a simple search and replace will fix).
>
> I should emphasize this next point, since I don't seem to be coming
> across very clearly to some people. As head Numeric developer, I'm
> stating that **Numeric 24 is the last release that will be called
> Numeric**. New releases of Numeric will be called scipy.base.

I'm happy with the direction you're taking to rejoin Numeric and Numarray. However, changing the name from Numeric to scipy.base may contribute to the confusion/concern. Is it really necessary?

Steve

--
Steven H. Rogers, Ph.D., steve at shrogers.com
Weblog: http://shrogers.com/weblog
"Reach low orbit and you're half way to anywhere in the Solar System."
-- Robert A. Heinlein

From konrad.hinsen at laposte.net Wed Apr 6 07:49:45 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Apr 6 07:49:45 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To:
References:
Message-ID: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net>

On Apr 6, 2005, at 12:10, Sébastien de Menten wrote:

> However, I disagree with the "pretty straightforward to implement". In
> fact, if one wants to inherit most of the functionalities of Numeric,
> it becomes quite cumbersome. Looking at the MA module, I see that it
> needs to:

It is straightforward AND cumbersome. Lots of work, but nothing difficult. I agree of course that it would be nice to improve the situation.

> An embryo of an idea would be to add hooks in the machinery to allow an
> object to interact with a ufunc. Currently, this is done by calling
> __array__ to extract a "naked array" (== Numeric.array vs "augmented
> array") but the result is then always a "naked array".
> In pseudocode, this looks like:
>
> def ufunc( augmented_array ):
>     if not isarray(augmented_array):
>         augmented_array = augmented_array.__array__()
>     return ufunc.apply(augmented_array)

The current behaviour of Numeric is more like

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    elif is_array_like(object):
        return array_ufunc(array(object))
    else:
        return object.ufunc()

A more general version, which should cover your case as well, would be:

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    else:
        try:
            return object.applyUfunc(ufunc)
        except AttributeError:
            if is_array_like(object):
                return array_ufunc(array(object))
            else:
                raise ValueError

There are two advantages:

1) Classes can handle ufuncs in any way they like, even if they implement
   array-like objects.
2) Classes must implement only one method, not one per ufunc.
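For instance, an "augmented array" class would need just one hook (a sketch only -- "UnitArray" is a made-up name, and a real version would also have to transform the unit for ufuncs like sqrt):

    import Numeric

    class UnitArray:
        # Hypothetical augmented array: raw data plus a physical unit tag.
        def __init__(self, data, unit):
            self.data = Numeric.asarray(data)
            self.unit = unit
        def applyUfunc(self, ufunc):
            # The single hook: apply the ufunc to the naked array,
            # then rewrap the result with the same metadata.
            return UnitArray(ufunc(self.data), self.unit)

Under the dispatch sketched above, Numeric.cos(UnitArray([0.0, 1.0], 'rad')) would then come back as a UnitArray instead of losing its unit.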
Compared to the approach that you suggested: > where I would prefer something like > > def ufunc( augmented_array ): > if not isarray(augmented_array): > augmented_array, constructor = > augmented_array.__array_constructor__() > else: > constructor = lambda x:x > return constructor(ufunc.apply(augmented_array)) mine has the advantage of also covering classes that are not array-like at all. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Wed Apr 6 08:16:33 2005 From: cjw at sympatico.ca (cjw at sympatico.ca) Date: Wed Apr 6 08:16:33 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <4253FCD1.2090808@sympatico.ca> Sébastien de Menten wrote: >>> 1) attaching a physical unit to array data (see for instance Unum >>> http://home.tiscali.be/be052320/Unum.html ) >>> 2) description of axis (see >>> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). >>> Very useful to manipulate easily time series. >> >> >> Does the record array provide a means of addressing this need? >> > > Not really; when I say axis, I am speaking about indexing. Fair enough, I was thinking one dimensionally. > For an array (named a) with shape (10, 5, 33), I would like to attach > three arrays, lists, or tuples (named axis_information[0], > axis_information[1] and axis_information[2]) of size (10,), (5,) and > (33,) which give meaning to the first, second and third index. > For instance, > a[i,j,k] => means the element of a at (axis_information[0][i], > axis_information[1][j], axis_information[2][k]) > instead of > a[i,j,k] => means the element of a at index position [i,j,k] which > makes less sense (you always need to track the meaning of i,j,k in > parallel). > >>> 3) masked arrays as in MA module of Numeric >> > > Maybe this one could be implemented using a record array with a record > like (data, mask). > However, it would be cumbersome to use. > E.g. a.field("data")[:] = cos( a.field("data")[:] ) > instead of > a[:] = cos(a[:]) > with the current MA module Assuming "data" is the name of a field in a record array "a", why not have a.data to represent a view (or copy, depending on the convention adopted) of a column in a, or a.data.Cos to provide the cosines of the values in the data column? "Cos" is used in place of "cos" to distinguish the method from the function. The former requires no parentheses. This assumes that the values in data are of the appropriate numeric type (with its appropriate typecode). Colin W. > > >>> 4) arrays for interval arithmetic where one keeps another array with >>> precision of data >>> 5) record arrays (currently being integrated in scipy.base as a base >>> type) >>> >> Yes, and there is numarray's array of objects. >> > > This is overkill, as it eats way too much memory. > E.g. your data represents instantaneous speeds and so it is tagged with > "m/s" information (a complex object) valid for the full array. > Distributing this information to each component of an array via an > object array is not practical.
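To make the axis_information idea quoted above concrete, here is a minimal sketch (the LabelledArray class and by_label method are invented for illustration; only the one-label-sequence-per-axis layout comes from the message):

    import Numeric

    class LabelledArray:
        # An array plus one sequence of labels per axis.
        def __init__(self, data, axis_information):
            self.data = Numeric.array(data)
            self.axis_information = axis_information
        def by_label(self, *labels):
            # Translate each axis label to its integer position,
            # then do ordinary integer indexing.
            index = tuple([list(axis).index(label) for axis, label
                           in zip(self.axis_information, labels)])
            return self.data[index]

    # A (2, 3) array indexed by date along axis 0 and station along axis 1.
    a = LabelledArray([[1, 2, 3], [4, 5, 6]],
                      (("2005-04-05", "2005-04-06"),
                       ("obs1", "obs2", "obs3")))
    print a.by_label("2005-04-06", "obs2")   # -> 5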
> From sdementen at hotmail.com Wed Apr 6 08:52:05 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 08:52:05 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: >> >>Maybe this one could be implemented using a record array with a record like >>(data, mask). However, it would be cumbersome to use. E.g. >>a.field("data")[:] = cos( a.field("data")[:] ) instead of a[:] = cos(a[:]) >>with the current MA module > >Assuming "data" is the name of a field in a record array "a", why not have >a.data to represent a view (or copy, depending on the convention adopted) >of a column in a, or a.data.Cos to provide the cosines of the values in the >data column? > >"Cos" is used in place of "cos" to distinguish the method from the >function. The former requires no parentheses. > Well, I think the whole point is to be able to use "without changes" any library that manipulates arrays with "augmented arrays": the same code for all arrays, independently of whether they are "naked" or "augmented". The "without changes" and "any library" should be taken with a pinch of salt, as operations that are accepted for any array will not necessarily mean something for some "augmented arrays". On a side note, I rather prefer to keep mathematical notation instead of OO notation (cos as function vs. method). From sdementen at hotmail.com Wed Apr 6 09:07:07 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 09:07:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: > >>However, I disagree with the "pretty straightforward to implement". In >>fact, if one wants to inherit most of the functionalities of Numeric, it >>becomes quite cumbersome. Looking at MA module, I see that it needs to: > >It is straightforward AND cumbersome. Lots of work, but nothing difficult. >I agree of course that it would be nice to improve the situation. My fault, I misunderstood your answer (... but it was a little bit misleading :-) >The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_ufunc(array(object)) > else: > return object.ufunc() > >A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_ufunc(array(object)) > else: > raise ValueError > >There are two advantages: > >1) Classes can handle ufuncs in any way they like, even if they implement > array-like objects. >2) Classes must implement only one method, not one per ufunc. > >Compared to the approach that you suggested: > >>where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, constructor = >>augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > >mine has the advantage of also covering classes that are not array-like at >all. > Yes!! That's an elegant solution for the ufunc part. Do you think it is possible to integrate a similar mechanism in array functions (like searchsorted, argmax, ...)?
If we can register functions taking one array as argument within scipy.base and let it dispatch those functions as ufuncs, we could use a similar strategy. For instance, let "sort" and "argmax" be registered as gfuncs (general functions on an array <> ufuncs); then any class that would like to override any of them could do it too, with the same trick Konrad exposed above. If another function uses those gfuncs and ufuncs, it inherits the genericity of the latter. Konrad, do you think it is tricky to have a prototype of your suggestion (i.e. the modification does not need a full understanding of Numeric and you can locate it approximately in the source code)? Seb >Konrad. >-- From mike_lists at yahoo.com.au Wed Apr 6 10:12:39 2005 From: mike_lists at yahoo.com.au (Michael Sorich) Date: Wed Apr 6 10:12:39 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: 6667 Message-ID: <20050406171008.58480.qmail@web53602.mail.yahoo.com> I think that this is a great idea! While I have a strong preference for python, I generally use R for statistical analyses due to the large number of mature libraries available. There are also some aspects of the R data types (eg data-frames and column/row names for 2D arrays) that are really nice for spreadsheet like data. I hope that scipy.base record arrays will be as easily manipulated as data-frames are. While RPy works well for small simple problems, there are data conversion limitations between R and Python. If one could efficiently convert between the major R data types and python scipy.base data types without loss of data, it would become possible to do most of the data manipulation in python and freely mix in R functions when required. This may encourage the use of python for the development of statistical routines. From my meager understanding of RPy: R vectors are converted to python lists. It may make more sense to convert them to an array (either stdlib or scipy.base version) - without copying data if possible. R arrays and matrices are converted to Numeric arrays. E.g. In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) Out[8]: array([[1, 3, 5], [2, 4, 6]]) However, column and row names (or dimnames for arrays with >2 dimensions) are lost in R->Py conversion. I do not know whether these conversions require copying of the data. R data-frames are currently converted to python dictionaries and I don't think that there is any simple way to convert a python object to an R data frame. This is the biggest limitation of rpy in my opinion. In [16]: r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) Out[16]: {'col2': ['one', 'two', 'three', 'four'], 'col1': [1, 2, 3, 4]} If it were possible to convert between an R data-frame and a scipy.base record array without copying or losing data, RPy would become more useful. I wish I understood C, scipy.base and R well enough to give this a go. However, this is way over my head! Mike --- Magnus Lie Hetland wrote: > I was just thinking about some experimental designs, > and whether I > could, perhaps, do the statistics in Python. I > remembered having used > RPy [1] briefly at some time (there may be other > similar bindings out > there -- I don't remember) and started thinking > about whether I could, > perhaps, combine it with numpy in some way.
My first > thought was to > reimplement the relevant statistical functions; then > I thought about > how to convert data back and forth -- but then it > occurred to me that > R also uses arrays extensively, and that it could, > perhaps, be > possible to expose those (through something like > RPy) through the > array interface/protocol! > > This would be (IMO) a good example of the benefits > of the array > protocol; it's not a matter of "getting yet another > array module". RPy > is an external library/language with *lots* of > features that might be > useful to numpy users, many of which aren't likely > to be implemented > in Python for quite a while, I'd guess (unless, > perhaps, someone > writes a translator from R, which I'm sure is > doable). > > I don't know enough (at least yet ;) about the > implementation of RPy > and the R library to say for sure whether this would > even be possible, > but it does seem like it could be really useful... > > [1] rpy.sf.net > > -- > Magnus Lie Hetland Fall seven > times, stand up eight > http://hetland.org > [Japanese proverb] From bsouthey at gmail.com Wed Apr 6 11:38:37 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 6 11:38:37 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: Hi, I don't see that it is feasible to link R and numerical python in this way. As you point out, R objects (R is an object-oriented language) use a lot of meta-data. Then there is the IEEE stuff (NaN etc.) that would also need to be handled in numerical python. You probably could get RPy or RSPython to use numerical python rather than just basic Python. What statistical functions would you want in numerical python? Regards Bruce On Apr 6, 2005 12:10 PM, Michael Sorich wrote: > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. > [...] From oliphant at ee.byu.edu Wed Apr 6 12:28:50 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:28:50 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537C6D.8040900@ims.u-tokyo.ac.jp> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537C6D.8040900@ims.u-tokyo.ac.jp> Message-ID: <425437E2.4090000@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> Again, scipy.base should *replace* Numerical Python for all users > > > Sorry, I give up. I have been very happy with Numerical Python so far > and the new Numerical Python just looks too much like SciPy to me. > It's even called scipy.base. In practical terms, what I've noticed is > that what used to work with Numerical Python no longer works with > Numeric3. For example: It's apparent you have negative pre-conceptions about scipy (even though scipy has always just built on top of Numeric so I'm not sure what your difficulties have been). This is unfortunate. scipy.base is going to be a lot more like Numeric than scipy was. So, I think you can relax. > > >>> from ndarray import * > >>> argmax > Traceback (most recent call last): > File "<stdin>", line 1, in ? > NameError: name 'argmax' is not defined This is only because the conversion hasn't completely taken place (I'm not importing the numeric.py module in __init__ yet because it hasn't been adjusted). Remember ndarray is just a place-holder while development happens, so of course quite a few things aren't there yet. I've been swamped so far. from ndarray import * won't even be the name to use. The package won't be called ndarray. This is all just for temporary development purposes. All of what you believe should work will still continue to work. So, relax..... > >>> > > From what I understand from the discussion, "from Numeric import *" > will still work, but it will be deprecated, which means that I will > have to change my code at some point. Not to mention the other > packages (LinearAlgebra, RandomArray, etc.). It's just too much trouble. Deprecated means new documentation won't teach that approach, that's pretty much it. The approach will still be supported for quite a while so people can switch when and if they want. I don't see "the trouble" at all. > Anyway, I am about to change jobs (I will be moving to Columbia > University soon), so I have decided to take some time off the > Numerical Python project and see where we stand in a few months time. > Hopefully, the situation will have cleared up by then. Sounds like an exciting move. Perhaps I can meet you in person if I'm in New York or if you are ever in Utah. I sincerely hope you will find the new scipy.base to your liking. I can promise you that your concerns are near the top of my list. It's too bad you can't help us get there more quickly.
-Travis From oliphant at ee.byu.edu Wed Apr 6 12:41:31 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:41:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: Message-ID: <42543B1B.3090209@ee.byu.edu> Sébastien de Menten wrote: > > Hi Travis, > > Could you look at bug > [ 635104 ] segfault unpickling Numeric 'O' array > [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of > previous one) > > I proposed a (rather simple) solution that I put in the comment of bug > [ 635104 ]. But apparently, nobody is looking at those bugs... One thing I don't like about the sourceforge bug tracker is that I don't get any email notification of bugs. Is there an option for that? I check my email far more often than I check a website. Sourceforge can be quite slow to manipulate around in. Now that you've mentioned it, I'll look into it. I'm not sure that object arrays could ever be pickled correctly. -Travis > >> >> I'd like to release a Numeric 24.0 to get the array interface out >> there. There are also some other bug fixes in Numeric 24.0 >> >> Here is the list so far from Numeric 23.7 >> >> [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a >> is 2-d of Int16 > > This is quite disturbing. In fact, for all types that are not exactly equivalent to a python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So, what should it do? This is the crux of a long-standing wart in Numerical Python that nobody has had a good solution to (I think the array scalars that have been introduced for scipy.base are the best solution yet). Right now, the point is that different things are done for different indexing strategies. Is this a good thing? Maybe it is. We can certainly leave it the way it is now and back out the change. The current behavior is: Subscripting always produces a rank-0 array if the type doesn't match a basic Python type. Item getting always produces a basic Python type (even if there is no match). So a[0,0] and a[0][0] will return different things if a is an array of shorts, for example. This may be what we live with and just call it a "feature" > So > type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'> > type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'> > type(zeros((5,2), Float32 )[0,0]) => <type 'array'> > But > type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'> > type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'> > type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'> > type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'> > > Notice too the weird difference between Int <> Int32 and Float == > Float64. This has been in Numeric for a long time (the coercion problem was one of the big reasons for it). If you return a Python integer when indexing an Int8 array and then use that for multiplication, you get undesired up-casting. There is no scalar Int8 type to return (thus a 0-dimensional array that can act like a scalar is returned). In scipy.base there are now scalar-like objects for all of the supported array types, which is one solution to this problem that was made possible by the ability to inherit in C that is now part of Python. What platform are you on? Notice that Int is interpreted as C-long (PyArray_LONG) while Int32 is PyArray_INT. This has been another wart in Numerical Python. By the way, I've fixed PyArray_Return so that if sizeof(long)==sizeof(int) then PyArray_INT also returns a Python integer. I think for places where sizeof(long)==sizeof(int) PyArray_LONG and PyArray_INT should be treated identically.
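To illustrate the dichotomy in a sketch (following the rules stated above -- rank-0 array from full subscripting, Python scalar from repeated item getting; the exact typecodes may vary with Numeric version and platform):

    from Numeric import zeros, Int8

    a = zeros((5, 2, 4), Int8)

    x = a[0, 0, 0]   # subscripting: rank-0 Int8 array (no Int8 scalar exists)
    y = a[0][0][0]   # item getting: plain Python int

    # The rank-0 result is what avoids the up-casting mentioned above:
    print (a * x).typecode()   # rank-0 Int8 operand: result stays Int8
    print (a * y).typecode()   # Python int operand: result is up-cast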
> However, when indexing a one-dimensional array (rank == 1), then we get back scalars for indexing operations on all types. > So, when you say "return the same type", do you think scalar or array (it smells like a recent discussion on Numeric3 ...)? I just think the behavior ought to be the same for a[0,0] or a[0][0], but maybe I'm wrong and we should keep the dichotomy to satisfy both groups of people. Because of the problems I alluded to, sometimes a 0-dimensional array should be returned. -Travis From tchur at optushome.com.au Wed Apr 6 14:00:52 2005 From: tchur at optushome.com.au (Tim Churches) Date: Wed Apr 6 14:00:52 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <42544D54.7040507@optushome.com.au> Michael Sorich wrote: > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. That's exactly what we do in our project (http://www.netepi.org) which uses NumPy, RPy and R. The Python<->R interface provided by RPy has a few wrinkles but overall is remarkably seamless and remarkably robust. > From my meager understanding of RPy: > > R vectors are converted to python lists. It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. RPy directly converts (by copying) NumPy arrays to R arrays and vice versa. C code is used to do this and it is quite fast. No Python lists are involved. You do need to have NumPy installed (including its header files) when you compile RPy for this to work - otherwise RPy *does* convert R arrays to Python lists. > R arrays and matrices are converted to Numeric arrays. > E.g. > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame. This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is way over my head! You can extend the conversion routines of RPy (in either direction) using a very simple interface, using just Python and R. No knowledge of C is necessary. For example, if you want to convert an R data.frame into a custom class which you have written in Python, it is quite easy to add that to RPy. There is an example for doing this with data.frames given in the RPy documentation. (More comments below).
> --- Magnus Lie Hetland wrote: > >>I was just thinking about some experimental designs, >>and whether I >>could, perhaps, do the statistics in Python. I >>remembered having used >>RPy [1] briefly at some time (there may be other >>similar bindings out >>there -- I don't remember) There is also RSPython, which allows Python to be called from R as well as R to be called from Python. However, it is far more experimental than RPy, and much harder to build and rather less robust, but more ambitious in its scope. RPy only allows calling of R functions (almost everything is done via functions in R) from Python, although as noted above it has good facilities for converting R objects back into Python objects, and also allows R objects to be returned to Python as native, unconverted R objects - so you can store native R objects in a Python list or dictionary if you wish. You can't see inside those native R objects with Python, but you can use them as arguments to R functions called via RPy. However, the default action in RPy is to do its best to convert R objects into Python data structures when R functions called via RPy return. That conversion is easily customisable as noted above. >> and started thinking >>about whether I could, >>perhaps, combine it with numpy in some way. My first >>thought was to >>reimplement the relevant statistical functions; then >>I thought about >>how to convert data back and forth -- but then it >>occurred to me that >>R also uses arrays extensively, and that it could, >>perhaps, be >>possible to expose those (through something like >>RPy) through the >>array interface/protocol! It seems that the new NumPy array interface could indeed be used to allow Python and R to share the same array data, rather than making copies as happens at present (albeit very quickly). >>This would be (IMO) a good example of the benefits >>of the array >>protocol; it's not a matter of "getting yet another >>array module". RPy >>is an external library/language with *lots* of >>features that might be >>useful to numpy users, many of which aren't likely >>to be implemented >>in Python for quite a while, I'd guess (unless, >>perhaps, someone >>writes a translator from R, which I'm sure is >>doable). R is a massive project with a huge library of statistical routines - it is several times larger in its extent than Python (that's a weakness as well as a strength, as R tends to be sprawling and rather intimidating in its size). R also has a very large community of top computational statisticians behind it. Better to work with R than to try to compete with it. That said, there is no reason not to port R libraries or specific R functions to NumPy where that provides performance gains, or where the data are large and already handled in NumPy. Our approach in NetEpi (http://www.netepi.org) is to do the data selection and reduction (usually summarisation) in NumPy (where we store data on disc as memory-mapped NumPy arrays) and then pass the much smaller summarised results to R for plotting or fitting complex statistical models. However, we do calculation of elementary statistics (means, quantiles and other measures of location, variance etc) in NumPy wherever possible to avoid copying large amounts of data to R via RPy. >>I don't know enough (at least yet ;) about the >>implementation of RPy >>and the R library to say for sure whether this would >>even be possible, >>but it does seem like it could be really useful... 
>> >>[1] rpy.sf.net I have copied this message to the RPy list - hopefully some fruitful discussion can ensue. Tim C From gregory.r.warnes at pfizer.com Wed Apr 6 14:02:05 2005 From: gregory.r.warnes at pfizer.com (Warnes, Gregory R) Date: Wed Apr 6 14:02:05 2005 Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example application of the array interface] Message-ID: <915D2D65A9986440A277AC5C98AA466F978DC2@groamrexm02.amer.pfizer.com> Hi All, It is possible to establish conversion functions so that R data-frame, list, and vector objects are better translated into python equivalents. I've made several aborted stabs at this, but my time has been extremely limited. The basic task is to create a functionally equivalent python class. [The tricky bit here is that R list and vector objects have both order and names. It is possible to emulate this in python by creating a base object that maintains a dictionary of names alongside the vector/matrix data.] See the example in the RPy documentation at http://rpy.sourceforge.net/rpy/doc/manual_html/DataFrame-class.html#DataFrame%20class. This shouldn't be very hard if someone can dedicate a bit of time to it. -Greg (Current RPy maintainer) > -----Original Message----- > From: rpy-list-admin at lists.sourceforge.net > [mailto:rpy-list-admin at lists.sourceforge.net]On Behalf Of Tim Churches > Sent: Wednesday, April 06, 2005 4:22 PM > To: rpy-list at lists.sourceforge.net > Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example > application of the array interface] > > > The following discussion occurred on the Numeric Python mailing list. > Others may wish to join the conversation. > > Tim C > > -------- Original Message -------- > Subject: Re: [Numpy-discussion] Possible example application of the > array interface > Date: Thu, 7 Apr 2005 03:10:08 +1000 (EST) > From: Michael Sorich > To: numpy-discussion at lists.sourceforge.net > > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. > [...] From cookedm at physics.mcmaster.ca Wed Apr 6 14:04:36 2005 From: cookedm at physics.mcmaster.ca (David M.
Cooke) Date: Wed Apr 6 14:04:36 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42543B1B.3090209@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 13:40:11 -0600") References: <42543B1B.3090209@ee.byu.edu> Message-ID: Travis Oliphant writes: > Sébastien de Menten wrote: > >> >> Hi Travis, >> >> Could you look at bug >> [ 635104 ] segfault unpickling Numeric 'O' array >> [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of >> previous one) >> >> I proposed a (rather simple) solution that I put in the comment of >> bug [ 635104 ]. But apparently, nobody is looking at those bugs... > > > One thing I don't like about the sourceforge bug tracker is that I don't > get any email notification of bugs. Is there an option for that? I > check my email far more often than I check a website. Sourceforge > can be quite slow to manipulate around in. I think if the bug is assigned to you, you get email. > >> So >> type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'> >> type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'> >> type(zeros((5,2), Float32 )[0,0]) => <type 'array'> >> But >> type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'> >> type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'> >> type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'> >> type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'> >> >> Notice too the weird difference between Int <> Int32 and Float == >> Float64. > > By the way, I've fixed PyArray_Return so that if > sizeof(long)==sizeof(int) then PyArray_INT also returns a Python > integer. I think for places where sizeof(long)==sizeof(int) > PyArray_LONG and PyArray_INT should be treated identically. I don't think this is good -- it's just papering over the problem. It leads to different behaviour on machines where sizeof(long) != sizeof(int) (specifically, the problem reported by Nils Wagner *won't* be fixed by this on my machine). On some machines x[0] will give you an int (where x is an array of Int32), on others an array: not fun. I see you already beat me to changing PyArray_PyIntAsInt to support rank-0 integer arrays. How about changing that to instead accept anything that int() can handle (using PyNumber_AsInt)? This would include anything int-like (rank-0 integer arrays, scipy.base array scalars, etc.). The side-effect is that you can index using floats (since int() of a float truncates it towards 0). If this is a big deal, I can special-case floats to raise an error. This would make (almost) all Numeric behaviour consistent with regards to using Python ints, Python longs, rank-0 integer arrays, and other int-like objects. >> However, when indexing a one-dimensional array (rank == 1), then we >> get back scalars for indexing operations on all types. >> >> So, when you say "return the same type", do you think scalar or >> array (it smells like a recent discussion on Numeric3 ...)? > > I just think the behavior ought to be the same for a[0,0] or a[0][0], > but maybe I'm wrong and we should keep the dichotomy to satisfy both > groups of people. Because of the problems I alluded to, sometimes a > 0-dimensional array should be returned. I'd prefer having a[0,0] and a[0][0] return the same thing: it's not the special case of how to do two indices; it's the special-casing of rank-1 arrays as compared to rank-n arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Wed Apr 6 14:42:38 2005 From: cookedm at physics.mcmaster.ca (David M.
Cooke) Date: Wed Apr 6 14:42:38 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric Message-ID: I've always found the Numeric setup.py to be not very user-friendly. So, I rewrote it. It's available as patch #1178095 http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 Basically, all the editing you need to do is in customize.py, instead of touching setup.py. No more commenting out files for lapack_lite (just tell it to use the system LAPACK, and tell it where to find it). Also, you could now use GSL's cblas interface for dotblas. Useful if you've already taken the trouble to link that with an optimized Fortran BLAS. I didn't want to just throw this into CVS without feedback first :-) If it looks good, this can go in Numeric 24.0. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From perry at stsci.edu Wed Apr 6 15:05:47 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:05:47 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <200504011146.44549.faltet@carabos.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> <200504011146.44549.faltet@carabos.com> Message-ID: <00c3ccc871b2107c78efa7cb3758fe8c@stsci.edu> Coming in very late... On Apr 1, 2005, at 4:46 AM, Francesc Altet wrote: > I'm very much with the opinions of Scott. Just some remarks. > > A Divendres 01 Abril 2005 06:12, Scott Gilbert va escriure: >>> I also think that rather than attach < or > to the start of the >>> string it would be easier to have another protocol for endianness. >>> Perhaps something like: >>> >>> __array_endian__ (optional Python integer with the value 1 in it). >>> If it is not 1, then a byteswap must be necessary. >> >> A limitation of this approach is that it can't adequately represent >> struct/record arrays where some fields are big endian and others are >> little >> endian. > > Having a mix of different endianness data values in the same data > record would be a bit ill-minded. In fact, numarray does not support > this: a recarray should be all little or big endian. I think that '<' > and '>' would be more than enough to represent this. > Nothing intrinsically prevents numarray from allowing this for records, but I'd agree that I have a hard time understanding when a given record array would have mixed endianness. >>> So, what if we proposed for the Python core not something like >>> Numeric3 (which would still exist in scipy.base and be everybody's >>> favorite array :-) ), but a very minimal array object (scaled back >>> even from Numeric) that followed the array protocol and had some >>> C-API associated with it. >>> >>> This minimal array object would support 5 basic types ('bool', >>> 'integer', 'float', 'complex', 'Object'). (Maybe a void type >>> could be defined and a void "scalar" introduced (which would be >>> the bytes object)). These types correspond to scalars already >>> available in Python and so the whole 0-dim array Python scalar >>> arguments could be ignored. >> >> I really like this idea. It could easily be implemented in C or >> Python >> script. Since half its purpose is for documentation, the Python >> script >> implementation might make more sense. > > Yeah, I fully agree with this also. > > I'm not against it, but I wonder if it is the most important thing to do next.
I can imagine that there are many other issues that deserve more attention than this. But I won't tell Travis what to do, obviously. Likewise about working on the current Python array module. Perry From perry at stsci.edu Wed Apr 6 15:09:11 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:11 2005 Subject: [Numpy-discussion] Questions about ufuncs now. In-Reply-To: <4253028D.4090407@ee.byu.edu> References: <4253028D.4090407@ee.byu.edu> Message-ID: <0d2b3dd0b5f97750022b47de6f1fad33@stsci.edu> On Apr 5, 2005, at 5:26 PM, Travis Oliphant wrote: > > The arrayobject for scipy.base seems to be working. Currently the > Numeric3 CVS tree is using the "old-style" ufuncs modified with new > code for the newly added types. It should be quite functional > now for the brave at heart. > > I'm now working on modifying the ufunc object for scipy.base. > > These are the changes I'm working on: > > 1) a thread-specific? context that allows "buffer-size" level > trapping > of errors and retrieving of flags set. Similar to the > decimal.context specification, but it uses the floating point > sticky bits to implement. > > 2) implementation of buffers so that type-conversions (and > byteswapping and alignment if necessary) never create temporaries > larger than the buffer-size (the buffer-size is user settable). > > 3) a reworking of the general N-dimensional loop to use array > iterators with optimizations > applied for contiguous arrays. > > 4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) > do not dictate coercion rules. > Also, a change so that certain mixed-type operations are computed in > the larger type for both. > > Most of this is pretty straightforward. But, I do have one additional > question. Do the new array scalars count as "non-coercing" scalars > (i.e. like the Python scalars), or do they cause coercion? > > My preference is that ALL scalars (anything that becomes > 0-dimensional arrays internally) cause only "kind-casting" (i.e. int > to float, float to complex, etc.) but not "type-casting" > Seems reasonable. One could argue that since they have their own precision, normal coercion rules should apply, but so long as Python scalar literals don't, having different coercion rules for what look like scalars taken from arrays than for Python scalars is bound to lead to great confusion. So I agree. Perry From perry at stsci.edu Wed Apr 6 15:09:51 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:51 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537690.5040400@colorado.edu> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537690.5040400@colorado.edu> Message-ID: <7779a4425dd6f32659e9c5f15b48e180@stsci.edu> I'll echo Fernando's comments. On Apr 6, 2005, at 1:41 AM, Fernando Perez wrote: > Travis Oliphant wrote: >> Michiel Jan Laurens de Hoon wrote: > >>> But SciPy has been moving away (e.g. by replacing functions by >>> methods). >> Michiel, you seem to want to create this impression that "SciPy" is >> "moving away." I'm not sure of your motivations. But, since this >> is a public forum, I have to restate emphatically, that "SciPy" is >> not "moving away from Numeric." It is all about bringing together >> the communities. For the 5 years that scipy has been in development, >> it has always been about establishing a library of common routines >> that we could all share. It has built on Numeric from the >> beginning.
Now, there is another "library" of routines that is >> developing around numarray. It is this very real break that I'm >> trying to help fix. I have no other "desire" to "move away" or >> "create a break" or any other such notions that you seem to want to >> spread. > > FWIW, I think you (Travis) have been exceedingly clear in explaining > this process, and in pointing out how this is: > > a) NOT a further split, but rather the EXACT OPPOSITE (numarray users > will have a transition path back into a project which will provide the > best of the old Numeric, along with all the critical enhancements > which Perry, Todd et al. added to numarray). > > b) a way, via the array protocol, to provide third-party low-level > libraries an easy way to, AT THE C LEVEL, interact easily and > efficiently (without unnecessary copies) with numeri* arrays. > > [...] From Chris.Barker at noaa.gov Wed Apr 6 15:37:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:37:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <4253C73E.4030703@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> Message-ID: <42546439.5060301@noaa.gov> Michiel Jan Laurens de Hoon wrote: > Also, make sure you have Cygwin installed, with all the necessary packages. MinGW is NOT Cygwin. You need to have MinGW installed, with all the necessary packages. I don't remember which ones, but I think there is not a single large package that gives you the whole pile. I do remember it being pretty easy for me last time I did it. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cookedm at physics.mcmaster.ca Wed Apr 6 15:44:36 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 15:44:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> (konrad hinsen's message of "Wed, 6 Apr 2005 16:48:30 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: konrad.hinsen at laposte.net writes: > On Apr 6, 2005, at 12:10, Sébastien de Menten wrote: > >> However, I disagree with the "pretty straightforward to >> implement". In fact, if one wants to inherit most of the >> functionalities of Numeric, it becomes quite cumbersome. Looking at >> MA module, I see that it needs to: > > It is straightforward AND cumbersome. Lots of work, but nothing > difficult. I agree of course that it would be nice to improve the > situation. >
>> In pseudocode, this looks like: >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array = augmented_array.__array__() >> return ufunc.apply(augmented_array) > > The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_func(array(object)) > else: > return object.ufunc() > > A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_func(array(object)) > else: > raise ValueError > > There are two advantages: > > 1) Classes can handle ufuncs in any way they like, even if they > implement > array-like objects. > 2) Classes must implement only one method, not one per ufunc. I like this! It's got namespace goodness all over it (last Python zen line in 'import this': Namespaces are one honking great idea -- let's do more of those!) I'd propose making the special method __ufunc__. > Compared to the approach that you suggested: > >> where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, contructor = >> augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > > mine has the advantage of also covering classes that are not > array-like at all. ... like your derivative classes, which are very useful. There are two different uses that ufuncs apply to, however. 1) arrays. Here, we want efficient computation of functions applied to lots of elements. That's where the output arguments and special methods (.reduce, .accumulate, and .outer) are useful 2) polymorphic functions. Output arguments aren't useful here. The special methods are useful for binary ufuncs only. For #2, just returning a callable from __ufunc__ would be fine. I'd suggest two levels of an informal ufunc interface corresponding to these two uses. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From Chris.Barker at noaa.gov Wed Apr 6 15:49:44 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:49:44 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: Message-ID: <42546709.1050600@noaa.gov> David M. Cooke wrote: > I've always found the Numeric setup.py to be not very user-friendly. > So, I rewrote it. It's available as patch #1178095 > http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 From that file: # If use_system_lapack is false, f2c'd versions of the required routines # will be used, except on Mac OS X, where the vecLib framework will be used # if found. Just to be clear, this does mean that vecLib will be used by default on OS-X? Very nice, setup.py has annoyed me too. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Apr 6 15:51:17 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:51:17 2005 Subject: [Numpy-discussion] Questions about the array interface. 
In-Reply-To: References: Message-ID: <42546766.5060802@noaa.gov> Hi all, (but mostly Travis), I've taken a look at: http://numeric.scipy.org/array_interface.html to try and see how I would use this with wxPython. I have a few questions, and a little code I'd like you to look at to see if I understand how this works. Here's a first stab on how I might use this for the wxPython DrawPointsList method. The method takes a sequence of length-2 sequences of numbers, and draws a point at each point described by coordinates in the data: [(x,y), (x2,y2), (x3,y3), ...] (or an Nx2 NumPy array of Ints) Here's what I have:

    def DrawPointList(self, points, pens=None):
        ...
        # some checking code on the pens
        ...
        if (hasattr(points, '__array_shape__') and
            hasattr(points, '__array_typestr__') and
            len(points.__array_shape__) == 2 and
            points.__array_shape__[1] == 2 and
            points.__array_typestr__ == 'i4'):
            # this means we have a compliant array;
            # return the array protocol version
            return self._DrawPointArray(points.__array_data__, pens, [])
            # This needs to be written now!
        else:
            # return the generic python sequence version
            return self._DrawPointList(points, pens, [])

Then we'll need a function (in C++): _DrawPointArray(points.__array_data__, pens, []) that takes a buffer object, and does the drawing. My questions: 1) Is this what you had in mind for how to use this? 2) As __array_strides__ is optional, I'd kind of like to have a __contiguous__ flag that I could just check, rather than checking for the existence of strides, then calculating what the strides should be, then checking them. 3) A number of the attributes are optional, but will always be there with SciPy arrays... (I assume). Have you documented them anywhere? 4) a wxWidgets wxPoint is defined as such: class WXDLLEXPORT wxPoint { public: int x, y; etc. As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int)? 5) Why is __array_data__ optional? Isn't that the whole point of this? 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. This way I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right? 7) An alternative to the above: a __simple__ flag, that means the data is a simple, C array of contiguous data of a single type. The most common use, and it would be nice to just check that flag and not have to take all other options into account. Thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From efiring at hawaii.edu Wed Apr 6 15:53:05 2005 From: efiring at hawaii.edu (Eric Firing) Date: Wed Apr 6 15:53:05 2005 Subject: [Numpy-discussion] masked arrays and NaNs Message-ID: <425467BB.305@hawaii.edu> Travis, I am whole-heartedly in favor of your efforts to end the Numeric/numarray split by combining the best of both. I am encouraged by the progress you have made, and by the depth and clarity of the accompanying technical discussions. Thank you! I am a long-time Matlab user in Physical Oceanography, and I have been trying to find a practical way to phase out Matlab. One key is matplotlib, which is coming along wonderfully. A second is the availability of a Num* (or scipy.base) module that provides the functionality and ease-of-use I presently get from Matlab. This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays. In Physical Oceanography, and I suspect in many other fields, data sets are almost always full of holes. Matlab's ability to use NaN as a bad value flag provides a wonderfully simple and efficient way of dealing with missing or bad data values. A similar ease and transparency would be good in scipy.base. In addition, or as a way of implementing NaN-handling internally, it might be best to have masked arrays incorporated at the C level--with the functionality available by default--rather than bolted on as a pure-python package. I hope that inclusion of __array_mask__ in the protocol means that this is part of the plan. Eric From Chris.Barker at noaa.gov Wed Apr 6 16:00:09 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 16:00:09 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <42546439.5060301@noaa.gov> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> <42546439.5060301@noaa.gov> Message-ID: <425469AA.2030703@noaa.gov> Chris Barker wrote: > there is not a single large package OOPS. There IS a single large package. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Wed Apr 6 16:13:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:13:08 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: <425458F7.9020307@ee.byu.edu> Message-ID: <42546CC7.40408@ee.byu.edu> David M. Cooke wrote: >Travis Oliphant writes: > > >>David M. Cooke wrote: >> >> >>>I've always found the Numeric setup.py to be not very user-friendly. >>>So, I rewrote it. It's available as patch #1178095 >>>http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >>> >>>Basically, all the editing you need to do is in customize.py, instead >>>of touching setup.py. No more commenting out files for lapack_lite >>>(just tell it to use the system LAPACK, and tell it where to find it). >>> >>>Also, you could now use GSL's cblas interface for dotblas. Useful if >>>you've already taken the trouble to link that with an optimized >>>Fortran BLAS. >>> >>>I didn't want to just throw this into CVS without feedback first :-) >>>If it looks good, this can go in Numeric 24.0. >>> >>> >>> >>I like the new changes. I also think the setup.py file is unfriendly. >>Put them in... >> >> > >While I'm at it, I'm also thinking of writing a 'cblas_lite' for >dotblas. This would mean that dotblas would be enabled all the time. >You could use a C BLAS if you've got one (from ATLAS, say), or a >Fortran BLAS (like the cxml library on an Alpha running Tru64), or it >would use the existing blas_lite.c if you don't. > > > This is a good idea, but for more than just dotblas. It is the essential problem that must be solved to make scipy.base installable everywhere yet use fast libraries for users who have them without much fuss. -Travis From rkern at ucsd.edu Wed Apr 6 16:28:40 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 16:28:40 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546709.1050600@noaa.gov> References: <42546709.1050600@noaa.gov> Message-ID: <42547060.30204@ucsd.edu> Chris Barker wrote: > > > David M. Cooke wrote: > >> I've always found the Numeric setup.py to be not very user-friendly. >> So, I rewrote it. It's available as patch #1178095 >> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >> > > > From that file: > > # If use_system_lapack is false, f2c'd versions of the required routines > # will be used, except on Mac OS X, where the vecLib framework will be used > # if found. > > Just to be clear, this does mean that vecLib will be used by default on > OS-X? I haven't tried it yet, but my examination of it suggests that this is so. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Wed Apr 6 16:59:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:59:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: <4254778A.1070100@ee.byu.edu> Chris Barker wrote: > Hi all, (but mostly Travis), > > I've taken a look at: > > http://numeric.scipy.org/array_interface.html > > to try and see how I would use this with wxPython. I have a few > questions, and a little code I'd like you to look at to see if I > understand how this works. Great, fantastic!!! > > Here's a first stab on how I might use this for the wxPython > DrawPointsList method. The method takes a sequence of length-2 > sequences of numbers, and draws a point at each point described by > coordinates in the data: > > [(x,y), (x2,y2), (x3,y3), ...] (or an Nx2 NumPy array of Ints) > > Here's what I have: > > def DrawPointList(self, points, pens=None): > ... > # some checking code on the pens > ... > if (hasattr(points,'__array_shape__') and > hasattr(points,'__array_typestr__') and > len(points.__array_shape__) == 2 and > points.__array_shape__[1] == 2 and > points.__array_typestr__ == 'i4'): > # this means we have a compliant array > # return the array protocol version You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally). A more generic interface would handle multiple integer types if possible (but this is a good start...) > return self._DrawPointArray(points.__array_data__, pens, []) > # This needs to be written now! > else: > # return the generic python sequence version > return self._DrawPointList(points, pens, []) > > Then we'll need a function (in C++): > _DrawPointArray(points.__array_data__, pens, []) > that takes a buffer object, and does the drawing. > > My questions: > > 1) Is this what you had in mind for how to use this? Yes, pretty much. > > 2) As __array_strides__ is optional, I'd kind of like to have a > __contiguous__ flag that I could just check, rather than checking for > the existence of strides, then calculating what the strides should be, > then checking them. I don't want to add too much.
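For reference, the check Chris describes in question 2 might look something like this sketch (the attribute names follow the array-interface document quoted above; the helper function is invented for illustration, and it assumes, as Chris does, that a missing __array_strides__ means a C-contiguous layout):

    def is_c_contiguous(obj, itemsize):
        # If __array_strides__ is absent, the protocol's default is a
        # C-contiguous block, so there is nothing to compute.
        shape = obj.__array_shape__
        strides = getattr(obj, '__array_strides__', None)
        if strides is None:
            return True
        # Otherwise compare against the strides a contiguous C-order
        # array of this shape and itemsize would have.
        expected = [0] * len(shape)
        step = itemsize
        for i in range(len(shape) - 1, -1, -1):
            expected[i] = step
            step = step * shape[i]
        return tuple(strides) == tuple(expected)

For the Nx2 array of 'i4' points above, this amounts to checking that the strides are (8, 4), which is what the proposed __contiguous__ (or __simple__) flag would assert directly.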
This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays. In Physical Oceanography, and I suspect in many other fields, data sets are almost always full of holes. Matlab's ability to use NaN as a bad value flag provides a wonderfully simple and efficient way of dealing with missing or bad data values. A similar ease and transparency would be good in scipy.base. In addition, or as a way of implementing NaN-handling internally, it might be best to have masked arrays incorporated at the C level--with the functionality available by default--rather than bolted on as a pure-python package. I hope that inclusion of __array_mask__ in the protocol means that this is part of the plan. Eric From Chris.Barker at noaa.gov Wed Apr 6 16:00:09 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 16:00:09 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <42546439.5060301@noaa.gov> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> <42546439.5060301@noaa.gov> Message-ID: <425469AA.2030703@noaa.gov> Chris Barker wrote: > there is not a single large package OOPS. There IS a single large package. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Wed Apr 6 16:13:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:13:08 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: <425458F7.9020307@ee.byu.edu> Message-ID: <42546CC7.40408@ee.byu.edu> David M. Cooke wrote: >Travis Oliphant writes: > > > >>David M. Cooke wrote: >> >> >> >>>I've always found the Numeric setup.py to be not very user-friendly. >>>So, I rewrote it. It's available as patch #1178095 >>>http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >>> >>>Basically, all the editing you need to do is in customize.py, instead >>>of touching setup.py. No more commenting out files for lapack_lite >>>(just tell it to use the system LAPACK, and tell it where to find it). >>> >>>Also, you could now use GSL's cblas interface for dotblas. Useful if >>>you've already taken the trouble to link that with an optimized >>>Fortran BLAS. >>> >>>I didn't want to just through this into CVS without feedback first :-) >>>If it looks good, this can go in Numeric 24.0. >>> >>> >>> >>I like the new changes. I also think the setup.py file is unfriendly. >>Put them in... >> >> > >While I'm at it, I'm also thinking of writing a 'cblas_lite' for >dotblas. This would mean that dotblas would be enabled all the time. >You could use a C BLAS if you've got one (from ATLAS, say), or a >Fortran BLAS (like the cxml library on an Alpha running Tru64), or it >would use the existing blas_lite.c if you don't. > > > This is a good idea, but for more than just dotblas. It is the essential problem that must be solved to make scipy.base installable everywhere yet use fast libraries for users who have them without much fuss. 
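A minimal sketch of the kind of build-time fallback logic being discussed follows; the file names, library names, and search paths here are illustrative assumptions, not the contents of the actual patch:

import os

# Illustrative only: probe for an optimized system BLAS and fall back to
# the bundled f2c'd reference sources when none is found.
def choose_blas(library_dirs=('/usr/lib', '/usr/local/lib')):
    """Return (sources, libraries) for building the dot/BLAS extension."""
    for d in library_dirs:
        for lib in ('cblas', 'blas', 'atlas'):
            for ext in ('.so', '.a', '.dylib'):
                if os.path.exists(os.path.join(d, 'lib' + lib + ext)):
                    # a thin shim forwards calls to the real cblas_* routines
                    return ['Src/cblas_shim.c'], [lib]
    # nothing optimized found: compile the portable reference implementation
    return ['Src/blas_lite.c'], []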
-Travis From rkern at ucsd.edu Wed Apr 6 16:28:40 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 16:28:40 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546709.1050600@noaa.gov> References: <42546709.1050600@noaa.gov> Message-ID: <42547060.30204@ucsd.edu> Chris Barker wrote: > > > David M. Cooke wrote: > >> I've always found the Numeric setup.py to be not very user-friendly. >> So, I rewrote it. It's available as patch #1178095 >> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >> > > > From that file: > > # If use_system_lapack is false, f2c'd versions of the required routines > # will be used, except on Mac OS X, where the vecLib framework will be used > # if found. > > Just to be clear, this does mean that vecLib will be used by default on > OS-X? I haven't tried it, yet, but my examination of it suggests that this is so. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Wed Apr 6 16:59:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:59:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: <4254778A.1070100@ee.byu.edu> Chris Barker wrote: > Hi all, (but mostly Travis), > > I've taken a look at: > > http://numeric.scipy.org/array_interface.html) > > to try and see how I would use this with wxPython. I have a few > questions, and a little code I'd like you to look at to see if I > understand how this works. Great, fantastic!!! > > Here's a first stab on how I might use this for the wxPython > DrawPointsList method. The method takes a sequence of length-2 > sequences of numbers, and draws a point at each point described by > coordinates in the data: > > [(x,y), (x2,y2), (x3,y3), ...] (or a NX2 NumPy array of Ints) > > Here's what I have: > > def DrawPointList(self, points, pens=None): > ... > # some checking code on the pens) > ... > if (hasattr(points,'__array_shape__') and > hasattr(points,'__array_typestr__') and > len(points.__array_shape__) == 2 and > points.__array_shape__[1] == 2 and > points.__array_typestr__ == 'i4' and > ): # this means we have a compliant array > # return the array protocol version You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally). A more generic interface would handle multiple integer types if possible (but this is a good start...) > return self._DrawPointArray(points.__array_data__, pens,[]) > #This needs to be written now! > else: > #return the generic python sequence version > return self._DrawPointList(points, pens, []) > > Then we'll need a function (in C++): > _DrawPointArray(points.__array_data__, pens,[]) > That takes a buffer object, and does the drawing. > > My questions: > > 1) Is this what you had in mind for how to use this? Yes, pretty much. > > 2) As __array_strides__ is optional, I'd kind of like to have a > __contiguous__ flag that I could just check, rather than checking for > the existence of strides, then calculating what the strides should be, > then checking them. I don't want to add too much. 
The other approach is to establish a set of helper functions in Python to check this sort of thing. Thus, if you can't handle a general array you check:

ndarray.iscontiguous(obj)

where obj exports the array interface. But, it could really go either way. What do others think?

I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.

>> 3) A number of the attributes are optional, but will always be there with SciPy arrays... (I assume) have you documented them anywhere?

No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.

>> 4) a wxWidgets wxPoint is defined as such:
>>
>> class WXDLLEXPORT wxPoint
>> {
>> public:
>>     int x, y;
>>
>> etc.
>>
>> As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int)?

Ah, yes.. here is where we need some standard Python functions to help establish the array interface. Sometimes you want to match a particular c-type, other times you want to match a particular bit width. So, what do you do? I had considered having an additional interface called ctypestr but decided against it for fear of creep. I think in general we need to have in Python some constants to make this conversion easy, e.g. ndarray.cint (gives 'iX' on the correct platform). For now, I would check:

(__array_typestr__ == 'i%d' % array.array('i',[0]).itemsize)

But, on most platforms these days an int is 4 bytes; the above would just make sure.

>> 5) Why is __array_data__ optional? Isn't that the whole point of this?

Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object. But, really all that is needed is something that exposes the buffer interface: remember the difference between the buffer object and the buffer interface. So, the correct consumer usage for grabbing the data is

data = getattr(obj, '__array_data__', obj)

Then, in C you use the Buffer *Protocol* to get a pointer to memory. For example, the function:

int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)

Of course this approach has the 32-bit limit until we get this changed in Python.

>> 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it is, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?

A consumer has to check for most of the optional stuff if they want to support all types of arrays. Again a simple:

getattr(obj, '__array_offset__', 0)

works fine.

>> 7) An alternative to the above: a __simple__ flag, that means the data is a simple, C array of contiguous data of a single type. The most common use, and it would be nice to just check that flag and not have to take all other options into account.

I think if __array_strides__ returns None (and if an object doesn't expose it you can assume it) it is probably good enough.
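As an illustration of the consumer-side checks under discussion, here is a sketch (not part of the interface document) of a helper that decides whether an exporting object is a simple, contiguous, zero-offset C array; it assumes a typestr of the form '<i4' (byte order, kind, item size):

def is_simple(obj):
    # required attributes of the array interface
    shape = obj.__array_shape__
    typestr = obj.__array_typestr__          # e.g. '<i4'
    # optional attributes, with the defaults the protocol specifies
    strides = getattr(obj, '__array_strides__', None)
    offset = getattr(obj, '__array_offset__', 0)
    if offset != 0:
        return False
    if strides is None:                      # None means C-contiguous
        return True
    # otherwise compare against the strides a C-contiguous array would have
    itemsize = int(typestr[2:])
    expected, acc = [], itemsize
    for dim in shape[::-1]:
        expected.insert(0, acc)
        acc = acc * dim
    return tuple(strides) == tuple(expected)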
-Travis

From oliphant at ee.byu.edu Wed Apr 6 17:17:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 17:17:13 2005 Subject: [Numpy-discussion] masked arrays and NaNs In-Reply-To: <425467BB.305@hawaii.edu> References: <425467BB.305@hawaii.edu> Message-ID: <42547B2B.4030700@ee.byu.edu>

Eric Firing wrote:

> Travis,
>
> I am whole-heartedly in favor of your efforts to end the Numeric/numarray split by combining the best of both. I am encouraged by the progress you have made, and by the depth and clarity of the accompanying technical discussions. Thank you!
>
> I am a long-time Matlab user in Physical Oceanography, and I have been trying to find a practical way to phase out Matlab. One key is matplotlib, which is coming along wonderfully. A second is the availability of a Num* (or scipy.base) module that provides the functionality and ease-of-use I presently get from Matlab. This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays.

I think both options will be available. With the new error handling that numarray introduced, NaNs will be allowed if you set the error mode correctly. A version of masked arrays will also be available (either in Python or C).

-Travis

From cookedm at physics.mcmaster.ca Wed Apr 6 17:18:51 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 17:18:51 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: (David M. Cooke's message of "Wed, 06 Apr 2005 17:41:50 -0400") References: Message-ID:

cookedm at physics.mcmaster.ca (David M. Cooke) writes:

> I've always found the Numeric setup.py to be not very user-friendly. So, I rewrote it. It's available as patch #1178095 http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>
> Basically, all the editing you need to do is in customize.py, instead of touching setup.py. No more commenting out files for lapack_lite (just tell it to use the system LAPACK, and tell it where to find it).
>
> Also, you could now use GSL's cblas interface for dotblas. Useful if you've already taken the trouble to link that with an optimized Fortran BLAS.
>
> I didn't want to just throw this into CVS without feedback first :-) If it looks good, this can go in Numeric 24.0.

I've checked it in. Highlights:

* You only need to edit customize.py
* You don't need to edit anything if you're on OS X (>= 10.2): the vecLib framework for optimized BLAS and LAPACK will be used if found.
* If you have an incomplete ATLAS library (one without LAPACK), you can use it for BLAS (instead of blas_lite.c), and the included f2c'd routines for LAPACK will be used.
* Use whatever CBLAS interface you've got (ATLAS, GSL, the reference one available from netlib).

There's also an INSTALL file now, although it could use some comments about the 'python setup.py config' option.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Wed Apr 6 18:14:33 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 18:14:33 2005 Subject: [Numpy-discussion] New array interface helper file Message-ID: <4254890F.6080205@ee.byu.edu>

At http://numeric.scipy.org/array_interface.py you will find the start of a set of helper functions for the array interface that can make it easier to deal with.
It also documents the array interface with docstrings. I tried to attach these to properties, but then I don't know how to "see" them from Python. This is the kind of thing I think should go into Python If anybody would like to try their hand at converter functions to go back and forth between the struct module strings and the __array_descr__ string, make my day. -Travis From cookedm at physics.mcmaster.ca Wed Apr 6 21:41:12 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 21:41:12 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546CC7.40408@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 17:12:07 -0600") References: <425458F7.9020307@ee.byu.edu> <42546CC7.40408@ee.byu.edu> Message-ID: Travis Oliphant writes: > David M. Cooke wrote: >>While I'm at it, I'm also thinking of writing a 'cblas_lite' for >>dotblas. This would mean that dotblas would be enabled all the time. >>You could use a C BLAS if you've got one (from ATLAS, say), or a >>Fortran BLAS (like the cxml library on an Alpha running Tru64), or it >>would use the existing blas_lite.c if you don't. >> > This is a good idea, but for more than just dotblas. Hmm, like for what? dotblas is the only thing (in Numeric & numarray) that uses the cblas_* functions. Unless you're thinking of using them in more places, like ufuncs? cblas_lite would be thin shims with minimal error-checking, probably not much use outside of dotblas. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From rkern at ucsd.edu Wed Apr 6 21:47:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 21:47:30 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254890F.6080205@ee.byu.edu> References: <4254890F.6080205@ee.byu.edu> Message-ID: <4254BB2B.2000406@ucsd.edu> Travis Oliphant wrote: > > At http://numeric.scipy.org/array_interface.py > > you will find the start of a set of helper functions for the array > interface that can make it more easy to deal with. It also documents > the array interface with docstrings. I tried to attach these to > properties, but then I don't know how to "see" them from Python. Get it from the property object on the class itself. E.g. expanded.__array_shape__.__doc__ -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Wed Apr 6 22:13:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 22:13:04 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254BB2B.2000406@ucsd.edu> References: <4254890F.6080205@ee.byu.edu> <4254BB2B.2000406@ucsd.edu> Message-ID: <4254C141.9040502@ee.byu.edu> Robert Kern wrote: > Travis Oliphant wrote: > >> >> At http://numeric.scipy.org/array_interface.py >> >> you will find the start of a set of helper functions for the array >> interface that can make it more easy to deal with. It also >> documents the array interface with docstrings. I tried to attach >> these to properties, but then I don't know how to "see" them from >> Python. > > > Get it from the property object on the class itself. > E.g. > > expanded.__array_shape__.__doc__ > Thank you. 
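For readers following along, here is a tiny self-contained demonstration of Robert's tip (the class is made up for illustration): property docstrings live on the property object, which is only reachable through the class, since instance access returns the computed value instead.

# Docstrings attached to properties are reached via the class, not the
# instance; instance access invokes the getter and returns its result.
class Expanded(object):
    def _get_shape(self):
        return (3, 4)
    __array_shape__ = property(_get_shape,
                               doc="Tuple of array dimensions.")

print(Expanded.__array_shape__.__doc__)   # -> Tuple of array dimensions.
print(Expanded().__array_shape__)         # -> (3, 4)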
-Travis

From Chris.Barker at noaa.gov Wed Apr 6 23:36:36 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 23:36:36 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254778A.1070100@ee.byu.edu> References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> Message-ID: <4254D4A8.5020007@noaa.gov>

Travis Oliphant wrote:

> You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally).

Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.

> A more generic interface would handle multiple integer types if possible

I'd like to support doubles as well...

> (but this is a good start...)

Right. I want to get _something_ working, before I try to make it universal!

> I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.

You're welcome. I like that too.

> No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.

I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.

> e.g. ndarray.cint (gives 'iX' on the correct platform).
> For now, I would check (__array_typestr__ == 'i%d' % array.array('i',[0]).itemsize)

I can see that that would work, but it does feel like a hack. Besides, I might be doing this in C++ anyway, so it would probably be easier to use sizeof()

> But, on most platforms these days an int is 4 bytes; the above would just make sure.

Right. Making that assumption will just lead to weird bugs way down the line. Of course, I wouldn't be surprised if wxWidgets and/or Python makes that assumption in other places anyway!

>> 5) Why is __array_data__ optional? Isn't that the whole point of this?
>
> Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object.

Couldn't it be required, and return a reference to itself if that works? Maybe I'm just being lazy, but it feels clunky and prone to errors to keep having to check if an attribute exists, then use it (or not).

> So, the correct consumer usage for grabbing the data is
> data = getattr(obj, '__array_data__', obj)

Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.

> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)

I'm starting to get this.

> Of course this approach has the 32-bit limit until we get this changed in Python.

That's the least of my worries!

>> 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it is, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?
> > A consumer has to check for most of the optional stuff if they want to > support all types of arrays. That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single type elements, zero offset(, but I have to check all that stuff to make sure that I have a simple array. The simplest arrays are the most common case, they should be as easy as possible to support. > Again a simple: > > getattr(obj, '__array_offset__', 0) > > works fine. not too bad. Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid.... >> 7) An alternative to the above: A __simple_ flag, that means the data >> is a simple, C array of contiguous data of a single type. The most >> common use, and it would be nice to just check that flag and not have >> to take all other options into account. > I think if __array_strides__ returns None (and if an object doesn't > expose it you can assume it) it is probably good enough. That and __array_typestr__ Travis Oliphant wrote: > > At http://numeric.scipy.org/array_interface.py > > you will find the start of a set of helper functions for the array > interface that can make it more easy to deal with. Ah! this may well address my concerns. Good idea. Thanks for all your work on this Travis. By the way, a quote form Robin Dunn about this: "Sweet!" Thought you might appreciate that. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From konrad.hinsen at laposte.net Wed Apr 6 23:55:02 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Apr 6 23:55:02 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: <2701da761c9f34fc1dc72fc97e87e788@laposte.net> On 07.04.2005, at 00:43, David M. Cooke wrote: > I like this! It's got namespace goodness all over it (last Python zen > line in 'import this': Namespaces are one honking great idea -- let's > do more of those!) Sounds like a good principle! > 1) arrays. Here, we want efficient computation of functions applied to > lots of elements. That's where the output arguments and special > methods (.reduce, .accumulate, and .outer) are useful All that is accessible if the class gets passed the ufunc object. > 2) polymorphic functions. Output arguments aren't useful here. The > special methods are useful for binary ufuncs only. Fine, then they just call the ufunc. And the rare cases that need explicit code for each ufunc (my Derivatives, for example) can retrieve the name of the ufunc and dispatch on it. Konrad. 
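A sketch of what Konrad's name-based dispatch could look like if the proposed __ufunc__ hook existed; the hook, the class, and the set of supported functions are all hypothetical:

import math

class Derivative:
    """Carries a value together with its derivative."""
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    def __ufunc__(self, ufunc):
        # dispatch explicitly on the name of the ufunc
        name = getattr(ufunc, '__name__', str(ufunc))
        if name == 'cos':   # chain rule: (cos f)' = -sin(f) * f'
            return Derivative(math.cos(self.value),
                              -math.sin(self.value) * self.deriv)
        if name == 'sin':   # chain rule: (sin f)' = cos(f) * f'
            return Derivative(math.sin(self.value),
                              math.cos(self.value) * self.deriv)
        raise TypeError("unsupported ufunc: " + name)

d = Derivative(0.0, 1.0)             # x = 0, dx/dx = 1
print(d.__ufunc__(math.cos).deriv)   # -sin(0) * 1 = -0.0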
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Thu Apr 7 00:24:04 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 00:24:04 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <1986f60349f1d4d146c6ddb727362fd9@laposte.net> On 06.04.2005, at 18:06, S?bastien de Menten wrote: > Do you think it is possible to integrate a similar mechanism in array > functions (like searchsorted, argmax, ...). That is less obvious. A generic interface for ufuncs is possible because of the uniform calling interface. Actually, there should perhaps be two ufunc application methods, for unary and for binary ufuncs. The other array functions each have a peculiar calling pattern. They can certainly be implemented through delegation to a method, but that would be one method per function. But I think that is inevitable if you want full flexibility. > If we can register functions taking one array as argument within > scipy.base and let it dispatch those functions as ufunc, we could use > a similar strategy. > > For instance, let "sort" and "argmax" be registered as gfunc (general > functions on an array <> ufunc), then any class that would like to > overide any of them could do it too with the same trick Konrad exposed > here above. Does that make sense in practice? Suppose you write a class that implements tables, i.e. arrays plus axis labels. You would want sort() to return an object of the same class, but argmax() to return a plain integer. The generic gfunc handler could do little else than dispatch on the name of the gfunc. > Konrad, do you think it is tricky to have a prototype of your > suggestion (i.e. the modification does not need a full understanding > of Numeric and you can locate it approximately in the source code) ? I haven't looked at the Numeric code in ages, but my guess is that the ufunc part should be easy to do, as it is just a modification of a generic handler that already exists. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From cookedm at physics.mcmaster.ca Thu Apr 7 00:55:37 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 00:55:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> (Chris Barker's message of "Wed, 06 Apr 2005 23:35:20 -0700") References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> <4254D4A8.5020007@noaa.gov> Message-ID: "Chris Barker" writes: > Travis Oliphant wrote: > >> You should account for the '<' or '>' that might be present in >> __array_typestr__ (Numeric won't put it there, but scipy.base and >> numarray will---since they can have byteswapped arrays internally). > > Good point, but a pain. Maybe they should be required, that way I > don't have to first check for the presence of '<' or '>', then check > if they have the right value. I'll second this. 
Pulling out more Python Zen: Explicit is better than implicit.

>> A more generic interface would handle multiple integer types if possible
>
> I'd like to support doubles as well...
>
>> (but this is a good start...)
>
> Right. I want to get _something_ working, before I try to make it universal!
>
>> I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.
>
> You're welcome. I like that too.
>
>> No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.
>
> I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.

Here's a summary:

Attribute            required by          required
                     array-like object    to be checked
__array_shape__      yes                  yes
__array_typestr__    yes                  yes
__array_descr__      no                   no
__array_data__       no                   yes
__array_strides__    no                   yes
__array_mask__       no                   no?
__array_offset__     no                   yes

I'm assuming in the "required to be checked" column a user of the array that's interested in looking at all of the elements, so we have to consider all possible situations where forgetting to consider an attribute could lead to invalid memory accesses. __array_strides__ and __array_offset__ in particular could be troublesome if forgotten.

The __array_mask__ element is difficult: for most applications, you should check it, and raise an error if it exists and is not None, unless you can handle missing elements. It's certainly not required that all users of an array object need to understand all array types!

Since we have to check a bunch anyways, I think that's a good enough reason for having them all exist? There are suitable defaults defined in the protocol document (__array_strides__ in particular) that make it easy to add them in simple cases.

>> So, the correct consumer usage for grabbing the data is
>> data = getattr(obj, '__array_data__', obj)
>
> Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.

You'd want something like

adata = PyObject_GetAttrString(array_obj, "__array_data__");
if (!adata) {
    /* error */
    PyErr_Clear();
    adata = array_obj;
}

>> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)
>
> I'm starting to get this.
>
>> Of course this approach has the 32-bit limit until we get this changed in Python.
>
> That's the least of my worries!
>
>>> 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it is, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?
>>
>> A consumer has to check for most of the optional stuff if they want to support all types of arrays.
>
> That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single type elements, zero offset), but I have to check all that stuff to make sure that I have a simple array.
> The simplest arrays are the most common case, they should be as easy as possible to support.
>
>> Again a simple:
>> getattr(obj, '__array_offset__', 0)
>> works fine.
>
> not too bad.
>
> Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid....

This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.

I'd suggest adding to __array_data__: If __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not. (Put it this way: I understand the sequence protocol well, but not the buffer one :-)

That would also be a good argument for it existing, I think.

Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From magnus at hetland.org Thu Apr 7 01:05:03 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 7 01:05:03 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <20050407080429.GB20252@idi.ntnu.no>

Bruce Southey :

> Hi,
> I don't see that it is feasible to link R and numerical python in this way. As you point out, R objects (R is an object orientated language) use a lot of meta-data. Then there is the IEEE stuff (NaN etc) that would also need to be handled in numerical python.

Too bad. (I seem to recall seeing something about numpy conversion on the Web pages of RPy, though; perhaps, if one can stand a bit of copying, the two can be used together after all?)

> You probably could get RPy or RSPython to use numerical python rather than just basic Python.
>
> What statistical functions would you want in numerical python?

I think I'd want most of the standard, parametrized probability distributions (as well as automatic estimation from data, perhaps) and a handful of common statistical tests (t-test, z-test, Fisher, chi-squared, what-have-you). Perhaps some support for factorial experiments (not sure if R has anything specific there, though).

And another thing: R seems to have very fancy (although difficult to use) plotting capabilities... Until SciPy catches up (it hasn't yet, has it? ;) that might be a reason for using R(Py) as well, I guess.

-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]

From cookedm at physics.mcmaster.ca Thu Apr 7 01:08:11 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 01:08:11 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <2701da761c9f34fc1dc72fc97e87e788@laposte.net> (konrad hinsen's message of "Thu, 7 Apr 2005 08:53:06 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID:

konrad.hinsen at laposte.net writes:

> On 07.04.2005, at 00:43, David M. Cooke wrote:
>
>> I like this!
It's got namespace goodness all over it (last Python zen >> line in 'import this': Namespaces are one honking great idea -- let's >> do more of those!) > > Sounds like a good principle! > >> 1) arrays. Here, we want efficient computation of functions applied to >> lots of elements. That's where the output arguments and special >> methods (.reduce, .accumulate, and .outer) are useful > > All that is accessible if the class gets passed the ufunc object. > >> 2) polymorphic functions. Output arguments aren't useful here. The >> special methods are useful for binary ufuncs only. > > Fine, then they just call the ufunc. And the rare cases that need > explicit code for each ufunc (my Derivatives, for example) can > retrieve the name of the ufunc and dispatch on it. Hmm, I had misread your previous code. Here it is again, made more specific, and I'll assume this function lives in the ndarray package (as there is more than one package that defines ufuncs) def cos(obj): if ndarray.isarray(obj): return ndarray.array_cos(obj) else: try: return obj.__ufunc__(cos) except AttributeError: if ndarray.is_array_like(obj): a = ndarray.array(obj) return ndarray.array_cos(a) else: raise ValueError The thing is obj.__ufunc__ must understand about the *particular* object cos: the ndarray one. I was thinking more along the lines of obj.__ufunc__('cos'), where the name is passed instead. For binary ufuncs, you could use (with arguments obj1 and obj2), obj1.__ufunc__('add', obj2) Output argument (obj3): obj1.__ufunc__('add', obj2, obj3) Special methods: obj1.__ufunc__('add.reduce') obj1.__ufunc__('add.accumulate') obj1.__ufunc__('add.outer', obj2) Basically, special methods are just another ufunc. This suggests that add.outer should optionally take an output argument... Alternatively, __ufunc__ could be an object of implemented ufuncs: obj.__ufunc__.cos() obj1.__ufunc__.add(obj2) obj1.__ufunc__.add(obj2, obj3) obj1.__ufunc__.add.reduce() obj1.__ufunc__.add.accumulate() obj1.__ufunc__.add.outer(obj2) It depends where you want to do the dispatch. I think this version is better: it's easier to discover what __ufunc__'s are supported with generic tools (IPython tab completion, pydoc, etc.). -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From konrad.hinsen at laposte.net Thu Apr 7 01:34:37 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 01:34:37 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> On Apr 7, 2005, at 10:06, David M. Cooke wrote: > Hmm, I had misread your previous code. Here it is again, made more > specific, and I'll assume this function lives in the ndarray package > (as there is more than one package that defines ufuncs) At the moment, there is one in Numeric and one in numarray. The Python API of both is nearly or fully identical. > The thing is obj.__ufunc__ must understand about the *particular* > object cos: the ndarray one. I was thinking more along the lines of No, it must only know the interface. In most cases, it would do something like class MyArray: def __ufunc__(self, ufunc): return MyArray(apply(ufunc, self.data)) > obj.__ufunc__('cos'), where the name is passed instead. That's also an interesting option. 
It would require the implementing class to choose an appropriate function from an appropriate module. Alternatively, it would work if ufuncs were also accessible as methods on array objects. > For binary ufuncs, you could use (with arguments obj1 and obj2), > obj1.__ufunc__('add', obj2) Except that it would perhaps be better to have a different method, as otherwise nearly every implementation would have to start with a condition test to distinguish unary from binary ufuncs. > Output argument (obj3): obj1.__ufunc__('add', obj2, obj3) > Special methods: > obj1.__ufunc__('add.reduce') > obj1.__ufunc__('add.accumulate') > obj1.__ufunc__('add.outer', obj2) > > Basically, special methods are just another ufunc. This suggests that > add.outer should optionally take an output argument... But they are not just another ufunc, because a standard unary ufunc always returns an array of the same shape as its argument. I'd probably prefer a few explicit methods: object.__unary__(cos) object.__binary__(add, other) object.__binary_reduce__(add) etc. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:26:28 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:26:28 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> > > On Apr 7, 2005, at 10:06, David M. Cooke wrote: > > > Hmm, I had misread your previous code. Here it is again, made more > > specific, and I'll assume this function lives in the ndarray package > > (as there is more than one package that defines ufuncs) > > At the moment, there is one in Numeric and one in numarray. > The Python > API of both is nearly or fully identical. > > > The thing is obj.__ufunc__ must understand about the *particular* > > object cos: the ndarray one. I was thinking more along the lines of > > No, it must only know the interface. In most cases, it would do > something like > > class MyArray: > def __ufunc__(self, ufunc): > return MyArray(apply(ufunc, self.data)) Exactly ! I see this as a very common use (masked arrays and all the other examples could live with that). Or more precisely (just to be explicity as the previous MyArray example is the simplest (purest) one), class MyArray: def __ufunc__(self, ufunc): metadata= process(self.metadata, ufunc) data = apply(ufunc, self.data) return MyArray(data, metadata) Or variations on this same theme. BTW, looking at Numeric3, the presence of a __mask_array__ in the array protocol looks like we want to add a specific case of "augmented array" to the core protocol. Hmmm, rather prefer to build a more generic mechanism as well as a clean interface for interacting with "augmented array". > > > obj.__ufunc__('cos'), where the name is passed instead. > > That's also an interesting option. It would require the implementing > class to choose an appropriate function from an appropriate module. > Alternatively, it would work if ufuncs were also accessible > as methods > on array objects. > Why not have the ability to ask the name of an ufunc to be able to dispatch on it ? 
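Filled out into something runnable, the MyArray fragment above might look like the following sketch; the metadata scheme (a simple history of applied functions) is invented purely for illustration:

import math

class MyArray:
    def __init__(self, data, metadata=None):
        self.data = list(data)
        self.metadata = metadata if metadata is not None else {'history': []}

    def __ufunc__(self, ufunc):
        # apply the ufunc elementwise and record it in the metadata
        new_meta = {'history': self.metadata['history'] + [ufunc.__name__]}
        return MyArray([ufunc(x) for x in self.data], new_meta)

a = MyArray([0.0, math.pi])
b = a.__ufunc__(math.cos)
print(b.data)       # [1.0, -1.0]
print(b.metadata)   # {'history': ['cos']}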
> > For binary ufuncs, you could use (with arguments obj1 and obj2),
> > obj1.__ufunc__('add', obj2)
>
> Except that it would perhaps be better to have a different method, as otherwise nearly every implementation would have to start with a condition test to distinguish unary from binary ufuncs.
>
> > Output argument (obj3): obj1.__ufunc__('add', obj2, obj3)
> > Special methods:
> > obj1.__ufunc__('add.reduce')
> > obj1.__ufunc__('add.accumulate')
> > obj1.__ufunc__('add.outer', obj2)
> >
> > Basically, special methods are just another ufunc. This suggests that add.outer should optionally take an output argument...
>
> But they are not just another ufunc, because a standard unary ufunc always returns an array of the same shape as its argument.
>
> I'd probably prefer a few explicit methods:
>
> object.__unary__(cos)
> object.__binary__(add, other)
> object.__binary_reduce__(add)

What about:

object.__unary__(cos, mode = "reduce")
object.__binary__(cos, other, mode = "reduce")

or

object.__unary__(cos.reduce)
object.__binary__(cos.apply, other)

or

object.__binary__(cos.__call__, other)

with the ability to ask the first argument its type (with cos.mode or cos.reduce.mode ...).

However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods http://www.artima.com/weblogs/viewpost.jsp?thread=101605

From konrad.hinsen at laposte.net Thu Apr 7 02:42:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 02:42:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> References: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> Message-ID:

On Apr 7, 2005, at 11:25, Sebastien.deMentendeHorne at electrabel.com wrote:

> Why not have the ability to ask the name of an ufunc to be able to dispatch on it ?

That's already possible.

> What about :
>
> object.__unary__(cos, mode = "reduce")
> object.__binary__(cos, other, mode = "reduce")

What does "reduce" mode mean for cos? What does a binary ufunc in reduce mode do with its second argument?

> However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods http://www.artima.com/weblogs/viewpost.jsp?thread=101605

No need to be innovative: Python always dispatches on the first argument, and everybody is familiar with that approach even though it isn't perfect. If Python 3000 has multimethods, we can still adapt.

Konrad.
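For reference, the hand-off convention Sebastien describes already exists for Python's binary operators: returning NotImplemented makes the interpreter try the other operand's reflected method. A minimal demonstration:

class Tagged:
    def __init__(self, value, tag):
        self.value, self.tag = value, tag

    def __add__(self, other):
        if isinstance(other, Tagged):
            return Tagged(self.value + other.value, self.tag)
        return NotImplemented           # let type(other).__radd__ try

    def __radd__(self, other):          # handles e.g. 1 + Tagged(2, 'm')
        if isinstance(other, (int, float)):
            return Tagged(other + self.value, self.tag)
        return NotImplemented

x = 1 + Tagged(2, 'm')    # int.__add__ fails, so Tagged.__radd__ runs
print(x.value)            # 3
print(x.tag)              # m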
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ---------------------------------------------------------------------

From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:54:57 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:54:57 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AB@seebex02.eib.electrabel.be>

> > Why not have the ability to ask the name of an ufunc to be able to dispatch on it ?
>
> That's already possible.
>
> > What about :
> >
> > object.__unary__(cos, mode = "reduce")
> > object.__binary__(cos, other, mode = "reduce")
>
> What does "reduce" mode mean for cos?
> What does a binary ufunc in reduce mode do with its second argument?

raise a ValueError :-) It was an example of a way to pass an argument; the focus was on cos.reduce or "cos.reduce" or cos, "reduce".

> > However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods
> > http://www.artima.com/weblogs/viewpost.jsp?thread=101605
>
> No need to be innovative: Python always dispatches on the first argument, and everybody is familiar with that approach even though it isn't perfect. If Python 3000 has multimethods, we can still adapt.

The problem is related to multimethods; the implementation need not be. In a call like object.__binary__(add, other), if other is not of the same type as object, the latter could throw an exception such as ImplementationError to hand off to other.__binary__(add, binary) or to other.__binary__(radd, binary) or similar (i.e. those expressions may not make sense, but the idea is to have a convention that gives the other operand a chance; Python does this already when one overloads an operator like __add__ (__radd__)). So if we can keep this same protocol for binary ufuncs, that would be great. Otherwise, I think it is not that big a deal.

Sebastien

From xscottg at yahoo.com Thu Apr 7 04:35:49 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:35:49 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: <4254778A.1070100@ee.byu.edu> Message-ID: <20050407113421.49329.qmail@web50202.mail.yahoo.com> --- Travis Oliphant wrote: > > > > 2) As __array_strides__ is optional, I'd kind of like to have a > > __contiguous__ flag that I could just check, rather than checking for > > the existence of strides, then calculating what the strides should be, > > then checking them. > > > I don't want to add too much. The other approach is to establish a set > of helper functions in Python to check this sort of thing: Thus, if > you can't handle a general array you check: > > ndarray.iscontiguous(obj) > > where obj exports the array interface. > > But, it could really go either way. What do others think? > I think this should definitely be done in the helper functions. Having extra attributes encode redundant information is a recipe for trouble. Cheers, -Scott From xscottg at yahoo.com Thu Apr 7 04:43:37 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:43:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> Message-ID: <20050407114157.23887.qmail@web50209.mail.yahoo.com> --- Chris Barker wrote: > > I can see that it would, but then, we're stuck with checking for all > these optional attributes. If I don't bother to check for it, one day, > someone is going to pass a weird array in with an offset, and a strange > bug will show up. > Everyone seems to think that an offset is so weird. I haven't looked at the internals of Numeric/scipy.base in a while so maybe it doesn't apply there. However, if you subscript an array and return a view to the data, you need an offset or you need to create a new buffer that encodes the offset for you. A = reshape(arange(9), (3,3)) 0, 1, 2 3, 4, 5 6, 7, 8 B = A[2] # create a view into A 6, 7, 8 # Shared with the data above Unless you're going to create a new buffer (which I guess is what Numeric is doing), the offset for B would be 6 in this very simple case. I think specifying the offset is much more elegant than creating a new buffer object with a hidden offset that refers to the old buffer object. I guess all I'm saying is that I wouldn't assume the offset is zero... > > Couldn't it be required, and return a reference to itself if that works? > > Maybe I'm just being lazy, but it feels clunky and prone to errors to > keep having to check if a attribute exists, then use it (or not). > The problem is that you aren't being lazy enough. :-) The fact that a lot of these attributes are optional should be hidden in helper functions like those in Travis's array_interface.py module, or a C/C++ include file (with inline functions). In a short while, you shouldn't have to check any __array_metadata__ attributes directly. There should even be a helper function for getting the array elements. It wouldn't be a horrible mistake to have all the attributes be mandatory, but it doesn't get array consumes any benefit that they can't get from a well written helper library, and it does add some burden to array producers. Cheers, -Scott From mrmaple at gmail.com Thu Apr 7 04:44:27 2005 From: mrmaple at gmail.com (James Carroll) Date: Thu Apr 7 04:44:27 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: Hi Chris, Travis, ... Great conversation you've started. I have two questions at the moment... I do love the idea that an abstraction can bring the different but similar num* worlds together. 
Which sourceforge CVS repository does the interface (and an implementation) show up on first? My guess is numpy/numeric3; I see Travis has been updating it while I sleep.

> def DrawPointList(self, points, pens=None):
>     ...
>     # some checking code on the pens
>     ...
>     if (hasattr(points, '__array_shape__') and
>         hasattr(points, '__array_typestr__') and
>         len(points.__array_shape__) == 2 and
>         points.__array_shape__[1] == 2 and
>         points.__array_typestr__ == 'i4'):
>         # this means we have a compliant array:
>         # return the array protocol version
>         return self._DrawPointArray(points.__array_data__, pens, [])
>         # This needs to be written now!

This means that whenever you have some complex multivalued multidimensional structure with the data you want to plot, you have to reshape it into the above 'compliant' array before passing it on. I'm a newbie, but is this reshape something where the data has to be copied and take up memory twice? If not, then great, you would painlessly reshape into something that had a different set of strides that just accessed the compliant data in the big blob of data. If the reshape is expensive, then maybe we need the array abstraction, and then a second 'thing' that describes which parts of the array to use for the sequence of 2-tuples to use for plotting the x,y's of a scatter plot. (or whatever)

I do think we can accept more than just i4 for a datatype. Especially since a last-minute cast to i4 is inexpensive for almost every data type.

> else:
>     # return the generic python sequence version
>     return self._DrawPointList(points, pens, [])
>
> Then we'll need a function (in C++):
> _DrawPointArray(points.__array_data__, pens, [])

Looks great.

-Jim

From xscottg at yahoo.com Thu Apr 7 04:52:11 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:52:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: Message-ID: <20050407115141.96479.qmail@web50204.mail.yahoo.com>

--- "David M. Cooke" wrote:

>> Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.
>
> I'll second this. Pulling out more Python Zen: Explicit is better than implicit.

I'll third.

> This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.
>
> I'd suggest adding to __array_data__: If __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not. (Put it this way: I understand the sequence protocol well, but not the buffer one :-)
>
> That would also be a good argument for it existing, I think.
>
> Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.

I like this, although I think having __array_data__ return None is confusing. I think __array_version__ (or __array_protocol__?) is the better choice. How about having it optional and defaulting to 1? If it's present and greater than 1 then it means there is something new going on...

Cheers,
-Scott

From cjw at sympatico.ca Thu Apr 7 05:57:36 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Thu Apr 7 05:57:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> Message-ID: <42552DD2.2040200@sympatico.ca> konrad.hinsen at laposte.net wrote: > On Apr 7, 2005, at 10:06, David M. Cooke wrote: > >> Hmm, I had misread your previous code. Here it is again, made more >> specific, and I'll assume this function lives in the ndarray package >> (as there is more than one package that defines ufuncs) > > > At the moment, there is one in Numeric and one in numarray. The Python > API of both is nearly or fully identical. > >> The thing is obj.__ufunc__ must understand about the *particular* >> object cos: the ndarray one. I was thinking more along the lines of > > > No, it must only know the interface. In most cases, it would do > something like > > class MyArray: > def __ufunc__(self, ufunc): > return MyArray(apply(ufunc, self.data)) > >> obj.__ufunc__('cos'), where the name is passed instead. > > > That's also an interesting option. It would require the implementing > class to choose an appropriate function from an appropriate module. > Alternatively, it would work if ufuncs were also accessible as methods > on array objects. > Yes, perhaps with a slightly different name (say Cos vs cos) to distinguish between methods and functions. Since they don't require arguments, the methods would not require parentheses. Colin W. From bsouthey at gmail.com Thu Apr 7 06:45:32 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Thu Apr 7 06:45:32 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: Hi, > > What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fishcher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though). Most of this is in SciPy already based Gary's code. I have not looked at it in great detail because is doesn't meet my immediate needs. One of my major needs is to be able to handle missing values. Perhaps one day it will handle that or I will get the time to do so. I have been working on code with another person to do general linear models (along the lines of R's lm function and SAS's glm procedure) that would address factorial and other experimental designs. R just doesn't do enough for me in this aspect. Two real problems are data storage and model declaration. The mixed model component is really only for my area and I want to use symmetric matrices as the requirements of these models grow really fast. I would be willing to try to address and contribute to the statistical needs if people are interested because I prefer a 'pure python' approach. The other way is to directly call some of the R functions from Python since the main core of these functions are written in C and Fortran. > And another thing: R seems to have vary fancy (although difficult to > use) plotting capabilities... Until SciPy catches up (it hasn't yet, > has it? 
From bsouthey at gmail.com Thu Apr 7 06:45:32 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Thu Apr 7 06:45:32 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: Hi, > > What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fisher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though). Most of this is in SciPy already, based on Gary's code. I have not looked at it in great detail because it doesn't meet my immediate needs. One of my major needs is to be able to handle missing values. Perhaps one day it will handle that or I will get the time to do so. I have been working on code with another person to do general linear models (along the lines of R's lm function and SAS's glm procedure) that would address factorial and other experimental designs. R just doesn't do enough for me in this aspect. Two real problems are data storage and model declaration. The mixed model component is really only for my area, and I want to use symmetric matrices as the requirements of these models grow really fast. I would be willing to try to address and contribute to the statistical needs if people are interested, because I prefer a 'pure python' approach. The other way is to directly call some of the R functions from Python, since the main core of these functions is written in C and Fortran. > And another thing: R seems to have very fancy (although difficult to > use) plotting capabilities... Until SciPy catches up (it hasn't yet, > has it? ;) that might be a reason for using R(Py) as well, I guess. > > -- > Magnus Lie Hetland Fall seven times, stand up eight > http://hetland.org [Japanese proverb] > Yeah, S/S+/R provides some nice graphs until you need to change from the defaults. Regards Bruce

From Gilles.Simond at obs.unige.ch Thu Apr 7 07:55:08 2005 From: Gilles.Simond at obs.unige.ch (SIMOND Gilles) Date: Thu Apr 7 07:55:08 2005 Subject: [Numpy-discussion] Quite curious behaviour in Numeric Message-ID: <1112885601.15142.53.camel@obssf5> With Linux 2.6.8-1-686-smp (dilinger at toaster.hq.voxel.net) (gcc version 3.3.4 (Debian 1:3.3.4-9)) #1 SMP Sat Aug 28 12:51:43 EDT 2004 and python2.3:

>>> import Numeric
>>> a=Numeric.ones((2,3),'i')
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: Array can not be safely cast to required type
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'i'

and the following works:

>>> a=Numeric.ones((2,3))
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'l'
>>> type(1)
<type 'int'>
>>> Numeric.__version__
'23.6'

It seems that itemsize() does not return the correct value, which should be 8 for a 'l' type array. This is quite annoying since this function is the only way to know the actual format of the array. Gilles Simond
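The usual workaround for this kind of refused assignment, sketched here on the assumption that Gilles's diagnosis is right and sum() on an 'i' array comes back as a 'l' array (standard Numeric API, but the snippet itself is illustrative, not from the thread):

import Numeric

a = Numeric.ones((2, 3), 'i')
b = Numeric.sum(a) + 1
# a[1] = b + 1 raises "Array can not be safely cast to required type",
# so cast the right-hand side back down to a's typecode explicitly:
a[1] = (b + 1).astype(a.typecode())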
From rkern at ucsd.edu Thu Apr 7 08:17:44 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 08:17:44 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: <42554EC6.9090807@ucsd.edu> Magnus Lie Hetland wrote: > Bruce Southey : >>What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fisher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though). Except for factorial designs, scipy.stats has all of that. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Thu Apr 7 08:23:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 08:23:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407115141.96479.qmail@web50204.mail.yahoo.com> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> Message-ID: <4255502D.6060306@ee.byu.edu> Scott Gilbert wrote: >--- "David M. Cooke" wrote: > >>>Good point, but a pain. Maybe they should be required, that way I >>>don't have to first check for the presence of '<' or '>', then check >>>if they have the right value. >>> >>I'll second this. Pulling out more Python Zen: Explicit is better than >>implicit. >> > >I'll third. > > O.K. It's done....

From curzio.basso at unibas.ch Thu Apr 7 09:58:40 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Thu Apr 7 09:58:40 2005 Subject: [Numpy-discussion] profile reveals calls to astype() Message-ID: <4255664F.2070107@unibas.ch> Hi all, I have a problem trying to profile a program using numarray, maybe someone with more experience can give me a hint... basically, the program I am profiling has a function like this:

def foo():
    # some code
    # a call to astype()
    for i in xrange(N):
        # some other code and NO explicit call to astype()

the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of N+1, as if it was called inside the loop. So now the first doubt I have is that astype() gets listed because it is called from some function called by foo(), even if this should not happen. Here is the list of numarray functions called in foo():

Function called...
generic.py:651(getshape)(14)             0.070
generic.py:918(reshape)(2)               0.000
generic.py:1013(where)(2)                0.050
generic.py:1069(concatenate)(2)          4.270
morphology.py:150(binary_erosion)(2)     0.070
numarraycore.py:698(__del__)(120032)     3.240
numarraycore.py:817(astype)(12002)      37.290
numarraycore.py:857(is_c_array)(36000)  10.450
numarraycore.py:878(type)(4)             0.000
numarraycore.py:964(__mul__)(12)         0.340
numarraycore.py:981(__div__)(8)          0.010
numarraycore.py:1068(__pow__)(8)         0.000
numarraycore.py:1180(__imul__)(12000)    0.930
numarraycore.py:1250(__eq__)(2)          0.080
numarraycore.py:1400(zeros)(54)          0.060
numarraycore.py:1409(ones)(8)            0.020

The second thing I can think of is that astype() is implicitly called by some conversion. Can this be? curzio

From jmiller at stsci.edu Thu Apr 7 10:51:38 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 7 10:51:38 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4255664F.2070107@unibas.ch> References: <4255664F.2070107@unibas.ch> Message-ID: <1112896207.2437.34.camel@halloween.stsci.edu> astype() is used in a bunch of places, including the C-API, so it's hard to guess how it's getting called with the information here. In general, astype() gets called to "match up types" based on a particular parameterization of a function call, i.e. the c-code underlying some function call needs a different type than was passed in, so astype() is used to convert an array to a workable type. One possibility for debugging this might be to drop N to something reasonable, like say 2, and then run under pdb with a breakpoint set on astype(). Something like this is what I have in mind; it may not be exactly right but with fiddling this approach might work:

>>> from yourmodule import newfoo   # you redefined foo to accept N as a parameter
>>> import pdb
>>> pdb.run("newfoo(N=2)")
(pdb) s   # step along a little to get into newfoo()
... step output
(pdb) import numarray.numarraycore as nc
(pdb) break nc.astype
(pdb) c
... breakpoint output
(pdb) where
... function traceback showing where astype() got called from
(pdb) c
... breakpoint output
(pdb) where
... more function traceback, eventually you should find it...
...

Regards, Todd On Thu, 2005-04-07 at 12:56, Curzio Basso wrote: > Hi all, > > I have a problem trying to profile a program using numarray, maybe someone with more experience can > give me a hint... > > basically, the program I am profiling has a function like this: >
> def foo():
>     # some code
>     # a call to astype()
>     for i in xrange(N):
>         # some other code and NO explicit call to astype()
>
> the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of > N+1, as if it was called inside the loop. > So now the first doubt I have is that astype() gets listed because it is called from some function called > by foo(), even if this should not happen. Here is the list of numarray functions called in foo(): > > Function called...
> generic.py:651(getshape)(14)             0.070
> generic.py:918(reshape)(2)               0.000
> generic.py:1013(where)(2)                0.050
> generic.py:1069(concatenate)(2)          4.270
> morphology.py:150(binary_erosion)(2)     0.070
> numarraycore.py:698(__del__)(120032)     3.240
> numarraycore.py:817(astype)(12002)      37.290
> numarraycore.py:857(is_c_array)(36000)  10.450
> numarraycore.py:878(type)(4)             0.000
> numarraycore.py:964(__mul__)(12)         0.340
> numarraycore.py:981(__div__)(8)          0.010
> numarraycore.py:1068(__pow__)(8)         0.000
> numarraycore.py:1180(__imul__)(12000)    0.930
> numarraycore.py:1250(__eq__)(2)          0.080
> numarraycore.py:1400(zeros)(54)          0.060
> numarraycore.py:1409(ones)(8)            0.020
>
> The second thing I can think of is that astype() is implicitly called by some conversion. Can this be?
>
> curzio
--

From Chris.Barker at noaa.gov Thu Apr 7 11:38:43 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 11:38:43 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407114157.23887.qmail@web50209.mail.yahoo.com> References: <20050407114157.23887.qmail@web50209.mail.yahoo.com> Message-ID: <42557DE3.3010804@noaa.gov> Scott Gilbert wrote: > I think __array_version__ (or __array_protocol__?) is the > better choice. How about have it optional and default to 1? If it's > present and greater than 1 then it means there is something new going on... Again, I'm uncomfortable with something that I have to check being optional. If it is, we're encouraging people to not check it, and that's a recipe for bugs later on down the road. > Everyone seems to think that an offset is so weird. I haven't looked at > the internals of Numeric/scipy.base in a while so maybe it doesn't apply > there. However, if you subscript an array and return a view to the data, > you need an offset or you need to create a new buffer that encodes the > offset for you. > I guess all I'm saying is that I wouldn't assume the offset is zero... Good point. All the more reason to have the offset be mandatory. > The fact that a lot of these attributes are optional should be hidden in > helper functions like those in Travis's array_interface.py module, or a > C/C++ include file (with inline functions). Yes, if there is a C/C++ version of all these helper functions, I'll be a lot happier. And you're right, the same information should not be encoded in two places, so my "iscontiguous" attribute should be a helper function or maybe a method. > In a short while, you shouldn't have to check any __array_metadata__ > attributes directly. There should even be a helper function for getting > the array elements. Cool. How would that work? A C++ iterator? I'm thinking not, as this is all C, no? > It wouldn't be a horrible mistake to have all the attributes be mandatory, > but it doesn't get array consumers any benefit that they can't get from a > well written helper library, and it does add some burden to array > producers. Hardly any.
I'm assuming that there will be a base_array class that can be used as a base class or mixin, so it wouldn't be any work at all to have a full set of attributes with defaults. It would take up a little bit of memory. I'm assuming that the whole point of this is to support large datasets, but maybe that isn't a valid assumption. After all, small array support has turned out to be very important for Numeric. As a rule of thumb, I think there will be more consumers of arrays than producers, so I'd rather make it easy on the consumers than the producers, if we need to make such a trade off. Maybe I'm biased, because I'm a consumer. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Thu Apr 7 12:20:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 12:20:05 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: References: <42546766.5060802@noaa.gov> Message-ID: <42558796.4070607@noaa.gov> James Carroll wrote:

>> def DrawPointList(self, points, pens=None):
>>     ...
>>     # some checking code on the pens
>>     ...
>>     if (hasattr(points,'__array_shape__') and
>>         hasattr(points,'__array_typestr__') and
>>         len(points.__array_shape__) == 2 and
>>         points.__array_shape__[1] == 2 and
>>         points.__array_typestr__ == 'i4'
>>        ):  # this means we have a compliant array
>>         # return the array protocol version
>>         return self._DrawPointArray(points.__array_data__, pens, [])
>>         # This needs to be written now!
>
> This means that whenever you have some complex multivalued > multidimensional structure with the data you want to plot, you have to > reshape it into the above 'compliant' array before passing it on. I'm > a newbie, but is this reshape something where the data has to be > copied and take up memory twice?

Probably. It depends on two things:

1) What structure the data is in at the moment
2) Whether we write the code to handle more "complex" arrangements of data: discontiguous arrays, for instance.

But the idea is to require a data structure that makes sense for the data. For example, a natural way to store a whole set of coordinates is to use an NX2 NumPy array of doubles. This is exactly the data structure that I want the above function to accept. If the points are somehow a subset of a larger array, then they will be in a discontiguous array, and I'm not sure if I want to bother to try to handle that. You can always use the generic sequence interface to access the data, but that will be a lot slower. We're interfacing with a static language here; we can get optimum performance only by specifying a particular data structure. > If not, then great, you would > painlessly reshape into something that had a different set of strides > that just accessed the data that complied in the big blob of data. If > the reshape is expensive, then maybe we need the array abstraction, > and then a second 'thing' that described which parts of the array to > use for the sequence of 2-tuples to use for plotting the x,y's of a > scatter plot.
(or whatever)

The proposed array interface does provide a certain level of abstraction; that's what

__array_shape__
__array_typestr__
__array_descr__
__array_strides__
__array_offset__

are all about. We could certainly write the wxPy_LIST_helper functions to handle a larger variety of options than the simple contiguous C array, but I want to start with the simple case, and I'm not sure directly handling the more complex cases is worth it. I'm imagining that the user will need to do something like:

dc.DrawPointList(asarray(points, Int))

It's easier to use the utility functions that Numeric provides than re-write similar code in wxPython. > I do think we can accept more than just i4 for a datatype. Especially > since a last-minute cast to i4 is inexpensive for almost every data > type. Sure, but we're interfacing with a static language, so for each data type supported, we need to cast the data pointer to the right type, then have code to convert it to the type needed by wx. It's not a big deal, but I'd rather keep it simple. I do want to support at least doubles and ints. Users can use Numeric's astype() method to convert if need be. I've noticed that there is a wxRealPoint class that uses doubles, but it doesn't look like it can be used as input to any of the wxDC methods. Too bad. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
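For what it's worth, here is a sketch of what that call looks like from the user's side (standard Numeric API; DrawPointList is the wxPython draft method quoted above, so the wx side is hypothetical):

import Numeric

def draw_points(dc, xy_pairs):
    # Build the NX2 contiguous integer array the draft method treats
    # as 'compliant'; asarray is a no-op when the data already has
    # that form, and copies (e.g. a discontiguous slice) otherwise.
    points = Numeric.asarray(xy_pairs, Numeric.Int)
    dc.DrawPointList(points)    # dc: a wx device context with the draft method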
From xscottg at yahoo.com Thu Apr 7 14:13:32 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 14:13:32 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050407211227.82679.qmail@web50206.mail.yahoo.com> --- Chris Barker wrote: > > Again, I'm uncomfortable with something that I have to check being > optional. If it is, we're encouraging people to not check it, and that's > a recipe for bugs later on down the road. > [snip] > > I guess all I'm saying is that I wouldn't assume the offset is zero... > > Good point. All the more reason to have the offset be mandatory. > Lots of protocols have optional parts. The helper functions would hide this level of detail. > > Yes, if there is a C/C++ version of all these helper functions, I'll be > a lot happier. And you're right, the same information should not be > encoded in two places, so my "iscontiguous" attribute should be a helper > function or maybe a method. > > In a short while, you shouldn't have to check any __array_metadata__ > attributes directly. There should even be a helper function for getting > the array elements. > > Cool. How would that work? A C++ iterator? I'm thinking not, as this is > all C, no? > I think this will take shape as an include file with static/inline functions. No linking required, just #include and call the functions. It would be nice but not necessary that this was distributed with Python. I would be in favor of having some C++ iterator interfaces (possibly a template class) inside of a #ifdef __cplusplus block. Python doesn't seem to have a lot of C++ in the core, so I wonder if this would meet resistance (even when it's inside of a #ifdef block). > > It wouldn't be a horrible mistake to have all the attributes be > mandatory, but it doesn't get array consumers any benefit that they > can't get from a well written helper library, and it does add some > burden to array producers. > > Hardly any. I'm assuming that there will be a base_array class that can > be used as a base class or mixin, so it wouldn't be any work at all to > have a full set of attributes with defaults. It would take up a little > bit of memory. I'm assuming that the whole point of this is to support > large datasets, but maybe that isn't a valid assumption. After all, > small array support has turned out to be very important for Numeric. > If the protocol can make things easy without the use of a mixin or base class, all the better to my way of thinking. I don't think the memory use is very relevant, as the attributes would only require storage in the class object, not the instances. There is something elegant about making array creation as easy as:

class easy_array:
    def __init__(self, filename):
        data = open(filename, 'r').read()
        self.__array_data__ = data
        self.__array_shape__ = (len(data)/4,)
        self.__array_typestr__ = '>i4'

Like I said, I don't think it would be *horrible* to require all the attributes, but I don't see how it will benefit you at all. And even if all the attributes are mandatory, there are still a number of details to get right in reading the memory. You'll likely want to use the helper libraries/modules regardless. (Once they're completed of course...) > > As a rule of thumb, I think there will be [more] consumers of arrays > than producers, so I'd rather make it easy on the consumers than the > producers, if we need to make such a trade off. Maybe I'm biased, > because I'm a consumer. > I don't see the trade off. It will be easy for you either way, but harder for array producers (admittedly only a little). This has to be easier than the situation you have today right? Imagine the code you'd have to write to special case Numeric, scipy.base, Numarray, and Python's array module. Cheers, -Scott
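A consumer for that toy class is nearly as short. This sketch (an illustration, not code from the thread) pulls the values back out using only the three attributes easy_array defines, with the struct module doing the byte interpretation:

import struct

def as_list(arr):
    # Works for any 1-d producer exposing the three attributes above
    # with a big-endian 4-byte integer typestr.
    assert arr.__array_typestr__ == '>i4'
    (n,) = arr.__array_shape__
    data = arr.__array_data__
    return list(struct.unpack('>%di' % n, data[:4 * n]))

An offset or strides attribute, if the producer supplied one, would have to be honored here as well -- which is exactly the detail the proposed helper functions would take over.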
From tim.hochberg at cox.net Thu Apr 7 14:31:11 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 14:31:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211227.82679.qmail@web50206.mail.yahoo.com> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> Message-ID: <4255A635.9010309@cox.net> Scott Gilbert wrote: >--- Chris Barker wrote: > > [SNIP] > >>As a rule of thumb, I think there will be [more] consumers of arrays >>than producers, so I'd rather make it easy on the consumers than the >>producers, if we need to make such a trade off. Maybe I'm biased, >>because I'm a consumer. >> > >I don't see the trade off. It will be easy for you either way, but harder >for array producers (admittedly only a little). > I think there is a trade off, but not the one that Chris is worried about. It should be easy to hide the complexity of dealing with missing attributes through the various helper functions. The cost will be in speed, and will probably be most noticeable in C extensions using small arrays, where the extra code to check if an attribute is present will be significant. How significant this will be, I'm not sure. And frankly I don't care all that much since I generally only use large arrays. However, since one of the big faultlines between Numarray and Numeric involves the former's relatively poor small array performance, I suspect someone might care. -tim >This has to be easier than the situation you have today right? Imagine the >code you'd have to write to special case Numeric, scipy.base, Numarray, and >Python's array module.

From oliphant at ee.byu.edu Thu Apr 7 15:47:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 15:47:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211501.60155.qmail@web50203.mail.yahoo.com> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> Message-ID: <4255B7D6.9000109@ee.byu.edu> Scott Gilbert wrote:

>I agree, we need a road map of some sort. It could be multiple PEPs
>depending, but it should include most of the following:
>
>  - Get the bytes object submitted. There are only a few small
>    things in PEP 296 that should be changed.

#4

>  - I'm not particularly interested in implementing the new bytes
>    literal and other features discussed in PEP 332, but it is
>    related to this topic. (The proposal is for b"xxxxxx" to be a
>    bytes literal.) We should make note that while this is not
>    part of the numpy roadmap, nothing prohibits that from being
>    implemented by another user.
>
>  - Add an ndarray module. This module will contain the ndarray
>    object as well as a superset of your helper functions. I
>    think implementing it in pure Python on top of the bytes
>    object is the right course. It's partly for documentation.
>
>  - Add an include file to make this protocol easily accessible
>    from C. It's not much code, and the entire thing could be
>    done with inline/static functions in the .h file. It would
>    be nice if this went into Python too, but not strictly
>    required.

I put these together at #1

>  - Add the array protocol attributes to the existing array
>    object.

#2

>  - Flesh out the "locked buffer" stuff in PEP 298. Add support
>    for locking the buffer to the existing array object, the
>    bytes object, the mmap object, and anything else (string?)
>    that doesn't meet too much resistance.

#3

>  - Fix the existing buffer object to regrab its pointer
>    every time it's needed. Could also add support to use
>    the "locked buffer" interface where possible. I gather
>    that you are using this particular object in scipy.base
>    (is that true??). Several shortcomings of it could be
>    easily fixed at the Python level, but I don't feel
>    strongly that this would have to be done... Then again
>    it isn't much work.

#5

I can't think of anything you've missed. I'm very supportive of this, but I have to finish scipy.base first. I think Perry is supportive as well. I know he's been playing catch-up in the reading. I'm not sure of Todd's opinion. I suspect he would welcome these changes to Python. My preference order is:

1) the ndarray module and ndarray.h header with these interface definitions and methods
2) Add array interface attributes to the array module
3) Flesh out the locked buffer API
4) Bytes object (with Pickling support)
5) Fix the current buffer object

-Travis

From strawman at astraw.com Thu Apr 7 15:56:03 2005 From: strawman at astraw.com (Andrew Straw) Date: Thu Apr 7 15:56:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255502D.6060306@ee.byu.edu> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> <4255502D.6060306@ee.byu.edu> Message-ID: <4255BA56.7000001@astraw.com> Travis Oliphant wrote: > Scott Gilbert wrote: > >> --- "David M. Cooke" wrote: >> >>>> Good point, but a pain. Maybe they should be required, that way I >>>> don't have to first check for the presence of '<' or '>', then check >>>> if they have the right value. >>>> >>> I'll second this. Pulling out more Python Zen: Explicit is better than
>>> implicit. >> >> I'll third. > > O.K. It's done....

Here's a bit of weirdness which has prevented me from using '<' or '>' in the past with the struct module. I'm not guru enough to know what's going on, but it has prevented me from being explicit rather than implicit.

In [1]:import struct

In [2]:from numarray.ieeespecial import nan

In [3]:nan
Out[3]:nan

In [4]:struct.pack('<d',nan)
---------------------------------------------------------------------------
exceptions.SystemError                Traceback (most recent call last)

/home/astraw/

SystemError: frexp() result out of range

In [5]:struct.pack('d',nan)
Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'

From Chris.Barker at noaa.gov Thu Apr 7 16:01:03 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 16:01:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255A635.9010309@cox.net> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> <4255A635.9010309@cox.net> Message-ID: <4255BA80.4090201@noaa.gov> Tim Hochberg wrote: > Scott Gilbert wrote: >> --- Chris Barker wrote: >> I don't see the trade off. I wasn't sure it applied in this case, but if there were a trade off, we should make things easiest for the consumers of arrays. > I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide the complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed, and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant. Actually, that is one I'm worried about. You're quite right: if I'm dealing with a 2X2 array, those helper functions are going to take much longer to run than accessing (and maybe using) the data. Like Tim, I'm mostly interested in using this for large data sets, but I think the small array thing might crop up unexpectedly. For example, with the current numarray, if you pass an NX2 array to wxPython (to draw a polygon, for instance), it's very slow. It turns out that that's because a whole set of (2,) arrays are created when extracting the data, so even though you're dealing with a large data set, you end up dealing with a LOT of small arrays. Of course, the whole point of this is to avoid that, but I don't think we should assume that any overhead is negligible. > >> This has to be easier than the situation you have today right? Well, sure. Though it seems to be harder than using the Numeric API directly. However, I'll shut up now, as it seems that the proposed utility functions will address my issues. -Chris PS to Tim: Want to help out with the wxPython integration? -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From xscottg at yahoo.com Thu Apr 7 20:05:48 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 20:05:48 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408030336.54970.qmail@web50209.mail.yahoo.com> --- Andrew Straw wrote: > > Here's a bit of weirdness which has prevented me from using '<' or '>' > in the past with the struct module. I'm not guru enough to know what's > going on, but it has prevented me from being explicit rather than > implicit.
> In [1]:import struct
>
> In [2]:from numarray.ieeespecial import nan
>
> In [3]:nan
> Out[3]:nan
>
> In [4]:struct.pack('<d',nan)
> ---------------------------------------------------------------------------
> exceptions.SystemError                Traceback (most recent call last)
>
> /home/astraw/
>
> SystemError: frexp() result out of range
>
> In [5]:struct.pack('d',nan)
> Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'

No clue why that is, but it certainly looks like a bug in the struct module. It shouldn't make any difference about whether or not the array protocol reports the endian though. It's using a different notation for typecodes. Cheers, -Scott

From rkern at ucsd.edu Thu Apr 7 20:24:38 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 20:24:38 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408030336.54970.qmail@web50209.mail.yahoo.com> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> Message-ID: <4255F79D.4000501@ucsd.edu> Scott Gilbert wrote: > --- Andrew Straw wrote: > >>Here's a bit of weirdness which has prevented me from using '<' or '>' >>in the past with the struct module. I'm not guru enough to know what's >>going on, but it has prevented me from being explicit rather than >>implicit. >>
>>In [1]:import struct
>>
>>In [2]:from numarray.ieeespecial import nan
>>
>>In [3]:nan
>>Out[3]:nan
>>
>>In [4]:struct.pack('<d',nan)
>>---------------------------------------------------------------------------
>>exceptions.SystemError                Traceback (most recent call last)
>>
>>/home/astraw/
>>
>>SystemError: frexp() result out of range
>>
>>In [5]:struct.pack('d',nan)
>>Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'
>>
> No clue why that is, but it certainly looks like a bug in the struct > module. It shouldn't make any difference about whether or not the array > protocol reports the endian though. It's using a different notation for > typecodes.

This behavior is explained by Tim Peters: http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From xscottg at yahoo.com Thu Apr 7 21:07:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:07:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408040601.86838.qmail@web50203.mail.yahoo.com> --- Tim Hochberg wrote: > > I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide the complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed, and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant. > > How significant this will be, I'm not sure. And frankly I don't care all > that much since I generally only use large arrays. However, since one of > the big faultlines between Numarray and Numeric involves the former's > relatively poor small array performance, I suspect someone might care. > You must check the return value of the PyObject_GetAttr (or PyObject_GetAttrString) calls regardless. Otherwise the extension will die with an ugly segfault the first time one passes a float where an array was expected.
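In Python the two worlds of Scott's argument collapse into a couple of getattr calls. A sketch of what the consumer-side helper being debated might look like (the function name is invented; the defaults follow the optional-attribute rules from this thread):

def get_array_layout(arr):
    # Required attributes: let the AttributeError propagate -- the
    # Python analogue of checking PyObject_GetAttrString's result.
    shape = tuple(arr.__array_shape__)
    typestr = arr.__array_typestr__
    itemsize = int(typestr[2:])     # e.g. '>i4' -> 4
    # Optional attributes: missing means offset 0 and a C-contiguous
    # layout -- the cheap "failure" path described above.
    offset = getattr(arr, '__array_offset__', 0)
    strides = getattr(arr, '__array_strides__', None)
    if strides is None:
        strides = []
        size = itemsize
        for dim in shape[::-1]:
            strides.insert(0, size)
            size = size * dim
    return shape, typestr, tuple(strides), offset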
If we're talking about small light-weight arrays and a C/C++ function that wants to work with them very efficiently, I'm not convinced that requiring the attributes be present will make things faster. As we're talking about small light-weight arrays, it's unlikely the individual arrays will have __array_shape__ or __array_strides__ already stored as tuples. They'll probably store them as a C array as part of their PyObject structure. In the world where some of these attributes are optional: If an attribute like __array_offset__ or __array_shape__ isn't present, the C code will know to use zero or the default C-contiguous layout. So the check failed, but the failure case is probably very fast (since a temporary tuple object doesn't have to be built by the array on the fly). In the world where all of the attributes are required: The array object will have to generate the __array_offset__ int/long or __array_shape__ tuple from its own internal representation. Then the C/C++ consumer code will bust apart the tuple to get the values. So the check succeeded, but the success code needs to grab the parts of the tuple. The C helper code could look like:

struct PyNDArrayInfo {
    int ndims;
    int endian;
    char itemcode;
    size_t itemsize;
    Py_LONG_LONG shape[40];    /* assume 40 is the max for now... */
    Py_LONG_LONG offset;
    Py_LONG_LONG strides[40];
    /* More Array Info goes here */
};

int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
    PyObject* shape;
    PyObject* offset;
    PyObject* strides;
    int ii, len;

    info->itemsize = too_long_for_this_example(obj);

    shape = PyObject_GetAttrString(obj, "__array_shape__");
    if (!shape) return 0;
    len = PySequence_Size(shape);
    if (len < 0) return 0;
    if (len > 40) return 0;    /* This needs work */
    info->ndims = len;
    for (ii = 0; ii < len; ii++) {
        PyObject* val = PySequence_GetItem(shape, ii);
        info->shape[ii] = PyLong_AsLongLong(val);
        Py_DECREF(val);
    }
    Py_DECREF(shape);

    offset = PyObject_GetAttrString(obj, "__array_offset__");
    if (offset) {
        /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
        info->offset = PyLong_AsLongLong(offset);
        Py_DECREF(offset);
    } else {
        PyErr_Clear();
        info->offset = 0;
    }

    strides = PyObject_GetAttrString(obj, "__array_strides__");
    if (strides) {
        /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
        for (ii = 0; ii < len; ii++) {
            PyObject* val = PySequence_GetItem(strides, ii);
            info->strides[ii] = PyLong_AsLongLong(val);
            Py_DECREF(val);
        }
        Py_DECREF(strides);
    } else {
        /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
        size_t size = info->itemsize;
        PyErr_Clear();
        for (ii = info->ndims - 1; ii >= 0; ii--) {
            info->strides[ii] = size;
            size *= info->shape[ii];
        }
    }

    /* More code goes here */
}

I have no idea how expensive PyErr_Clear() is. We'd have to profile it to see for certain. If PyErr_Clear() is not expensive, then we could make a strong argument that *not* requiring the attributes will be more efficient. It could also be so close that it doesn't matter - in which case it's back to being a matter of taste... Cheers, -Scott

From xscottg at yahoo.com Thu Apr 7 21:16:06 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:16:06 2005 Subject: [Numpy-discussion] Questions about the array interface. Message-ID: <20050408041417.61390.qmail@web50210.mail.yahoo.com> Oops, sent too fast. Quick correction... > > In the world where some of these attributes are optional: If an > attribute like __array_offset__ or __array_shape__ isn't present, > the C code will know to use zero or the default C-contiguous layout. > So the check failed, but the failure case is probably very fast > (since a temporary tuple object doesn't have to be built by the array > on the fly). >
I meant to say "__array_offset__ or __array_strides__". The __array_shape__ attribute would always be required for arrays... Cheers, -Scott

From tim.hochberg at cox.net Thu Apr 7 23:56:10 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 23:56:10 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408040601.86838.qmail@web50203.mail.yahoo.com> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> Message-ID: <42562AC5.3040502@cox.net> Scott Gilbert wrote: >--- Tim Hochberg wrote: > >>I think there is a trade off, but not the one that Chris is worried >>about. It should be easy to hide the complexity of dealing with missing >>attributes through the various helper functions. The cost will be in >>speed, and will probably be most noticeable in C extensions using small >>arrays where the extra code to check if an attribute is present will be >>significant. >> >>How significant this will be, I'm not sure. And frankly I don't care all >>that much since I generally only use large arrays. However, since one of >>the big faultlines between Numarray and Numeric involves the former's >>relatively poor small array performance, I suspect someone might care. >> > >You must check the return value of the PyObject_GetAttr (or >PyObject_GetAttrString) calls regardless. Otherwise the extension will die >with an ugly segfault the first time one passes a float where an array was >expected. > >If we're talking about small light-weight arrays and a C/C++ function that >wants to work with them very efficiently, I'm not convinced that requiring >the attributes be present will make things faster. > >As we're talking about small light-weight arrays, it's unlikely the >individual arrays will have __array_shape__ or __array_strides__ already >stored as tuples. They'll probably store them as a C array as part of >their PyObject structure. > >In the world where some of these attributes are optional: If an attribute >like __array_offset__ or __array_shape__ isn't present, the C code will >know to use zero or the default C-contiguous layout. So the check failed, >but the failure case is probably very fast (since a temporary tuple object >doesn't have to be built by the array on the fly). > >In the world where all of the attributes are required: The array object >will have to generate the __array_offset__ int/long or __array_shape__ >tuple from its own internal representation. Then the C/C++ consumer code >will bust apart the tuple to get the values. So the check succeeded, but >the success code needs to grab the parts of the tuple. > >The C helper code could look like: >

I'm not convinced it's legit to assume that a failure to get the attribute means that it's not present and call PyErr_Clear. Just as a for instance, what if the attribute in question is implemented as a descriptor in which there is some internal error? Then you're burying the error and most likely doing the wrong thing. As far as I can tell, the only correct way to do this is to use PyObject_HasAttrString, then PyObject_GetAttrString if that succeeds. The point about not passing around the tuples probably being faster is a good one. Another thought is that requiring tuples instead of general sequences would make the helper faster (since one could use PyTuple_GET_ITEM, which I believe is much faster than PySequence_GetItem). This would possibly shift more pain onto the implementer of the object though.
I suspect that the best strategy, orthogonal to requiring all attributes or not, is to use PySequence_Fast to get a fast sequence and work with that. This means that objects that return tuples for strides, etc. would run at maximum possible speed, while other sequences would still work. Back to requiring attributes or not: I suspect that the fastest correct way is to require all attributes, but allow them to be None, in which case the default value is used. Then any errors are easily bubbled up, and a fast check for None chooses whether to use the defaults or not. It's late, so I hope that's not too incoherent. Or too wrong. Oh, one other nitpicky thing: I think PyLong_AsLongLong needs some sort of error checking (it can allegedly raise errors). I suppose that means one is supposed to call PyErr_Occurred after every call? That's sort of painful! -tim

> struct PyNDArrayInfo {
>     int ndims;
>     int endian;
>     char itemcode;
>     size_t itemsize;
>     Py_LONG_LONG shape[40];    /* assume 40 is the max for now... */
>     Py_LONG_LONG offset;
>     Py_LONG_LONG strides[40];
>     /* More Array Info goes here */
> };
>
> int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
>     PyObject* shape;
>     PyObject* offset;
>     PyObject* strides;
>     int ii, len;
>
>     info->itemsize = too_long_for_this_example(obj);
>
>     shape = PyObject_GetAttrString(obj, "__array_shape__");
>     if (!shape) return 0;
>     len = PySequence_Size(shape);
>     if (len < 0) return 0;
>     if (len > 40) return 0;    /* This needs work */
>     info->ndims = len;
>     for (ii = 0; ii < len; ii++) {
>         PyObject* val = PySequence_GetItem(shape, ii);
>         info->shape[ii] = PyLong_AsLongLong(val);
>         Py_DECREF(val);
>     }
>     Py_DECREF(shape);
>
>     offset = PyObject_GetAttrString(obj, "__array_offset__");
>     if (offset) {
>         /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
>         info->offset = PyLong_AsLongLong(offset);
>         Py_DECREF(offset);
>     } else {
>         PyErr_Clear();
>         info->offset = 0;
>     }
>
>     strides = PyObject_GetAttrString(obj, "__array_strides__");
>     if (strides) {
>         /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
>         for (ii = 0; ii < len; ii++) {
>             PyObject* val = PySequence_GetItem(strides, ii);
>             info->strides[ii] = PyLong_AsLongLong(val);
>             Py_DECREF(val);
>         }
>         Py_DECREF(strides);
>     } else {
>         /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
>         size_t size = info->itemsize;
>         PyErr_Clear();
>         for (ii = info->ndims - 1; ii >= 0; ii--) {
>             info->strides[ii] = size;
>             size *= info->shape[ii];
>         }
>     }
>
>     /* More code goes here */
> }
>
>I have no idea how expensive PyErr_Clear() is. We'd have to profile it to >see for certain. If PyErr_Clear() is not expensive, then we could make a >strong argument that *not* requiring the attributes will be more efficient. > >It could also be so close that it doesn't matter - in which case it's back >to being a matter of taste... > >Cheers, > -Scott

From cookedm at physics.mcmaster.ca Fri Apr 8 00:43:08 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 00:43:08 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> <42562AC5.3040502@cox.net> Message-ID: <20050408074129.GA16479@arbutus.physics.mcmaster.ca> On Thu, Apr 07, 2005 at 11:55:01PM -0700, Tim Hochberg wrote: > Scott Gilbert wrote: > >--- Tim Hochberg wrote: > > > >>I think there is a trade off, but not the one that Chris is worried > >>about. It should be easy to hide the complexity of dealing with missing > >>attributes through the various helper functions.
> >>The cost will be in speed, and will probably be most noticeable in C extensions using small > >>arrays where the extra code to check if an attribute is present will be > >>significant. > >> > >>How significant this will be, I'm not sure. And frankly I don't care all > >>that much since I generally only use large arrays. However, since one of > >>the big faultlines between Numarray and Numeric involves the former's > >>relatively poor small array performance, I suspect someone might care. > > > >You must check the return value of the PyObject_GetAttr (or > >PyObject_GetAttrString) calls regardless. Otherwise the extension will die > >with an ugly segfault the first time one passes a float where an array was > >expected. > > > >If we're talking about small light-weight arrays and a C/C++ function that > >wants to work with them very efficiently, I'm not convinced that requiring > >the attributes be present will make things faster. > > > >As we're talking about small light-weight arrays, it's unlikely the > >individual arrays will have __array_shape__ or __array_strides__ already > >stored as tuples. They'll probably store them as a C array as part of > >their PyObject structure. > > > >In the world where some of these attributes are optional: If an attribute > >like __array_offset__ or __array_shape__ isn't present, the C code will > >know to use zero or the default C-contiguous layout. So the check failed, > >but the failure case is probably very fast (since a temporary tuple object > >doesn't have to be built by the array on the fly). > > > >In the world where all of the attributes are required: The array object > >will have to generate the __array_offset__ int/long or __array_shape__ > >tuple from its own internal representation. Then the C/C++ consumer code > >will bust apart the tuple to get the values. So the check succeeded, but > >the success code needs to grab the parts of the tuple. > > > >The C helper code could look like: > > I'm not convinced it's legit to assume that a failure to get the > attribute means that it's not present and call PyErr_Clear. Just as a > for instance, what if the attribute in question is implemented as a > descriptor in which there is some internal error? Then you're burying the > error and most likely doing the wrong thing. As far as I can tell, the > only correct way to do this is to use PyObject_HasAttrString, then > PyObject_GetAttrString if that succeeds.

No point: PyObject_HasAttrString *calls* PyObject_GetAttrString, then clears the error if there is one. [Side note: hasattr() in Python works the same way, which is why using properties is a pain when you've got code that's using it]

> The point about not passing around the tuples probably being faster is a > good one. Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > PyTuple_GET_ITEM, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc. would run at maximum possible speed, > while other sequences would still work.

How about objects that use a lightweight array as the strides sequence?
I'm thinking that if you've got a fast 1-d array object, you'd be tempted to use an instance of that as the shape or strides attribute. You'd be saving on temporary tuple creation (but you'd still be losing some in making Python ints). I haven't benchmarked it, but I'm looking at the code for PySequence_GetItem(): it does a few pointer dereferences to get the sq_item() method in the tp_as_sequence struct of an object implementing the sequence protocol, which for the tuple does an array indexing of the tuple's data. You've got about two function calls more compared to using PyTuple_GET_ITEM. It really depends on how big the arrays you expect to get passed to you are. If they're big, this is all amortized: you'll hardly see it. It also depends on how your routines get used. If the routine is buried below a few layers of API, you'd likely be better off doing a typecast higher up to your own representation, or something. If it's at the border, so the user will call it directly *often*, you're going to be screwed for speed anyways (giving the user the option of casting arrays to something else would probably help a lot here also).

> Back to requiring attributes or not: I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up, > and a fast check for None chooses whether to use the defaults or not. > > It's late, so I hope that's not too incoherent. Or too wrong. > > Oh, one other nitpicky thing: I think PyLong_AsLongLong needs some sort > of error checking (it can allegedly raise errors). I suppose that means > one is supposed to call PyErr_Occurred after every call? That's sort > of painful!

Yes! Check all C API functions that may return errors! That includes PySequence_GetItem() and PyLong_AsLongLong.

> > struct PyNDArrayInfo {
> >     int ndims;
> >     int endian;
> >     char itemcode;
> >     size_t itemsize;
> >     Py_LONG_LONG shape[40];    /* assume 40 is the max for now... */
> >     Py_LONG_LONG offset;
> >     Py_LONG_LONG strides[40];
> >     /* More Array Info goes here */
> > };
> >
> > int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
> >     PyObject* shape;
> >     PyObject* offset;
> >     PyObject* strides;
> >     int ii, len;
> >
> >     info->itemsize = too_long_for_this_example(obj);
> >
> >     shape = PyObject_GetAttrString(obj, "__array_shape__");
> >     if (!shape) return 0;
> >     len = PySequence_Size(shape);
> >     if (len < 0) return 0;
> >     if (len > 40) return 0;    /* This needs work */
> >     info->ndims = len;
> >     for (ii = 0; ii < len; ii++) {
> >         PyObject* val = PySequence_GetItem(shape, ii);

Like here

> >         info->shape[ii] = PyLong_AsLongLong(val);

and here

> >         Py_DECREF(val);

(if you don't check PySequence_GetItem -- not a good idea anyways -- this should be Py_XDECREF)

[snip more code that needs checks :-)]

> >I have no idea how expensive PyErr_Clear() is. We'd have to profile it to > >see for certain. If PyErr_Clear() is not expensive, then we could make a > >strong argument that *not* requiring the attributes will be more efficient.

Not much; it's about three Py_XDECREF's. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Fri Apr 8 01:22:09 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 01:22:09 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed?
Message-ID: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> It seems that people are worried about speed of the attribute-based array interface when using small arrays in C. Here's an alternative: Define some attribute (for now, call it __array_c__), which returns a CObject whose value (which you get with PyCObject_GetVoidPtr) would be a pointer to a struct describing the array. It would look something like:

typedef struct {
    int version;
    int nd;
    Py_LONG_LONG *shape;
    char typecode;
    Py_LONG_LONG *strides;
    Py_LONG_LONG offset;
    void *data;
} SimpleCArray;

(The order here follows that of the array interface spec; if somebody's got any comments on what mixing ints, Py_LONG_LONGs, and chars in a struct does to the packing and potential alignment problems, I'd like to know.) version is there as a sanity check: I'd say for this version it's something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily a check that you've got the right thing (since CObjects are intrinsically opaque types). Then:

- the array object guarantees that the data, etc. remains alive, probably by passing itself as the desc parameter to the CObject. The array data would have to stay at the same location and the same size while the reference is held.
- typecode follows that of the __array_typestr__ attribute
- shape and strides are pointers to arrays of at least nd elements.
- this doesn't handle byteswapped as-is. Maybe a flags, or endian, attribute could be added.
- you can still have the full attribute-based array interface (__array_strides__, etc.) to fall back on. If the typecode is 'V', you'll have to look at __array_descr__.

Creating one from a Numeric PyArrayObject would go like this:

PyObject *create_SimpleCArray(PyArrayObject *a)
{
    int i;
    SimpleCArray *ca = PyMem_New(SimpleCArray, 1);
    ca->version = 0xDECAF;
    ca->nd = a->nd;
    ca->shape = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->shape[i] = a->dimensions[i];
    }
    ca->strides = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->strides[i] = a->strides[i];
    }
    ca->offset = 0;
    ca->data = a->data;

    Py_INCREF(a);
    PyObject *co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray);
    return co;
}

where

void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a)
{
    PyMem_Free(ca->shape);
    PyMem_Free(ca->strides);
    PyMem_Free(ca);
    Py_DECREF(a);
}

Some points:

- you have to keep the CObject around: destroying it will potentially destroy the array you're looking at.
- I was thinking that maybe adding a PyObject *owner could make it easier to keep track of the owner; I'm not sure, as the desc argument in CObjects can easily play that role.
- The creator of the SimpleCArray is free to add elements to the end (as long as they don't affect the padding/alignment of the previous ones: haven't thought about this). You could put the real owner of the array data there, for example (say, if it was wrapping a Blitz++ array). Or have a small _strides[30] array at the end, and strides would point to that (saving you a memory allocation).

This simple C interface would, I think, alleviate many worries about speed for small arrays, and even for large arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M.
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From curzio.basso at unibas.ch Fri Apr 8 06:30:05 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Fri Apr 8 06:30:05 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <1112896207.2437.34.camel@halloween.stsci.edu> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> Message-ID: <4256873B.2060501@unibas.ch> Todd Miller wrote: > astype() is used in a bunch of places, including the C-API, so it's > hard to guess how it's getting called with the information here. In ok, so probably C functions are somehow 'transparent' to the profiler, which does not report them but reports the Python functions called by the C one...

>>>> from yourmodule import newfoo   # you redefined foo to accept N as a parameter
>>>> import pdb
>>>> pdb.run("newfoo(N=2)")
> (pdb) s   # step along a little to get into newfoo()
> ... step output
> (pdb) import numarray.numarraycore as nc
> (pdb) break nc.astype

strange, what I get now is:

> (Pdb) b nc.astype
> *** The specified object 'nc.astype' is not a function
> or was not found along sys.path.

and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather than just the function) under ipython, starting it with

> %run -d myprog.py

maybe this could mess up things? curzio

From jmiller at stsci.edu Fri Apr 8 06:45:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 06:45:13 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4256873B.2060501@unibas.ch> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> <4256873B.2060501@unibas.ch> Message-ID: <1112967803.5142.29.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 09:29, Curzio Basso wrote: > Todd Miller wrote: > > > astype() is used in a bunch of places, including the C-API, so it's > > hard to guess how it's getting called with the information here. In > > ok, so probably C functions are somehow 'transparent' to the profiler, which does not report them > but reports the Python functions called by the C one... >
> >>>> from yourmodule import newfoo   # you redefined foo to accept N as a parameter
> >>>> import pdb
> >>>> pdb.run("newfoo(N=2)")
> > (pdb) s   # step along a little to get into newfoo()
> > ... step output
> > (pdb) import numarray.numarraycore as nc
> > (pdb) break nc.astype
>
> strange, what I get now is:
>
> > (Pdb) b nc.astype
> > *** The specified object 'nc.astype' is not a function
> > or was not found along sys.path.
>
> and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather > than just the function) under ipython, starting it with
>
> > %run -d myprog.py
>
> maybe this could mess up things?

No. I should have said "b nc.NumArray.astype". I just tried this out with an astype() callback from numarray.convolve's C-code and it worked OK for me. Regards, Todd

From strawman at astraw.com Fri Apr 8 08:00:13 2005 From: strawman at astraw.com (Andrew Straw) Date: Fri Apr 8 08:00:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255F79D.4000501@ucsd.edu> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> <4255F79D.4000501@ucsd.edu> Message-ID: <42569C4D.2080904@astraw.com> Robert Kern wrote: > Scott Gilbert wrote: > >> --- Andrew Straw wrote: >> >>> Here's a bit of weirdness which has prevented me from using '<' or >>> '>' in the past with the struct module.
>>> I'm not guru enough to know what's going on, but it has prevented me from being explicit rather than implicit.
>>>
>>> In [1]:import struct
>>>
>>> In [2]:from numarray.ieeespecial import nan
>>>
>>> In [3]:nan
>>> Out[3]:nan
>>>
>>> In [4]:struct.pack('<d',nan)
>>> ---------------------------------------------------------------------------
>>> exceptions.SystemError                Traceback (most recent call last)
>>>
>>> /home/astraw/
>>>
>>> SystemError: frexp() result out of range
>>>
>>> In [5]:struct.pack('d',nan)
>>> Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'
>
> This behavior is explained by Tim Peters:
> http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a

I feared it was something like that. (No platform independent way to represent special values like nan, inf, and so on.) So I think if we're going to require an encoding character such as '<' or '>' we should also include one that means native, which CAN handle these special values... And document why it's needed and why it may get one into trouble.

From jmiller at stsci.edu Fri Apr 8 10:14:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 10:14:04 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <1112980431.5142.116.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 04:21, David M. Cooke wrote: > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. I was a little worried too, but I think the array protocol idea is a good one in any case. Thinking about this, I'm wondering if what we used to do in early numarray (0.2) wouldn't work here. Our "consumer interface" / helper function looked more like this:

int getSimpleCArray(PyObject *o, SimpleCArray *info);

It basically just fills in the caller's SimpleCArray struct using information from o and returns 0, or -1 with an exception set if there's some problem. In numarray's SimpleCArray struct, the shape and strides arrays were fully allocated (i.e. Py_LONG_LONG shape[MAXDIM];) so the struct could be placed in an auto variable with nothing to free() later. In this interface, there is no implied getattr at all, since the helper function getSimpleCArray() can be made as smart (i.e. given knowledge about specific types) as people are motivated to make it. So, for a Numeric array or a numarray or a Numeric3 array, getSimpleCArray would presumably just copy from struct to struct, but for other types, it might fall back on the many-getattr approach. Regards, Todd

> Here's an alternative: Define some attribute (for now, call it > __array_c__), which returns a CObject whose value (which you get with > PyCObject_GetVoidPtr) would be a pointer to a struct describing the > array.
It would look something like
>
> typedef struct {
>     int version;
>     int nd;
>     Py_LONG_LONG *shape;
>     char typecode;
>     Py_LONG_LONG *strides;
>     Py_LONG_LONG offset;
>     void *data;
> } SimpleCArray;
>
> (The order here follows that of the array interface spec; if somebody's got any comments on what mixing int's, Py_LONG_LONG, and char's in a struct does to the packing and potential alignment problems I'd like to know.)
>
> version is there as a sanity check: I'd say for this version it's something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily a check that you've got the right thing (since CObjects are intrinsically opaque types).
>
> Then:
> - the array object guarantees that the data, etc. remains alive, probably by passing itself as the desc parameter to the CObject. The array data would have to stay at the same location and the same size while the reference is held.
> - typecode follows that of the __array_typestr__ attribute.
> - shape and strides are pointers to arrays of at least nd elements.
> - this doesn't handle byteswapped as-is. Maybe a flags, or endian, attribute could be added.
> - you can still have the full attribute-based array interface (__array_strides__, etc.) to fall back on. If the typecode is 'V', you'll have to look at __array_descr__.
>
> Creating one from a Numeric PyArrayObject would go like this:
>
> PyObject *create_SimpleCArray(PyArrayObject *a)
> {
>     int i;
>     SimpleCArray *ca = PyMem_New(SimpleCArray, 1);
>     ca->version = 0xDECAF;
>     ca->nd = a->nd;
>     ca->shape = PyMem_New(Py_LONG_LONG, ca->nd);
>     for (i = 0; i < ca->nd; i++) {
>         ca->shape[i] = a->dimensions[i];
>     }
>     ca->strides = PyMem_New(Py_LONG_LONG, ca->nd);
>     for (i = 0; i < ca->nd; i++) {
>         ca->strides[i] = a->strides[i];
>     }
>     ca->offset = 0;
>     ca->data = a->data;   /* the array's actual buffer */
>
>     Py_INCREF(a);
>     PyObject *co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray);
>     return co;
> }
>
> where
>
> void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a)
> {
>     PyMem_Free(ca->shape);
>     PyMem_Free(ca->strides);
>     PyMem_Free(ca);
>     Py_DECREF(a);
> }
>
> Some points:
> - you have to keep the CObject around: destroying it will potentially destroy the array you're looking at.
> - I was thinking that maybe adding a PyObject *owner could make it easier to keep track of the owner; I'm not sure, as the descr argument in CObjects can easily play that role.
> - The creator of the SimpleCArray is free to add elements to the end (as long as they don't affect the padding/alignment of the previous ones: haven't thought about this). You could put the real owner of the array data there, for example (say, if it was wrapping a Blitz++ array). Or have a small _strides[30] array at the end, and strides would point to that (saving you a memory allocation).
>
> This simple C interface would, I think, alleviate many worries about speed for small arrays, and even for large arrays. -- From xscottg at yahoo.com Fri Apr 8 11:06:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:06:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> Message-ID: <20050408180523.95022.qmail@web50207.mail.yahoo.com> --- Tim Hochberg wrote: > > The point about not passing around the tuples probably being faster is a > good one.
Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > PyTuple_GET_ITEM, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc. would run at maximum possible speed, > while other sequences would still work. > I hadn't seen this "fast" sequence stuff before. Thanks for the pointer. > > Back to requiring attributes or not. I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up > and a fast check for None chooses whether to use the defaults or not. > How about saying that, for all the optional attributes, if they return None that's to be treated the same way as if they weren't present at all? In other words, they're still optional, but people in the know would know that returning None was probably faster... From xscottg at yahoo.com Fri Apr 8 11:14:27 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:14:27 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408074129.GA16479@arbutus.physics.mcmaster.ca> Message-ID: <20050408181314.89274.qmail@web50205.mail.yahoo.com> --- "David M. Cooke" wrote: > > > Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort > > of error checking (it can allegedly raise errors). I suppose that means > > one is supposed to call PyErr_Occurred after every call? That's sort > > of painful! > > Yes! Check all C API functions that may return errors! That includes > PySequence_GetItem() and PyLong_AsLongLong. > Sorry, I should have been clear that I was writing example code. I only put the error checking in where I thought it was demonstrating the point. I'd be surprised if it even compiled... Note that the additional error checking is required in the "success" path where the attributes are present. In other words, mandating the attributes be there when they aren't strictly required could make things slower... Cheers, -Scott From xscottg at yahoo.com Fri Apr 8 12:24:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 12:24:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: 6667 Message-ID: <20050408192312.91215.qmail@web50206.mail.yahoo.com> --- "David M. Cooke" wrote: > > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. > I'm really not worried about it... I just don't want "performance" to be used as an argument for a given design decision when the proposed change won't actually make things faster. > > Here's an alternative: Define some attribute (for now, call it > [snip] > This would definitely be faster. Faster yet would be doing a PyNumeric_Check (or PyNumarray_Check, or whatever they're called) and just casting the pointer to the underlying representation. If you must go fast, go as fast as possible... I'd rather we didn't add a lot of complexity to the array protocol to just go at a medium speed.
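For reference, here is a rough Python-level sketch of the kind of consumer all of this is trying to speed up. The attribute names follow this thread's draft protocol, and the "None is treated like a missing attribute" convention from the earlier message is an assumption here, not a settled part of the spec:

def array_info(obj):
    # Minimal, illustrative consumer of the draft array protocol.
    shape = tuple(obj.__array_shape__)      # normalize any sequence to a tuple once
    typestr = obj.__array_typestr__
    data = obj.__array_data__               # buffer-like object
    strides = getattr(obj, '__array_strides__', None)
    if strides is not None:                 # None treated the same as absent
        strides = tuple(strides)
    offset = getattr(obj, '__array_offset__', None)
    if offset is None:
        offset = 0
    return shape, typestr, strides, offset, data

A C version would do the same dance with getattr calls and sequence access, which is exactly the per-element cost being debated.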
Cheers, -Scott From oliphant at ee.byu.edu Fri Apr 8 13:55:27 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 8 13:55:27 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <4256EF45.6070004@ee.byu.edu> David M. Cooke wrote: >It seems that people are worried about speed of the attribute-based >array interface when using small arrays in C. > > I think what we are talking about here is an *array protocol* (i.e. like the buffer protocol and sequence protocol). So far we have just described the Python level interface. I would like to see an array protocol added (perhaps to the buffer protocol table). This could be done just as David describes --- we don't even need to use the C-pointer (just return a void *pointer which has a version as the first entry). This is how the C-level should be handled, I think. Yes, it does not require changes to Python to implement the __array_c__ attribute. But, ultimately, it would be better if we used the C-level protocol concept that Python already uses for other objects. -Travis From perry at stsci.edu Fri Apr 8 14:05:05 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 8 14:05:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255B7D6.9000109@ee.byu.edu> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> <4255B7D6.9000109@ee.byu.edu> Message-ID: <819eb85df29878341dd00521bbba280d@stsci.edu> On Apr 7, 2005, at 6:44 PM, Travis Oliphant wrote: > > I can't think of anything you've missed. > > I'm very supportive of this, but I have to finish scipy.base first. > I think Perry is supportive as well. I know he's been playing > catch-up in the reading. I'm not sure of Todd's opinion. I suspect > he would welcome these changes to Python. > > My preference order is > > 1) the ndarray module and ndarray.h header with these interface > definitions and methods. 2) Add array interface attributes to array > module > 3) Flesh out locked buffer API > 4) Bytes object (with Pickling support) > 5) Fix current buffer object. > I agree as well (I think). Just to be sure I'll restate. These issues are all important, and the discussion has been very useful to flesh out the proposed array protocol. Nevertheless, I'd put the priority of getting these into Python, or accepted by the Python Dev community, lower than actually implementing Numeric3 (aka scipy.base) to the point that it is acceptable to both Numeric and numarray communities. True, subsequent changes forced by the acceptance process may require reworking in scipy.base, but I put unification far ahead of getting these various components finished and into Python. I think that's what Travis is getting at too. I've been tied up in other things, but frankly, I haven't seen that much that I have objected to so far in the array protocol discussions to warrant comments from me. I think it has been pretty well done (and I'm about to leave town so I'm going to be out of touch for a week or so, at least mostly) Perry From xscottg at yahoo.com Fri Apr 8 14:43:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 14:43:02 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: 6667 Message-ID: <20050408214214.45907.qmail@web50206.mail.yahoo.com> --- Andrew Straw wrote: > > > > This behavior is explained by Tim Peters: > > > > http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a > > > I feared it was something like that. (No platform independent way to > represent special values like nan, inf, and so on.) So I think if we're > going to require an encoding character such as '<' or '>' we should also > include one that means native which CAN handle these special values... > And document why it's needed and why it may get one into trouble. > The data is either big endian or little endian (or possibly a single byte in which case it doesn't matter). Whether or not the (hardware, operating system, C runtime library, C compiler, or Python implementation) can handle NaNs or Infs is not a property of the data. What does an additional code or two get you? Let's say we used ']' for big endian native, and '[' for little endian native? Does that just indicate the possible presence of NaNs or Infs in the data? Adding those codes doesn't have any effect on whether or not libraries can deal with them. I guess I'm not understanding something. Cheers, -Scott From cookedm at physics.mcmaster.ca Fri Apr 8 14:52:02 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 14:52:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <4256EF45.6070004@ee.byu.edu> (Travis Oliphant's message of "Fri, 08 Apr 2005 14:53:25 -0600") References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> <4256EF45.6070004@ee.byu.edu> Message-ID: Travis Oliphant writes: > David M. Cooke wrote: > >>It seems that people are worried about speed of the attribute-based >>array interface when using small arrays in C. >> >> > I think what we are talking about here is an *array protocol* (i.e. like the > buffer protocol and sequence > protocol). > > So far we have just described the Python level interface. I would > like to see an array protocol added (perhaps to the buffer protocol > table). This could be done just as David describes --- we don't even > need to use the C-pointer (just return a void *pointer which has a > version as the first entry). The purpose of the CObject was to make it possible to pass it through Python (through the attribute access). > This is how the C-level should be handled, I think. Yes, it > does not require changes to Python to implement the __array_c__ > attribute. But, ultimately, it would be better if we used the C-level > protocol concept that Python already uses for other objects. Ah, ok, so you'd have a slot in the type object (like the number, sequence, or buffer protocols), with the appropriate (C-level) functions. This would require it to be in the Python core, though, and would only work for a new version of Python. Alternatively, you have a special attribute/method that returns an object with the right C API -- much like CObjects are used for wrapping Numeric's C API. I would really like to see something working at the C level (so you're not passing dimensions back-and-forth as Python tuples with Python ints), but the Python-level array interface you've proposed will work for now. This should be revisited once people are using the new array interface, and we have an idea of how it's being used, and the performance costs. -- |>|\/|< /--------------------------------------------------------------------------\ |David M.
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From xscottg at yahoo.com Fri Apr 8 16:06:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 16:06:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408230455.35465.qmail@web50209.mail.yahoo.com> --- Scott Gilbert wrote: > > --- Andrew Straw wrote: > > > > I feared it was something like that. (No platform independent way to > > represent special values like nan, inf, and so on.) So I think if > > we're going to require an encoding character such as '<' or '>' we > > should also include one that means native which CAN handle these > > special values... And document why it's needed and why it may get one > > into trouble. > > > > Let's say we used ']' for big endian native, and '[' for little endian > native? Does that just indicate the possible presence of NaNs or Infs > in the data? > > Adding those codes doesn't have any effect on whether or not libraries > can deal with them. I guess I'm not understanding something. > I think I'm understanding my problem in understanding :-). There IS a platform independent way to represent NaNs and Infs. It's pretty clearly spelled out in IEEE-754: http://stevehollasch.com/cgindex/coding/ieeefloat.html I think something we've been assuming is that the array data is basically IEEE-754 compliant (maybe it needs to be byteswapped). If that's not true, then we're going to need some new typecodes. We're not supporting the ability to pass VAX floating point around (Are we????). The problem is that you can't make any safe assumptions about whether your current platform will deal with IEEE-754 data in any predictable way if it contains NaNs or Infs. So additional typecodes won't really solve anything. Tim Peters's explanation is a good representation of Python's official position regarding floating point issues, but a much simpler explanation is possible... The struct module in "standard mode" decodes the data one byte at a time and builds a float from them. You can see this in the _PyFloat_Unpack8 function in the floatobject.c file. In other words, this routine probably works on a VAX too (taking an IEEE-754 double and building a VAX floating point as it goes). You can also see the comment in there that says it doesn't handle NaNs or Infs. I don't think we need another indicator for '>' big-endian or '<' for little-endian. From konrad.hinsen at laposte.net Fri Apr 8 23:46:00 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Apr 8 23:46:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408230455.35465.qmail@web50209.mail.yahoo.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> Message-ID: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> On 09.04.2005, at 01:04, Scott Gilbert wrote: > I think something we've been assuming is that the array data is > basically > IEEE-754 compliant (maybe it needs to be byteswapped). If that's not > true, > then we're going to need some new typecodes. We're not supporting the > ability to pass VAX floating point around (Are we????). This discussion has been coming up regularly for a few years. Until now the consensus has always been that Python should make no assumptions that go beyond what a C compiler can promise. Which means no assumptions about floating-point representation. Of course the computing world is changing, and IEEE format may well be ubiquitous by now.
Vaxes must be in the museum by now. But how about mainframes? IBM mainframes didn't use IEEE when I used them (last time 15 years ago), and they are still around, possibly compatible with their ancestors. Another detail to consider is that although most machines use the IEEE representation, hardly any respects the IEEE rules for floating point operations in all detail. In particular, trusting that Inf and NaN will be treated as IEEE postulates is a risky business. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------------- From xscottg at yahoo.com Sat Apr 9 09:36:05 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Apr 9 09:36:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050409163525.93733.qmail@web50201.mail.yahoo.com> --- konrad.hinsen at laposte.net wrote: > > This discussion has been coming up regularly for a few years. Until now > the consensus has always been that Python should make no assumptions > that go beyond what a C compiler can promise. Which means no > assumptions about floating-point representation. > > Of course the computing world is changing, and IEEE format may well be > ubiquitous by now. Vaxes must be in the museum by now. But how about > mainframes? IBM mainframes didn't use IEEE when I used them (last time > 15 years ago), and they are still around, possibly compatible with their > ancestors. > I've been following this mailing list for a few years now, but I skip a lot of threads. I almost certainly skipped this topic in the past since it wasn't relevant to me. I'm only interested in it now since it's relevant to this data interchange business, so I'm sorry if this is a rehash... Trying to stay portable is a good goal, and I can understand why Python proper would try to adhere to the restrictions it does. Despite the claim, Python makes plenty of assumptions that a standards-conformant C compiler could break. If numpy doesn't make some assumptions about floating point representation, it's going to kill the possibility of passing data across machines, and that's pretty unacceptable. I'm not comfortable saying "ubiquitous" since I don't know what the mainframe or super computing community is making use of, and I don't know what sort of little machines Python is running on. The closest thing to a mainframe that I've ever used was a Convex, and I never knew what its floating point representation was. However, I know that x86, PPC, AMD-64, IA64, Alpha, Sparc, and whatever HPUX and SGIs are running on all use IEEE-754 format. That's probably 99.999% of all machines capable of running Python, and at least that percentage of users. It would be a shame to gum up this typecode thing for situations that don't occur in practice. If it has to be done, then I recommend we use the '@' code in place of the '<' or '>' for platforms that are out of the ordinary. It's important to specify that '@' is only to be used on floating point data that is not IEEE-754. In this case it doesn't mean "native" like it does in the struct module, it means "weird" :-). > > Another detail to consider is that although most machines use the > IEEE representation, hardly any respects the IEEE rules for floating > point operations in all detail.
In particular, trusting that Inf and NaN will > be treated as IEEE postulates is a risky business. > See, that's the thing. Why burden how you label the data with the restrictions of the current machine? You can take the data off the machine. Whether or not I can rely on what NaN*Inf will give me, I know that I can take NaN and Inf to another machine and get the same interpretation of the data. This whole thread started because Andrew Straw showed that struct.pack('<d', nan) fails. Cheers, -Scott From oliphant at ee.byu.edu Sat Apr 9 09:54:00 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 9 09:54:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> Message-ID: <425808B4.8070005@ee.byu.edu> konrad.hinsen at laposte.net wrote: > On 09.04.2005, at 01:04, Scott Gilbert wrote: >> I think something we've been assuming is that the array data is >> basically >> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >> not true, >> then we're going to need some new typecodes. We're not supporting the >> ability to pass VAX floating point around (Are we????). >>
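As a concrete illustration of the representation-versus-semantics point, here is a minimal sketch. It assumes an IEEE-754 platform, a Python 2 with native 'Q' support in struct, and the Python 2.3/2.4-era behavior where standard-mode packing of a NaN raised SystemError while native mode just copied the bits:

import struct

inf = 1e308 * 1e308                  # overflows to IEEE +inf on IEEE-754 doubles
nan = inf - inf                      # inf - inf yields a quiet NaN
packed = struct.pack('d', nan)       # native mode: just copies the 8 bytes
bits = struct.unpack('Q', packed)[0] # reinterpret those same bytes as an integer
exponent = (bits >> 52) & 0x7FF      # float64 layout: 1 sign, 11 exponent, 52 fraction bits
fraction = bits & ((1L << 52) - 1)
print exponent == 0x7FF and fraction != 0   # True: NaN is a bit *range*, not one value
# struct.pack('<d', nan) is the standard-mode call that raised
# "SystemError: frexp() result out of range" on Python 2.3/2.4.

The bit pattern is perfectly well defined and portable; it is only the packing/unpacking code that chokes on it.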
Floating point values are 32-bit or 64-bit entities which are stored > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > so I expect it simply won't work on a VAX. There's no checking for > this. > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > point. ieeespecial has been tested to work there. > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > unsigned > integers, and contiguous ranges on those integers are used to > represent > special values like NAN and INF. Platform byte ordering for the > IEEE-754 floating point numbers mirrors byte ordering for integers so > the ieeespecial NAN detection code works in a cross platform way *and* > values exported from one IEEE-754 platform will work with ieeespecial > when imported on another. It's important to note that special values > are not unique: there is no single NAN value; it's a bit range. > > 4. numarray leaks IEEE-754 special values out into Python floating > point > scalars. This may be bad form. I do this because (1) they repr > understandably if not in a platform independent way and (2) people need > to get at them. I noticed recently that ieeespecial.nan == > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > correct ones (False) for Python-2.4. I haven't looked at what the > array > version does yet: array(nan) == array(nan). The point to be taken > from > this is that the level at which numarray ieee special value handling > works or doesn't work is really restricted to (1) detecting certain > ieee-754 bit ranges (2) the basic behavior of C code for C89 complilers > for array code (no guarantees) (3) the behavior of Python itself > (improving). > > In the context of the array protocol (looking very nice by the way) my > thinking is that non-IEEE-754 floating point could be described with > bit > fields and that the current type codes should mean IEEE-754. > > Some minor things I noticed in the array interface: > > 1. The packing order of bit fields is not clear. In C, my experience > is that some compilers pack bit structs towards the higher order bits > of > an integer, and some towards the lower. More info to clarify that > would be helpful. > > 2. I saw no mention that we're talking about a protocol. I'm sure > that's clear to everyone following this discussion closely, but I > didn't see it in the spec. It might make sense to allude to the C > helper functions and potential for additions to the Python type struct > even if they're not spelled out. > > Regards, > Todd On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > konrad.hinsen at laposte.net wrote: > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: >> >>> I think something we've been assuming is that the array data is >>> basically >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >>> not true, >>> then we're going to need some new typecodes. We're not supporting >>> the >>> ability to pass VAX floating point around (Are we????). >> > > No, in moving from the struct modules character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build an interface around. Data sharing was the > whole reason the standard emerged and a lot of companies got on board. > >> >> This discussion has been coming up regularly for a few years. 
Until >> now the consensus has always been that Python should make no >> assumptions that go beyond what a C compiler can promise. Which >> means no assumptions about floating-point representation. >> >> Of course the computing world is changing, and IEEE format may well >> be ubiquitous by now. Vaxes must be in the museum by now. But how >> about mainframes? IBM mainframes didn't use IEEE when I used them >> (last time 15 years ago), and they are still around, possibly >> compatible with their ancestors. > > I found the following piece, written about 6 years ago, interesting: > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > Basically, it states that chips in newer IBM mainframes support the > IEEE 754 standard. > >> >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business. > > But, this can be handled with platform-dependent C-code when and if > problems arise. > -Travis From jmiller at stsci.edu Sat Apr 9 16:18:00 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sat Apr 9 16:18:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> Message-ID: <1113088643.5363.8.camel@jaytmiller.comcast.net> On Sat, 2005-04-09 at 12:35 -0700, Andrew Straw wrote: > Here's an email Todd Miller sent me (I hoped he'd send it directly to > the list, but I'll forward it. Todd, I hope you don't mind.) No, I don't mind. I intended to send it to the list but left in a rush this morning. Todd > > > On Fri, 2005-04-08 at 15:46 -0700, Andrew Straw wrote: > >> Hi Todd, > >> > >> Could you join in on this thread? I think you wrote the ieeespecial > >> stuff in numarray, so it's clear you have a much better understanding > >> of > >> the issues than I do... > >> > >> Cheers! > >> Andrew > > > > My own understanding is limited, but I can say a few things that might > > make the status of numarray clearer. My assumptions for numarray were > > that: > > > > 1. Floating point values are 32-bit or 64-bit entities which are stored > > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > > so I expect it simply won't work on a VAX. There's no checking for > > this. > > > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > > point. ieeespecial has been tested to work there. > > > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > > unsigned > > integers, and contiguous ranges on those integers are used to > > represent > > special values like NAN and INF.
Platform byte ordering for the > > IEEE-754 floating point numbers mirrors byte ordering for integers so > > the ieeespecial NAN detection code works in a cross platform way *and* > > values exported from one IEEE-754 platform will work with ieeespecial > > when imported on another. It's important to note that special values > > are not unique: there is no single NAN value; it's a bit range. > > > > 4. numarray leaks IEEE-754 special values out into Python floating > > point > > scalars. This may be bad form. I do this because (1) they repr > > understandably if not in a platform independent way and (2) people need > > to get at them. I noticed recently that ieeespecial.nan == > > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > > correct ones (False) for Python-2.4. I haven't looked at what the > > array > > version does yet: array(nan) == array(nan). The point to be taken > > from > > this is that the level at which numarray ieee special value handling > > works or doesn't work is really restricted to (1) detecting certain > > ieee-754 bit ranges (2) the basic behavior of C code for C89 compilers > > for array code (no guarantees) (3) the behavior of Python itself > > (improving). > > > > In the context of the array protocol (looking very nice by the way) my > > thinking is that non-IEEE-754 floating point could be described with > > bit > > fields and that the current type codes should mean IEEE-754. > > > > Some minor things I noticed in the array interface: > > > > 1. The packing order of bit fields is not clear. In C, my experience > > is that some compilers pack bit structs towards the higher order bits > > of > > an integer, and some towards the lower. More info to clarify that > > would be helpful. > > > > 2. I saw no mention that we're talking about a protocol. I'm sure > > that's clear to everyone following this discussion closely, but I > > didn't see it in the spec. It might make sense to allude to the C > > helper functions and potential for additions to the Python type struct > > even if they're not spelled out. > > > > Regards, > > Todd > > On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > > > konrad.hinsen at laposte.net wrote: > > > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: > >> > >>> I think something we've been assuming is that the array data is > >>> basically > >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's > >>> not true, > >>> then we're going to need some new typecodes. We're not supporting > >>> the > >>> ability to pass VAX floating point around (Are we????). > >> > > > > No, in moving from the struct module's character codes we are trying to > > do something more platform independent because it is very likely that > > different platforms will want to exchange binary data. IEEE-754 is a > > great standard to build an interface around. Data sharing was the > > whole reason the standard emerged and a lot of companies got on board. > > > >> > >> This discussion has been coming up regularly for a few years. Until > >> now the consensus has always been that Python should make no > >> assumptions that go beyond what a C compiler can promise. Which > >> means no assumptions about floating-point representation. > >> > >> Of course the computing world is changing, and IEEE format may well > >> be ubiquitous by now. Vaxes must be in the museum by now. But how > >> about mainframes?
IBM mainframes didn't use IEEE when I used them > >> (last time 15 years ago), and they are still around, possibly > >> compatible with their ancestors. > > > > I found the following piece, written about 6 years ago, interesting: > > > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > > > Basically, it states that chips in newer IBM mainframes support the > > IEEE 754 standard. > > > >> > >> Another detail to consider is that although most machines use the > >> IEEE representation, hardly any respects the IEEE rules for floating > >> point operations in all detail. In particular, trusting that Inf and > >> NaN will be treated as IEEE postulates is a risky business. > > > > But, this can be handled with platform-dependent C-code when and if > > problems arise. > > -Travis From tchur at optushome.com.au Sat Apr 9 17:25:43 2005 From: tchur at optushome.com.au (Tim Churches) Date: Sat Apr 9 17:25:43 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array Message-ID: <4258721E.1080905@optushome.com.au> I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit Linux): >>> import Numeric as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 OK, it is an elementary mistake, but the silent overflow caught me unawares. Casting the array to Float64 before summing it avoids the error, but in my instance the actual data is a rank-1 array of 21 million integers with a mean value of about 140 (which adds up to more than sys.maxint), and casting to Float64 will use quite a lot of memory (as well as taking some time). Any advice for catching or avoiding such overflow without incurring a performance and memory hit by always casting to Float64? Shouldn't add.reduce() be checking for overflow and raising an error? Then it would be easy to upcast only when overflow (or underflow) occurs, rather than always.
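(For what it's worth, one blockwise workaround is sketched below; it assumes Numeric 23.x, a rank-1 Int32 array, and a block size small enough that each Float64 partial sum stays below 2**53 and is therefore exact. Only one block is ever upcast at a time, and the running total is a Python long, so the result is exact without a full-array copy.)

import Numeric as N

def blockwise_sum(a, blocksize=1000000):
    # Sum an Int32 array exactly without typecasting the whole thing:
    # upcast one block at a time and accumulate in a Python long.
    total = 0L
    for i in xrange(0, len(a), blocksize):
        block = a[i:i+blocksize].astype(N.Float64)   # small temporary
        total = total + long(N.add.reduce(block))    # partials exact below 2**53
    return total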
Tim C From jmiller at stsci.edu Sun Apr 10 07:25:08 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sun Apr 10 07:25:08 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <4258721E.1080905@optushome.com.au> References: <4258721E.1080905@optushome.com.au> Message-ID: <1113143026.5359.35.camel@jaytmiller.comcast.net> On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit > Linux): > > >>> import Numeric as N > >>> a = N.array((2000000000,1000000000),typecode=N.Int32) > >>> N.add.reduce(a) > -1294967296 > > OK, it is an elementary mistake, but the silent overflow caught me > unawares. Casting the array to Float64 before summing it avoids the > error, but in my instance the actual data is a rank-1 array of 21 > million integers with a mean value of about 140 (which adds up to more than > sys.maxint), and casting to Float64 will use quite a lot of memory (as > well as taking some time). > > Any advice for catching or avoiding such overflow without > incurring a performance and memory hit by always casting to Float64? Here's what numarray does: >>> import numarray as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 So basic reductions in numarray have the same "careful while you're shaving" behavior as Numeric; it's fast but easy to screw up. But: >>> a.sum() 3000000000L >>> a.sum(type='d') 3000000000.0 a.sum() blockwise upcasts to the largest type of its kind on the fly, in this case, Int64. This avoids the storage overhead of typecasting the entire array. A better name for the method would have been sumall() since it sums all elements of a multi-dimensional array. The flattening process reduces on one dimension before flattening, preventing a full copy of a discontiguous array. It could be smarter about choosing the dimension of the initial reduction. Regards, Todd From pearu at cens.ioc.ee Mon Apr 11 00:59:14 2005 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Mon Apr 11 00:59:14 2005 Subject: [Numpy-discussion] scipy.base Message-ID: Hi Travis, I have committed scipy.{distutils,base} to Numeric3 CVS repository. scipy.distutils is a reviewed version of scipy_distutils and as one of its new features there is a Configuration class that allows one to write much simpler setup.py files for subpackages. See setup.py files under Numeric3/scipy directory for examples (and the sketch below). scipy.base is a very minimal copy of scipy_base plus ndarray modules. When using setup_scipy.py for building, the ndarray package is installed as scipy.base and from scipy.base import * should work equivalently to from ndarray import * for instance. I have used information from Numeric3/setup.py to implement Numeric3/scipy/base/setup.py and it should be updated whenever Numeric3/setup.py is changed. However, I would recommend starting to use scipy.base instead of ndarray as using both may cause unexpected behaviour when the installed ndarray is older than the scipy.base installation (see [*]). In Numeric3 CVS repository that would mean replacing setup.py with setup_scipy.py and any modification to ndarray setup scripts should be done in scipy/base/setup.py. We can apply this step whenever you feel confident with new setup.py files. Let me know if you have any troubles with them. To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, and CodeGenerators directories should be moved under the scipy/base directory.
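(To illustrate the "much simpler setup.py files" mentioned above, here is a sketch of a subpackage setup.py. The module paths and the Configuration signature shown here follow the form this code later took in numpy.distutils, so treat those details as assumptions about the 2005 version rather than its documented API:)

from scipy.distutils.misc_util import Configuration
from scipy.distutils.core import setup

def configuration(parent_package='', top_path=None):
    # One Configuration object per subpackage; it works out paths,
    # headers, and package data, keeping each setup.py a few lines long.
    config = Configuration('base', parent_package, top_path)
    return config

if __name__ == '__main__':
    setup(**configuration(top_path='').todict())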
However, this step can be omitted if you would prefer working with files at the top directory of Numeric3. Current setup.py scripts fully support this approach as well. There are also a few open issues and questions. First, how to name the Numeric3 project when it installs scipy.base, scipy.distutils, Numeric packages, etc? This name will be used when creating source distributions and also as part of the path where header files will be installed. At the moment setup_scipy.py uses the name 'ndarray'. And so `setup_scipy.py sdist`, for example, produces an ndarray-30.0.tar.gz file; `setup_scipy.py install` installs header files under the <prefix>/include/ndarray/ directory. Though this is fine with me, I am not sure that this is an ideal situation. I think we should choose the name now and stick to it forever, especially since 3rd party extension modules need to know where to look for ndarray header files. This name cannot be 'numarray', obviously, but there are options like 'ndarray', 'numpy', and maybe others. In fact, 'Numeric' (with version 3x.x) would also be an option but that would certainly cause some problems when one wants both Numeric 2x.x and Numeric 3x.x to be installed in the system; the header files would end up in the same directory, for instance. As a workaround, we could force installing Numeric3 header files to <prefix>/include/Numeric/3/ or something. I actually like this idea but I wonder what others think about this. Second, is it already possible to use the ndarray C/API as a replacement for the Numeric C/API, i.e. would simple replacement of #include "Numeric/arrayobject.h" with #include "ndarray/arrayobject.h" work? And if not, will it ever be? This would be interesting to know as an extension writer. [*] Due to keeping changes to Numeric3 sources minimal, scipy.base multiarray and umath modules first try to import ndarray and then scipy.base whenever ndarray is missing. One should remove the ndarray installation from the system before using scipy.base. Regards, Pearu From konrad.hinsen at laposte.net Mon Apr 11 02:30:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Apr 11 02:30:28 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <425808B4.8070005@ee.byu.edu> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> Message-ID: On Apr 9, 2005, at 18:54, Travis Oliphant wrote: > No, in moving from the struct module's character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build For data exchange between platforms, i.e. through files and network connections, XDR is arguably a better choice. It actually uses IEEE for floats, but XDR libraries provide conversion code for other platforms. It also takes care of byte ordering. > an interface around. Data sharing was the whole reason the standard > emerged and a lot of companies got on board. I think the main reason was standardization of precision, range, and operations, to make floating-point code more portable. This has had moderate success, as 100% IEEE platforms are rare if they exist at all. >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business. > > But, this can be handled with platform-dependent C-code when and if > problems arise. Can it? I have faint memories about Tim Peters explaining why and how handling IEEE in C code is a pain. Anyway, it would be a good idea to get his opinion on any IEEE proposal before implementing it. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------------- From tchur at optushome.com.au Mon Apr 11 13:52:19 2005 From: tchur at optushome.com.au (Tim Churches) Date: Mon Apr 11 13:52:19 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <1113143026.5359.35.camel@jaytmiller.comcast.net> References: <4258721E.1080905@optushome.com.au> <1113143026.5359.35.camel@jaytmiller.comcast.net> Message-ID: <425AE33C.30403@optushome.com.au> Todd Miller wrote: > On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > >>I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit >>Linux): >> >> >>> import Numeric as N >> >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >> >>> N.add.reduce(a) >>-1294967296 >> >>OK, it is an elementary mistake, but the silent overflow caught me >>unawares. Casting the array to Float64 before summing it avoids the >>error, but in my instance the actual data is a rank-1 array of 21 >>million integers with a mean value of about 140 (which adds up to more than >>sys.maxint), and casting to Float64 will use quite a lot of memory (as >>well as taking some time). >> >>Any advice for catching or avoiding such overflow without >>incurring a performance and memory hit by always casting to Float64? > > > Here's what numarray does: > > >>>>import numarray as N >>>>a = N.array((2000000000,1000000000),typecode=N.Int32) >>>>N.add.reduce(a) > > -1294967296 > > So basic reductions in numarray have the same "careful while you're > shaving" behavior as Numeric; it's fast but easy to screw up. Sure, but how does one be careful? It seems that for any array of two integers or more which could sum to more than sys.maxint or less than -sys.maxint, add.reduce() in both NumPy and Numeric will give either a) the correct answer or b) the incorrect answer, and short of adding up the array using a safer but much slower method, there is no way of determining if the answer provided (quickly) by add.reduce is right or wrong? Which seems to make it fast but useless (for integer arrays, at least)? Is that an unfair summary? Can anyone point me towards a method for using add.reduce() on small arrays of large integers with values in the billions, or on large arrays of fairly small integer values, which will not suddenly and without warning give the wrong answer? > > But: > > >>>>a.sum() > > 3000000000L > >>>>a.sum(type='d') > > 3000000000.0 > > a.sum() blockwise upcasts to the largest type of its kind on the fly, in > this case, Int64. This avoids the storage overhead of typecasting the > entire array. That's on a 64-bit platform, right? The same method could be used to cast the accumulator to a Float64 on a 32-bit platform to avoid casting the entire array? > A better name for the method would have been sumall() since it sums all > elements of a multi-dimensional array. The flattening process reduces > on one dimension before flattening, preventing a full copy of a > discontiguous array. It could be smarter about choosing the dimension > of the initial reduction. OK, thanks. Unfortunately it is not possible for us to port our application to numarray at the moment. But the insight is most helpful. Tim C From oliphant at ee.byu.edu Mon Apr 11 17:12:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 11 17:12:25 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: References: Message-ID: <425B1182.7060102@ee.byu.edu> Pearu Peterson wrote: >Hi Travis, > >I have committed scipy.{distutils,base} to Numeric3 CVS repository. >scipy.distutils is a reviewed version of scipy_distutils and >as one of its new features there is a Configuration class that allows >one to write much simpler setup.py files for subpackages. See setup.py >files under Numeric3/scipy directory for examples. scipy.base is a >very minimal copy of scipy_base plus ndarray modules. > > Thank you, thank you for your help with this. >When using setup_scipy.py for building, the ndarray package is installed >as scipy.base and > > from scipy.base import * > >should work equivalently to > > from ndarray import * > >for instance. > > I don't like from ndarray import *. It's only been a place-holder. Let's get rid of it as soon as possible. >To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, >and CodeGenerators directories should be moved under the scipy/base directory. >However, this step can be omitted if you would prefer working with files >at the top directory of Numeric3. > I have no preference here. Whatever works best. >First, how to name the Numeric3 project when it installs scipy.base, >scipy.distutils, Numeric packages, etc? This name will be used when >creating source distributions and also as part of the path where header >files will be installed. At the moment setup_scipy.py uses the name >'ndarray'. > I don't like the name ndarray -- it's too limiting. Why not scipy_core? >In fact, 'Numeric' (with version 3x.x) would also be an option but that >would certainly cause some problems when one wants both Numeric 2x.x >and Numeric 3x.x to be installed in the system; the header files would end >up in the same directory, for instance. As a workaround, we could force >installing Numeric3 header files to <prefix>/include/Numeric/3/ or >something. I actually like this idea but I wonder what others think about >this. > > How about include/scipy? >Second, is it already possible to use the ndarray C/API as a replacement for >the Numeric C/API, i.e. would simple replacement of > > #include "Numeric/arrayobject.h" > >with > > #include "ndarray/arrayobject.h" > >work? And if not, will it ever be? This would be interesting to know as an >extension writer. > > This should work fine. All of the old C-API is there (there are some new calls, but the old ones should still work). The only issue is that one of the calls (PyArray_Take, I think) now uses a standardized PyArrayObject * as one of its arguments instead of a PyObject *. This shouldn't be a problem, since you always had to call it with an array. It's just now more explicit, but could lead to a warning. >[*] Due to keeping changes to Numeric3 sources minimal, scipy.base >multiarray and umath modules first try to import ndarray and then >scipy.base whenever ndarray is missing. One should remove the ndarray >installation from the system before using scipy.base. > > I don't mind changing the package names entirely at this point. -Travis From oliphant at ee.byu.edu Tue Apr 12 16:39:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 12 16:39:23 2005 Subject: [Numpy-discussion] Subclassing and metadata Message-ID: <425C5BDF.1010802@ee.byu.edu> I think I've found a possible solution for subclasses that want to handle metadata. Essentially, any subclass that defines the method _update_meta(self, other) will get that method called when an array is sliced or subscripted. Anytime an array is created where a subtype is the caller, this method will be called if it is available. Here is a simple example:

import ndarray

class subclass(ndarray.ndarray):
    def __new__(cls, shape, *args, **kwds):
        return ndarray.ndarray.__new__(cls, shape, 'V4')
    def __init__(self, shape, *args, **kwds):
        self.dict = kwds
    def _update_meta(self, obj):
        self.dict = obj.dict

Comments? -Travis From pearu at cens.ioc.ee Wed Apr 13 04:06:00 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 13 04:06:00 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: <425B1182.7060102@ee.byu.edu> Message-ID: On Mon, 11 Apr 2005, Travis Oliphant wrote: > >When using setup_scipy.py for building, the ndarray package is installed > >as scipy.base and > > > > from scipy.base import * > > > >should work equivalently to > > > > from ndarray import * > > > >for instance. > > > > > I don't like from ndarray import *. It's only been a place-holder. > Let's get rid of it as soon as possible. Done in CVS. > >To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, > >and CodeGenerators directories should be moved under the scipy/base directory. > >However, this step can be omitted if you would prefer working with files > >at the top directory of Numeric3. > > > I have no preference here. Whatever works best. Directory Include/ndarray/ is now moved to scipy/base/Include/scipy/base/. I'll move other directories as well. > >First, how to name the Numeric3 project when it installs scipy.base, > >scipy.distutils, Numeric packages, etc? This name will be used when > >creating source distributions and also as part of the path where header > >files will be installed. At the moment setup_scipy.py uses the name > >'ndarray'. > > > I don't like the name ndarray -- it's too limiting. Why not scipy_core? > > >In fact, 'Numeric' (with version 3x.x) would also be an option but that > >would certainly cause some problems when one wants both Numeric 2x.x > >and Numeric 3x.x to be installed in the system; the header files would end > >up in the same directory, for instance. As a workaround, we could force > >installing Numeric3 header files to <prefix>/include/Numeric/3/ or > >something. I actually like this idea but I wonder what others think about > >this. > > > > > How about include/scipy? Without going into details of distutils restrictions for various options, I found that the #include "scipy/base/arrayobject.h" option works best. And the name of the Numeric3 package is now scipy_core. All this is implemented in Numeric3 CVS now. > >Second, is it already possible to use the ndarray C/API as a replacement for > >the Numeric C/API, i.e. would simple replacement of > > > > #include "Numeric/arrayobject.h" > > > >with > > > > #include "ndarray/arrayobject.h" > > > >work? And if not, will it ever be? This would be interesting to know as an > >extension writer. > > > > > This should work fine. Great! Thanks, Pearu From alexandre.guimond at mirada-solutions.com Wed Apr 13 18:10:47 2005 From: alexandre.guimond at mirada-solutions.com (Alexandre Guimond) Date: Wed Apr 13 18:10:47 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images Message-ID: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Hi all. I've been looking at numarray to do some image processing. A lot of the work I do deals with transforming images, either with affine transformations, or vector field. Numarray seems somewhat well equipped to address these issues, but I am concerned about one aspect. It seems that the transformation code (affine_transform and geometric_transform) computes input coordinates for every output coordinate in the resulting array. If I have an RGB image for which the transformation is the same for all 3 RGB channels, I would assume that this will triple the workload unnecessarily. It might have a dramatic effect for the geometric transformation which will most often be slower than affine. Is there any way around this, e.g. is it possible to specify numarray to use the same interpolation coefficients for the last "n" dimensions of the array, or to tell numarray to only compute interpolation coefficients and apply those separately for each channel? thx for any help / info. alex. From verveer at embl-heidelberg.de Thu Apr 14 02:45:45 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Thu Apr 14 02:45:45 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images In-Reply-To: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> References: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Message-ID: <14ba52860a6e1f838975c3c04a0dafc9@embl-heidelberg.de> Hi Alex, It is correct that there is an amount of work duplicated, if you do an identical interpolation operation on several arrays. There is currently no way to avoid this. This can be fixed and I will have a look to see how easy that is to do. If it is not easy to factor out that part of the code, I will most likely not be able to spend the time to do it though... You could at least use the map_coordinates function that will allow you to use a pre-calculated coordinate mapping. There will still be duplication of work, but at least you avoid the duplication of the calculation of the coordinate transformation. Peter > Hi all. > > I've been looking at numarray to do some image processing. A lot of > the work I do deals with transforming images, either with affine > transformations, or vector field. Numarray seems somewhat well equipped > to address these issues, but I am concerned about one aspect. It seems > that the transformation code (affine_transform and > geometric_transform) computes input coordinates for every output > coordinate in the resulting array. If I have an RGB image for which > the transformation is the same for all 3 RGB channels, I would assume > that this will triple the workload unnecessarily. It might have a > dramatic effect for the geometric transformation which will most often > be slower than affine. Is there any way around this, e.g. is it > possible to specify numarray to use the same interpolation > coefficients for the last "n" dimensions of the array, or to tell > numarray to only compute interpolation coefficients and apply those > separately for each channel? > > thx for any help / info. > > alex. From jmiller at stsci.edu Thu Apr 14 07:47:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 14 07:47:02 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.0 Message-ID: <1113489855.29880.14.camel@halloween.stsci.edu> Release Notes for numarray-1.3.0 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files. I. ENHANCEMENTS 1. Migration of NumArray.__del__ to C (tp_dealloc). Overall performance. 2. Removal of dictionary update from array view creation improves performance of view/slice/subarray creation. This should e.g. improve the performance of wxPython sequence protocol access to Nx2 arrays. Subclasses now need to do a.flags |= numarray.generic._UPDATEDICT to ensure that dictionary based attributes are inherited by views. NumArrays no longer do this by default. 3. Modifications to support scipy.special. 4. Removal of an unnecessary getattr() from ufunc calling sequence. Ufunc performance. II. BUGS FIXED / CLOSED 1179355 average() broken in numarray 1.2.3 1167184 Floating point exception in numarray's dot() 1151892 Bug in matrixmultiply with zero size arrays 1160184 RecArray reversal 1156172 Incorrect error message for shape incompatibility 1155538 Incorrect error message when multiplying arrays See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. III. CAUTIONS This release should be backward binary compatible with numarray 1.1.1 and 1.2.3. WHERE ----------- Numarray-1.3.0 windows executable installers, source code, and manual is here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS ------------------------------ numarray-1.3.0 requires Python 2.2.2 or greater. Python-2.3.4 or Python-2.4.1 is recommended. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Chuck Harris, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Kuepper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, Rory Yorke, and everyone else who has contributed with comments and feedback. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From jdhunter at ace.bsd.uchicago.edu Thu Apr 14 14:14:13 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Thu Apr 14 14:14:13 2005 Subject: [Numpy-discussion] ANN: matplotlib-0.80 Message-ID: A lot of development has gone into matplotlib since the last major release, which I'll summarize here. For details, see the notes for the incremental releases at http://matplotlib.sf.net/whats_new.html. Improvements since 0.70 -- contouring: Lots of new contour functionality with line and polygon contours provided by contour and contourf. Automatic inline contour labeling with clabel. See http://matplotlib.sourceforge.net/screenshots.html#pcolor_demo -- QT backend Sigve Tjoraand, Ted Drain and colleagues at the JPL collaborated on a QTAgg backend -- Unicode strings are rendered in the agg and postscript backends. Currently, all the symbols in the unicode string have to be in the active font file. In later releases we'll try and support symbols from multiple ttf files in one string. See examples/unicode_demo.py -- map and projections A new release of the basemap toolkit - See http://matplotlib.sourceforge.net/screenshots.html#plotmap -- Auto-legends The automatic placement of legends is now supported with loc='best'; see examples/legend_auto.py. We did this at the matplotlib sprint at pycon -- Thanks John Gill and Phil! Note that your legend will move if you interact with your data and you force data under the legend line. If this is not what you want, use a designated location code. -- Quiver (direction fields) Ludovic Aubry contributed a patch for the matlab compatible quiver method. This makes a direction field with arrows. See examples/quiver_demo.py -- Performance optimizations Substantial optimizations in line marker drawing in agg -- Robust log plots Lots of work making log plots "just work". You can toggle log y Axes with the 'l' command -- nonpositive data are simply ignored and no longer raise exceptions. log plots should be a lot faster and more robust -- Many more plotting functions, bugfixes, and features, detailed in the 0.71, 0.72, 0.73 and 0.74 point release notes at http://matplotlib.sourceforge.net/whats_new.html http://matplotlib.sourceforge.net JDH From simon at arrowtheory.com Thu Apr 14 23:07:03 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 14 23:07:03 2005 Subject: [Numpy-discussion] numarray cholesky solver ? Message-ID: <20050415160425.42cb20a6.simon@arrowtheory.com> Hi, I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves? Alternatively, are we able to convert to and from Numeric (scipy) arrays without a memcopy? thank you, Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From arnd.baecker at web.de Thu Apr 14 23:58:08 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 14 23:58:08 2005 Subject: [Numpy-discussion] % and fmod Message-ID: Dear all, I encountered the following puzzling behaviour of the modulo operator %: In [1]: import Numeric In [2]: print Numeric.__version__ 23.8 In [3]: x=Numeric.arange(10.0) In [4]: print x%4 [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [5]: print 3.0%4 3.0 In [6]: print (-x)%4 [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] # <====== In [7]: print (-3.0)%4 # vs. 1.0 # <====== (OK) In [8]: print Numeric.fmod(x,4) [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [9]: print Numeric.fmod(-x,4) [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] So it seems that for arrays % behaves like fmod! This seems in contrast to what one finds in the python 2.3 documentation: "5.6. Binary arithmetic operations" """The % (modulo) operator yields the remainder from the division of the first argument by the second. [...] The arguments may be floating point numbers, e.g., 3.14%0.7 equals 0.34 (since 3.14 equals 4*0.7 + 0.34.) The modulo operator always yields a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand.""" I am presently teaching a course on computational physics with python and the students have huge difficulties with % behaving differently for arrays and scalars.
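A possible workaround for the class, sketched against Numeric 23.x for floating point arrays and assuming a positive divisor: build the Python-style modulo out of fmod, so that arrays and scalars agree.

import Numeric as N

def pymod(a, n):
    # Python-style modulo for arrays: the result takes the sign of the
    # divisor, matching the scalar % operator (assumes n > 0).
    r = N.fmod(a, n)
    return N.where(N.less(r, 0), r + n, r) + 0.0   # "+ 0.0" turns -0.0 into 0.0

x = N.arange(10.0)
print pymod(-x, 4)   # [ 0. 3. 2. 1. 0. 3. 2. 1. 0. 3.]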
> > But, this can be handled with platform-dependendent C-code when and if > problems arise. Can it? I have faint memories about Tim Peters explaining why and how handling IEEE in C code is a pain. Anyway, it would be a good idea to get his opinion on whatever proposal about IEEE before implementing it. Konrad. From tchur at optushome.com.au Mon Apr 11 13:52:19 2005 From: tchur at optushome.com.au (Tim Churches) Date: Mon Apr 11 13:52:19 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <1113143026.5359.35.camel@jaytmiller.comcast.net> References: <4258721E.1080905@optushome.com.au> <1113143026.5359.35.camel@jaytmiller.comcast.net> Message-ID: <425AE33C.30403@optushome.com.au> Todd Miller wrote: > On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > >>I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit >>Linux): >> >> >>> import Numeric as N >> >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >> >>> N.add.reduce(a) >>-1294967296 >> >>OK, it is an elementary mistake, but the silent overflow caught me >>unawares. casting the array to Float64 before summing it avoids the >>error, but in my instance the actual data is a rank-1 array of 21 >>million integers with a mean value of about 140 (which adds up more than >>sys.maxint), and casting to Float64 will use quite a lot of memory (as >>well as taking some time). >> >>Any advice for catching or avoiding such overflow without always >>incurring a performance and memory hit by always casting to Float64? > > > Here's what numarray does: > > >>>>import numarray as N >>>>a = N.array((2000000000,1000000000),typecode=N.Int32) >>>>N.add.reduce(a) > > -1294967296 > > So basic reductions in numarray have the same "careful while you're > shaving" behavior as Numeric; it's fast but easy to screw up. Sure, but how does one be careful? It seems that for any array of two integers or more which could sum to more than sys.maxint or less than -sys.maxint, add.reduce() in both NumPy and Numeric will give either a) the correct answer or b) the incorrect answer, and short of adding up the array using a safer but much slower method, there is no way of determining if the answer provided (quickly) by add.reduce is right or wrong? Which seems to make it fast but useless (for integer arrays, at least? Is that an unfair summary? Can anyone point me towards a method for using add.reduce() on small arrays of large integers with values in the billions, or on large arrays of fairly small integer values, which will not suddenly and without warning give the wrong answer? > > But: > > >>>>a.sum() > > 3000000000L > >>>>a.sum(type='d') > > 3000000000.0 > > a.sum() blockwise upcasts to the largest type of kind on the fly, in > this case, Int64. This avoids the storage overhead of typecasting the > entire array. That's on a 64-bit platform, right? The same method could be used to cast the accumulator to a Float64 on a 32-bit platform to avoid casting the entire array? > A better name for the method would have been sumall() since it sums all > elements of a multi-dimensional array. The flattening process reduces > on one dimension before flattening preventing a full copy of a > discontiguous array. It could be smarter about choosing the dimension > of the initial reduction. OK, thanks. Unfortunately it is not possible for us to port our application to numarray at the moment. But the insight is most helpful. 
Tim C From oliphant at ee.byu.edu Mon Apr 11 17:12:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 11 17:12:25 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: References: Message-ID: <425B1182.7060102@ee.byu.edu> Pearu Peterson wrote: >Hi Travis, > >I have committed scipy.{distutils,base} to Numeric3 CVS repository. >scipy.distutils is a reviewed version of scipy_distutils and >as one of its new features there is Configuration class that allows >one to write much simpler setup.py files for subpackages. See setup.py >files under Numeric3/scipy directory for examples. scipy.base is a >very minimal copy of scipy_base plus ndarray modules. > > Thank you, thank you for your help with this. >When using setup_scipy.py for building, the ndarray package is installed >as scipy.base and > > from scipy.base import * > >should work equivalently to > > from ndarray import * > >for instance. > > I don't like from ndarray import *. It's only been a place-holder. Let's get rid of it as soon as possible. >To clean up Numeric3 CVS repository completely then Include, Src, Lib, >CodeGenerators directories should be moved under the scipy/base directory. >However, this step can be omitted if you would prefer working with files >at the top directory of Numeric3. > I have no preference here. Whatever works best. >First, how to name Numeric3 project when it installs scipy.base, >scipy.distutils, Numeric packages, etc? This name will be used when >creating source distributions and also as part of the path where header >files will be installed. At the moment setup_scipy.py uses the name >'ndarray'. > I don't like the name ndarray -- it's too limiting. Why not scipy_core? >In fact, 'Numeric' (with version 3x.x) would be also an option but that >would be certainly cause some problems when one wants both Numeric 2x.x >and Numeric 3x.x to be installed in the system, the header files would end >up in the same directory, for instance. As a workaround, we could force >installing Numeric3 header files to /include/Numeric/3/ or >something. I acctually like this idea but I wonder what other think about >this. > > How about include/scipy? >Second, is it already possible to use ndarray C/API as a replacement of >Numeric C/API, i.e. would simple replacement of > > #include "Numeric/arrayobject.h" > >with > > #include "ndarray/arrayobject.h" > >work? And if not, will it ever be? This would be interesting to know as an >extension writer. > > This should work fine. All of the old C-API is there (there are some new calls, but the old ones should still work). The only issue is that one of the calls (PyArray_Take I think now uses a standardized PyArrayObject * as one of it's arguments instead of a PyObject *). This shouldn't be a problem, since you always had to call it with an array. It's just now more explicit, but could lead to a warning. >[*] Due to keeping changes to Numeric3 sources minimal, scipy.base >multiarray and umath modules first try to import ndarray and then >scipy.base whenever ndarray is missing. One should remove ndarray >installation from the system before using scipy.base. > > I don't mind changing the package names entirely at this point. -Travis From oliphant at ee.byu.edu Tue Apr 12 16:39:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 12 16:39:23 2005 Subject: [Numpy-discussion] Subclassing and metadata Message-ID: <425C5BDF.1010802@ee.byu.edu> I think I've found a possible solution for subclasses that want to handle metadata. 
Essentially, any subclass that defines the method _update_meta(self, other) will get that method called when an array is sliced, or subscripted. Anytime an array is created where a subtype is the caller, this method will be called if it is available. Here is a simple example: import ndarray class subclass(ndarray.ndarray): def __new__(self, shape, *args, **kwds): self = ndarray.ndarray.__new__(subclass, shape, 'V4') return self def __init__(self, shape, *args, **kwds): self.dict = kwds return def _update_meta(self, obj): self.dict = obj.dict Comments? -Travis From pearu at cens.ioc.ee Wed Apr 13 04:06:00 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 13 04:06:00 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: <425B1182.7060102@ee.byu.edu> Message-ID: On Mon, 11 Apr 2005, Travis Oliphant wrote: > >When using setup_scipy.py for building, the ndarray package is installed > >as scipy.base and > > > > from scipy.base import * > > > >should work equivalently to > > > > from ndarray import * > > > >for instance. > > > > > I don't like from ndarray import *. It's only been a place-holder. > Let's get rid of it as soon as possible. Done in CVS. > >To clean up Numeric3 CVS repository completely then Include, Src, Lib, > >CodeGenerators directories should be moved under the scipy/base directory. > >However, this step can be omitted if you would prefer working with files > >at the top directory of Numeric3. > > > I have no preference here. Whatever works best. Directory Include/ndarray/ is now moved to scipy/base/Include/scipy/base/. I'l move other directories as well. > >First, how to name Numeric3 project when it installs scipy.base, > >scipy.distutils, Numeric packages, etc? This name will be used when > >creating source distributions and also as part of the path where header > >files will be installed. At the moment setup_scipy.py uses the name > >'ndarray'. > > > I don't like the name ndarray -- it's too limiting. Why not scipy_core? > > >In fact, 'Numeric' (with version 3x.x) would be also an option but that > >would be certainly cause some problems when one wants both Numeric 2x.x > >and Numeric 3x.x to be installed in the system, the header files would end > >up in the same directory, for instance. As a workaround, we could force > >installing Numeric3 header files to /include/Numeric/3/ or > >something. I acctually like this idea but I wonder what other think about > >this. > > > > > How about include/scipy? Without going into details of distutils restrictions for various options, I found that #include "scipy/base/arrayobject.h" option works best. And the name of the Numeric3 package is now scipy_core. All this is implemented in Numeric3 CVS now. > >Second, is it already possible to use ndarray C/API as a replacement of > >Numeric C/API, i.e. would simple replacement of > > > > #include "Numeric/arrayobject.h" > > > >with > > > > #include "ndarray/arrayobject.h" > > > >work? And if not, will it ever be? This would be interesting to know as an > >extension writer. > > > > > This should work fine. Great! Thanks, Pearu From alexandre.guimond at mirada-solutions.com Wed Apr 13 18:10:47 2005 From: alexandre.guimond at mirada-solutions.com (Alexandre Guimond) Date: Wed Apr 13 18:10:47 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images Message-ID: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Hi all. I've been looking at numarray to do some image processing. 
A lot of the work I do deal with transforming images, either with affine transformations, or vector field. Numarray seems somewhat well equiped to address these issues, but I am concerned about one aspect. It seems that the transformation code (affine_transforrm and geometric_transform) computes input coordonates for every output coordinate in the resulting array. If I have an RGB image for which the transformation is the same for all 3 RGB channels, I would assume that this will triple the workload unncessarily. It might have a dramatic effect for the geometric transformation which will most often be slower then affine. Is there any way around this, e.g. is it possible to specify numarray to use the same interpolation coefficients for the last "n" dimention of the array, or to tell numarray to only compute interpolation coefficients and apply those seperatly for each channel? thx for any help / info. alex. -------------- next part -------------- An HTML attachment was scrubbed... URL: From verveer at embl-heidelberg.de Thu Apr 14 02:45:45 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Thu Apr 14 02:45:45 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images In-Reply-To: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> References: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Message-ID: <14ba52860a6e1f838975c3c04a0dafc9@embl-heidelberg.de> Hi Alex, It is correct that there is an amount of work duplicated, if you do an identical interpolation operation on several arrays. There is currently no way to avoid this. This can be fixed and I will have a look to see how easy that is to do. If it is not easy to factor out that part of the code, I will most likely not be able to spend the time to do it though... You could at least use the map_coordinates function that will allow you to use a pre-calculated coordinate mapping. There will still be duplication of work, but al least you avoid the duplication of the calculation of the coordinate transformation. Peter > Hi all. > ? > I've been looking at numarray to do some image processing. A lot of > the work I do deal with transforming images, either with affine > transformations, or vector field. Numarray seems somewhat well equiped > to address these issues, but I am concerned about one aspect. It seems > that the transformation code (affine_transforrm and > geometric_transform) computes input coordonates for every output > coordinate in the resulting array. If I have an RGB image for which > the transformation is the same for all 3 RGB channels, I would assume > that this will triple the workload unncessarily. It might have a > dramatic effect for the geometric transformation which will most often > be slower then affine. Is there any way around this, e.g. is it > possible to specify numarray to use the same interpolation > coefficients for the last "n" dimention of the array, or to tell > numarray to only compute interpolation coefficients and apply those > seperatly for each channel? > ? > thx for any help / info. > ? > alex. From jmiller at stsci.edu Thu Apr 14 07:47:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 14 07:47:02 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.0 Message-ID: <1113489855.29880.14.camel@halloween.stsci.edu> Release Notes for numarray-1.3.0 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. 
Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files.

I. ENHANCEMENTS

1. Migration of NumArray.__del__ to C (tp_dealloc). Overall performance.

2. Removal of dictionary update from array view creation improves performance of view/slice/subarray creation. This should e.g. improve the performance of wxPython sequence protocol access to Nx2 arrays. Subclasses now need to do a.flags |= numarray.generic._UPDATEDICT to ensure that dictionary based attributes are inherited by views. NumArrays no longer do this by default.

3. Modifications to support scipy.special.

4. Removal of an unnecessary getattr() from ufunc calling sequence. Ufunc performance.

II. BUGS FIXED / CLOSED

1179355 average() broken in numarray 1.2.3
1167184 Floating point exception in numarray's dot()
1151892 Bug in matrixmultiply with zero size arrays
1160184 RecArray reversal
1156172 Incorect error message for shape incompatability
1155538 Incorrect error message when multiplying arrays

See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details.

III. CAUTIONS

This release should be backward binary compatible with numarray 1.1.1 and 1.2.3.

WHERE
-----------

Numarray-1.3.0 windows executable installers, source code, and manual are here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
------------------------------

numarray-1.3.0 requires Python 2.2.2 or greater. Python-2.3.4 or Python-2.4.1 is recommended.

AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Chuck Harris, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Kuepper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, Rory Yorke, and everyone else who has contributed with comments and feedback.

Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details.

-- Todd Miller jmiller at stsci.edu

From jdhunter at ace.bsd.uchicago.edu Thu Apr 14 14:14:13 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Thu Apr 14 14:14:13 2005 Subject: [Numpy-discussion] ANN: matplotlib-0.80 Message-ID:

A lot of development has gone into matplotlib since the last major release, which I'll summarize here. For details, see the notes for the incremental releases at http://matplotlib.sf.net/whats_new.html.

Improvements since 0.70

-- contouring: Lots of new contour functionality with line and polygon contours provided by contour and contourf. Automatic inline contour labeling with clabel. See http://matplotlib.sourceforge.net/screenshots.html#pcolor_demo

-- QT backend: Sigve Tjoraand, Ted Drain and colleagues at the JPL collaborated on a QTAgg backend

-- Unicode strings are rendered in the agg and postscript backends.
Currently, all the symbols in the unicode string have to be in the active font file. In later releases we'll try and support symbols from multiple ttf files in one string. See examples/unicode_demo.py

-- map and projections: A new release of the basemap toolkit - See http://matplotlib.sourceforge.net/screenshots.html#plotmap

-- Auto-legends: The automatic placement of legends is now supported with loc='best'; see examples/legend_auto.py. We did this at the matplotlib sprint at pycon -- Thanks John Gill and Phil! Note that your legend will move if you interact with your data and you force data under the legend line. If this is not what you want, use a designated location code.

-- Quiver (direction fields): Ludovic Aubry contributed a patch for the matlab compatible quiver method. This makes a direction field with arrows. See examples/quiver_demo.py

-- Performance optimizations: Substantial optimizations in line marker drawing in agg

-- Robust log plots: Lots of work making log plots "just work". You can toggle log y Axes with the 'l' command -- nonpositive data are simply ignored and no longer raise exceptions. log plots should be a lot faster and more robust

-- Many more plotting functions, bugfixes, and features, detailed in the 0.71, 0.72, 0.73 and 0.74 point release notes at http://matplotlib.sourceforge.net/whats_new.html

http://matplotlib.sourceforge.net

JDH

From simon at arrowtheory.com Thu Apr 14 23:07:03 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 14 23:07:03 2005 Subject: [Numpy-discussion] numarray cholesky solver ? Message-ID: <20050415160425.42cb20a6.simon@arrowtheory.com>

Hi,

I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves? Alternatively, are we able to convert to and from Numeric (scipy) arrays without a memcopy?

thankyou,

Simon.

-- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com

From arnd.baecker at web.de Thu Apr 14 23:58:08 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 14 23:58:08 2005 Subject: [Numpy-discussion] % and fmod Message-ID:

Dear all,

I encountered the following puzzling behaviour of the modulo operator %:

    In [1]: import Numeric
    In [2]: print Numeric.__version__
    23.8
    In [3]: x=Numeric.arange(10.0)
    In [4]: print x%4
    [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.]
    In [5]: print 3.0%4
    3.0
    In [6]: print (-x)%4
    [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.]  # <======
    In [7]: print (-3.0)%4   # vs.
    1.0  # <====== (OK)
    In [8]: print Numeric.fmod(x,4)
    [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.]
    In [9]: print Numeric.fmod(-x,4)
    [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.]

So it seems that for arrays % behaves like fmod! This seems in contrast to what one finds in the python 2.3 documentation, "5.6. Binary arithmetic operations": """The % (modulo) operator yields the remainder from the division of the first argument by the second. [...] The arguments may be floating point numbers, e.g., 3.14%0.7 equals 0.34 (since 3.14 equals 4*0.7 + 0.34.) The modulo operator always yields a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand."""

I am presently teaching a course on computational physics with python and the students have huge difficulties with % behaving differently for arrays and scalars.
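[A minimal sketch of the two sign conventions at issue, in plain Python; the helper names are illustrative only, not part of any library:]

    import math

    def c_style_remainder(x, y):
        # sign follows the dividend, like C's fmod (and Numeric's array %)
        return math.fmod(x, y)

    def python_remainder(x, y):
        # sign follows the divisor, like Python's scalar %
        return x - y * math.floor(x / y)

    print c_style_remainder(-3.0, 4)   # -3.0
    print python_remainder(-3.0, 4)    # 1.0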
I am aware that (according to Kernighan/Ritchie) the C standard does not define the result of % when any of the operands is negative. So can someone help me: is the different behaviour of % for scalars and arrays a bug, a feature, or what should I tell my students? ;-)

Many thanks, Arnd

P.S.: BTW: the documentation for fmod and remainder is pretty short on this:

    In [3]:fmod?
    Type: ufunc
    String Form:
    Namespace: Interactive
    Docstring: fmod(x,y) is remainder(x,y)

    In [4]:remainder?
    Type: ufunc
    String Form:
    Namespace: Interactive
    Docstring: returns remainder of division elementwise

Are contributions of more detailed doc-strings welcome?

P.P.S.: for numarray one gets even less information:

    In [1]: import numarray
    In [2]: numarray.fmod?
    Type: _BinaryUFunc
    Base Class:
    String Form:
    Namespace: Interactive
    Docstring: Class for ufuncs with 2 input and 1 output arguments

    In [3]: numarray.remainder?
    Type: _BinaryUFunc
    Base Class:
    String Form:
    Namespace: Interactive
    Docstring: Class for ufuncs with 2 input and 1 output arguments

    In [4]: print numarray.__version__
    1.1.1

P^3.S: scipy's mod seems to be an alternative:

    In [1]: import scipy
    In [2]: scipy.mod?
    Type: function
    Base Class:
    String Form:
    Namespace: Interactive
    File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py
    Definition: scipy.mod(x, y)
    Docstring: x - y*floor(x/y) For numeric arrays, x % y has the same sign
    as x while mod(x,y) has the same sign as y.

    In [3]: x=-scipy.arange(10)
    In [4]: x%4
    Out[4]: array([ 0, -1, -2, -3, 0, -1, -2, -3, 0, -1])
    In [5]: scipy.mod(x,4)
    Out[5]: array([ 0., 3., 2., 1., 0., 3., 2., 1., 0., 3.])
    In [6]: scipy.mod??
    Type: function
    Base Class:
    String Form:
    Namespace: Interactive
    File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py
    Definition: scipy.mod(x, y)
    Source:
    def mod(x,y):
        """ x - y*floor(x/y)

        For numeric arrays, x % y has the same sign as x while
        mod(x,y) has the same sign as y.
        """
        return x - y*Numeric.floor(x*1.0/y)

From jmiller at stsci.edu Fri Apr 15 03:46:37 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 15 03:46:37 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <1113561843.5030.9.camel@jaytmiller.comcast.net>

On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote:
> Hi,
>
> I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver.
> Is this in the pipeline,

No. Most of the add-on subpackages in numarray, with the exception of convolve, image, and nd_image, are ports from Numeric.

> or do we go ahead and add the dpotrs based functionality ourselves?
> Alternatively, are we able to convert to and from Numeric (scipy) arrays without a memcopy?

Unless Numeric has been adapted to support the new array interface, I think this (converting from numarray to Numeric) has still not been properly addressed.

Regards, Todd

From luszczek at cs.utk.edu Fri Apr 15 07:11:20 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 07:11:20 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <425FCAFC.3010603@cs.utk.edu>

Hi all,

the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I apologize if everybody knows that).
I'm on the LAPACK team right now and we were wondering if we should provide bindings for Python. It is almost trivial to do with Pyrex. But Numeric and numarray already have some functionality in it. Also, I don't know about popularity of PyLapack.

So my question is if there is a need for the specialized LAPACK routines. And if so, which API it should use (Numeric, numarray, Numeric3, scipy_core, standard array, minimum standard array implementation or array protocol meta info).

Any comments are appreciated,

Piotr Luszczek

Simon Burton wrote: > Hi, > > I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. > Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves? Alternatively, are we able to > convert to and from Numeric (scipy) arrays without a memcopy? > > thankyou, > > Simon.

From perry at stsci.edu Fri Apr 15 07:21:23 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 15 07:21:23 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID:

On Apr 15, 2005, at 10:09 AM, Piotr Luszczek wrote:

> Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some functionality in it. > Also, I don't know about popularity of PyLapack. > > So my question is if there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info). > > Any comments are appreciated, > > Piotr Luszczek >

If you don't need anything unusual, using the Numeric C-API should be safe. There is the intent to preserve backward compatibility for that in numarray and Numeric3 for the most part (numarray's ufunc api is different however, but it isn't clear you need to use that). Numeric3 and numarray will/do have other capabilities not part of the Numeric api, but again, I suspect that for a first version, one can probably avoid needing those. I'd also like to hear what Travis thinks about this.

Perry Greenfield

From pjssilva at ime.usp.br Fri Apr 15 08:00:44 2005 From: pjssilva at ime.usp.br (Paulo J. S. Silva) Date: Fri Apr 15 08:00:44 2005 Subject: [Numpy-discussion] Pycoin - Python interface to COIN/CLP Linear Programming solver Message-ID: <1113577115.9013.9.camel@localhost.localdomain>

Hello,

I am finally releasing the code I have to interface COIN/CLP linear programming solver with Python/Numarray. You can download the code at:

http://www.ime.usp.br/~pjssilva/pycoin/index.html

In the page you can see sample client code. The interface is very simple, consisting mostly of swig interface files, but it is very useful to me. It also can be used as an example of how to interface C++ and Python/Numarray using swig.

I plan to make this interface grow to something much better, with an interface to full Clp, another to OsiClp (only this one is available right now) and maybe other COIN optimization libraries like IPOPT.

Please, download, use, test, comment.

Best,

Paulo

-- Paulo José da Silva e Silva Professor Assistente do Dep.
de Ciência da Computação (Assistant Professor of the Computer Science Dept.) Universidade de São Paulo - Brazil

e-mail: pjssilva at ime.usp.br Web: http://www.ime.usp.br/~pjssilva

Teoria é o que não entendemos o suficiente para chamar de prática. (Theory is something we don't understand well enough to call practice.)

From cookedm at physics.mcmaster.ca Fri Apr 15 10:48:55 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 15 10:48:55 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> (Piotr Luszczek's message of "Fri, 15 Apr 2005 10:09:00 -0400") References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID:

Piotr Luszczek writes:

> Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some functionality in it. > Also, I don't know about popularity of PyLapack. > > So my question is if there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array implementation > or array protocol meta info).

You'll probably first want to look at scipy, which already wraps (all? most?) of LAPACK in its scipy.linalg package (including dpotrs :-) It uses f2py to make the process much easier.

Since you mention you're on the LAPACK team ...

I've been working on redoing the f2c'd LAPACK wrappers in Numeric, updating them to the current version...except: what *is* the current version? The patches on netlib are 2-3 years old, and you have to grab them separately, file-by-file (can I say how insanely stupid that is?). Also ... they break: with some test cases (derived from ones posted to our bug tracker) some routines segfault. Is it the LAPACK 3e? If that's the case, we can't use it unless there are C versions (Numeric only requires Python and a C compiler; throwing a F90 compiler in there is *not* an option -- we don't even require a F77 compiler).

I ended up using the source from Debian unstable from the lapack3 package, and those work fine.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From haase at msg.ucsf.edu Fri Apr 15 12:38:51 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Apr 15 12:38:51 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? Message-ID: <200504151235.48573.haase@msg.ucsf.edu>

Hi,

I'm using memmap to read my MRC-imagedata files. I just thought this might be a case of general interest - see below:

    >>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", cval=0.0, origin=0, output_type=None)
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in boxcar_filter
        cval = cval, output_type = output_type)
      File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in boxcar_filter1d
        cval, origin, _ni_support._type_to_num[output_type])
    TypeError: NA_IoArray: I/O numarray must be writable NumArrays.
    >>> na.__version__
    '1.2.3'
    >>>

Thanks, Sebastian Haase

From verveer at embl.de Fri Apr 15 12:55:33 2005 From: verveer at embl.de (Peter Verveer) Date: Fri Apr 15 12:55:33 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? In-Reply-To: <200504151235.48573.haase@msg.ucsf.edu> References: <200504151235.48573.haase@msg.ucsf.edu> Message-ID: <9396f2dea14c14fb7a6bd04f6077c448@embl.de>

You may have run into an older bug which I fixed. Please try upgrading to the new numarray 1.3 and see if the problem disappears. If not let me know.

Note: the function you are using (boxcar_filter) has been renamed in 1.3 to uniform_filter (to be more in line with common image processing terminology).

Cheers, Peter

On Apr 15, 2005, at 9:35 PM, Sebastian Haase wrote: > Hi, > I'm using memmap to read my MRC-imagedata files. > I just thought this might be a case of general interest - see below: > >>>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", > cval=0.0, origin=0, output_type=None) > Traceback (most recent call last): > File "", line 1, in ? > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in > boxcar_filter > cval = cval, output_type = output_type) > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in > boxcar_filter1d > cval, origin, _ni_support._type_to_num[output_type]) > TypeError: NA_IoArray: I/O numarray must be writable NumArrays. >>>> na.__version__ > '1.2.3' >>>> > > > Thanks, > Sebastian Haase

From luszczek at cs.utk.edu Fri Apr 15 20:41:05 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 20:41:05 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <426088F5.90602@cs.utk.edu>

David M. Cooke wrote:

> Piotr Luszczek writes: > > >>Hi all, >> >>the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I >>apologize if everybody knows that). >> >>I'm on the LAPACK team right now and we were wondering if we should >>provide bindings for Python. It is almost trivial to do with Pyrex. >>But Numeric and numarray already have some functionality in it. >>Also, I don't know about popularity of PyLapack. >> >>So my question is if there is a need for the specialized LAPACK >>routines. And if so, which API it should use (Numeric, numarray, >>Numeric3, scipy_core, standard array, minimum standard array implementation >>or array protocol meta info).
>
> You'll probably first want to look at scipy, which already wraps (all?
> most?) of LAPACK in its scipy.linalg package (including dpotrs :-)

It seems to have almost all routines.

> It uses f2py to make the process much easier.
>
> Since you mention you're on the LAPACK team ...
>
> I've been working on redoing the f2c'd LAPACK wrappers in Numeric,
> updating them to the current

Current version is 3.0.

> version? The patches on netlib are 2-3 years old, and you have to grab

After funding ran out there were only volunteers left. It's hard to get free open-source developers these days.

> them separately, file-by-file (can I say how insanely stupid that

Frankly, I had the same comment when I first saw it. Hopefully, the next update will straighten things out.

> is?). Also ... they break: with some test cases (derived from ones
> posted to our bug tracker) some routines segfault.

Yes I know. We have postings about it on the mailing list almost weekly.

> Is it the LAPACK 3e?
> If that's the case, we can't use it unless there

LAPACK 3E is only somewhat related to LAPACK. But it's not "current version".

> are C versions (Numeric only requires Python and a C compiler; > throwing a F90 compiler in there is *not* an option -- we don't even > require a F77 compiler).

We've been thinking about languages for a while. CLAPACK user base is too strong to ignore. So we think of keeping F77 as the base language. Or maybe we should do f90toC. f2c and f2j are on Netlib already and f2py has some F90 support.

> I ended up using the source from Debian unstable from the lapack3 > package, and those work fine.

Again, it's hard to get grant money for support.

Thanks for the comments.

Piotr

From pearu at cens.ioc.ee Fri Apr 15 23:09:01 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Fri Apr 15 23:09:01 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <426088F5.90602@cs.utk.edu> Message-ID:

On Fri, 15 Apr 2005, Piotr Luszczek wrote:

> > You'll probably first want to look at scipy, which already wraps (all? > > most?) of LAPACK in its scipy.linalg package (including dpotrs :-) > > It seems to have almost all routines.

You should look at scipy.lib.lapack package that has more wrappers than in scipy.linalg and it will be used in scipy.linalg in future. scipy.lib.lapack certainly does not wrap all of LAPACK but adding new wrappers is easy and is done on a demand basis. What's wrapped and what's not in scipy.lib.lapack is well documented in the headers of .pyf.src files.

My current plan is to add CLAPACK sources to scipy.lib.lapack so that it could be included in the Numeric3 project because it has a requirement that everything should compile having only a C compiler available.

> We've been thinking about languages for a while. CLAPACK user base > is too strong to ignore. So we think of keeping F77 as the base language. > Or maybe we should do f90toC. f2c and f2j are on Netlib already and > f2py has some F90 support.

f2py will have limited support for F90 derived types as soon as I get a chance to review Jeffrey Hagelberg's patches on this. However, keeping F77 as the base language is a good idea, imho, free F90 compilers are still rare these days.

Pearu

From florian.proff.schulze at gmx.net Sat Apr 16 03:25:37 2005 From: florian.proff.schulze at gmx.net (Florian Schulze) Date: Sat Apr 16 03:25:37 2005 Subject: [Numpy-discussion] bytes object info Message-ID:

Hi!

I just discovered this: http://members.dsl-only.net/~daniels/Block.html

I didn't try it out, but maybe it's helpful to you.

Regards, Florian Schulze

From cjw at sympatico.ca Sat Apr 16 11:29:01 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Apr 16 11:29:01 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <426158FD.8060507@sympatico.ca>

Florian Schulze wrote: > Hi! > > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html

Ugh! Letter codes to identify data types - I thought that we had moved beyond that. ;-)

Colin W.

> > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze > >

From oliphant at ee.byu.edu Sat Apr 16 21:16:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 16 21:16:07 2005 Subject: [Numpy-discussion] numarray cholesky solver ?
In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <4261E2A5.1060109@ee.byu.edu>

Piotr Luszczek wrote: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some functionality in it. > Also, I don't know about popularity of PyLapack.

Scipy already has extensive bindings for LAPACK. There is even a lot of development that has been done for c-compiled bindings. Right now, scipy_core is being developed to be a single replacement for Numeric/numarray. Lapack bindings are a huge part of that effort. But, as I said, the work has been done (using f2py). The biggest issue is supporting f2c'd versions of Lapack so that folks without Fortran compilers can still install it. scipy_core will allow this. Again, most of the effort is accomplished through f2py and scipy_distutils which are really good tools. Pyrex is nice, but f2py is really, really nice (it even supports wrapping basic c-code).

> So my question is if there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info).

I think if LAPACK were going to go through the trouble, it would be best for LAPACK to provide "array protocol" style wrappers. That way any Python array user could take advantage of them. While current scipy users and future scipy_core users do not need LAPACK-provided Python wrappers, we would welcome any native support by the LAPACK team. Again, though, I think this should be done through the array_protocol API. A C-API is likely in the near future as well (which will provide a little speed up for many small arrays).

-Travis

From simon at arrowtheory.com Sun Apr 17 20:44:16 2005 From: simon at arrowtheory.com (Simon Burton) Date: Sun Apr 17 20:44:16 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <1113561843.5030.9.camel@jaytmiller.comcast.net> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> Message-ID: <20050418134337.1b3f8ae8.simon@arrowtheory.com>

On Fri, 15 Apr 2005 06:44:02 -0400 Todd Miller wrote:

> On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > Hi, > > > > I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. > > Is this in the pipeline, > > No. Most of the add-on subpackages in numarray, with the exception of > convolve, image, and nd_image, are ports from Numeric. >

Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this that would be much appreciated.

Simon.

-- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com
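[A sketch of the interim workaround, not from the thread: given numarray's existing cholesky_decomposition, an A x = b solve can be emulated with two generic solves from numarray.linear_algebra. The matrix and vector below are made-up examples; a real dpotrs binding would exploit the triangular structure instead of calling the generic solver twice.]

    import numarray
    import numarray.linear_algebra as la

    A = numarray.array([[4.0, 2.0],
                        [2.0, 3.0]])   # symmetric positive definite
    b = numarray.array([2.0, 1.0])

    L = la.cholesky_decomposition(A)   # A == L * transpose(L)
    y = la.solve_linear_equations(L, b)                       # L y = b
    x = la.solve_linear_equations(numarray.transpose(L), y)   # L^T x = y
    print x                            # [ 0.5  0. ]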
From arnd.baecker at web.de Mon Apr 18 00:30:10 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 18 00:30:10 2005 Subject: [Numpy-discussion] scipy.base - % and fmod segfault Message-ID:

Hi (in particular Travis),

concerning my recent question on % and fmod for Numeric and numarray I was curious to see how scipy.base behaves. With a CVS check-out this morning I get:

    In [1]: from scipy.base import *
    In [2]: x=arange(10)
    In [3]: print x%4
    array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1], 'l')
    In [4]: print (-x)%4
    zsh: 12391 segmentation fault ipython

(The same holds for fmod, and also for x=arange(10.0).)

Personally I would prefer if in the end % behaves the same way for arrays as for scalars. Do you think that this is possible with scipy.base?

Best, Arnd

P.S.: I haven't tested much more of scipy.base this time (but the few things concerning array operations I looked at, seem fine. Ah there is one: Doing

    import scipy.base
    scipy.base.fmod?

in ipython gives a segmentation fault (the same with .sin, .exp etc. ...) )

From jmiller at stsci.edu Mon Apr 18 06:38:21 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Apr 18 06:38:21 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050418134337.1b3f8ae8.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> <20050418134337.1b3f8ae8.simon@arrowtheory.com> Message-ID: <1113831328.29165.30.camel@halloween.stsci.edu>

On Sun, 2005-04-17 at 23:43, Simon Burton wrote: > On Fri, 15 Apr 2005 06:44:02 -0400 > Todd Miller wrote: > > > On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > > Hi, > > > > > > I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. > > > Is this in the pipeline, > > > > No. Most of the add-on subpackages in numarray, with the exception of > > convolve, image, and nd_image, are ports from Numeric. > > > > Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this > that would be much appreciated.

If you're doing a port of something that already works for Numeric chances are good that numarray's Numeric compatibility API will make things "just work." In any case, be sure to use the compatibility API since it's the easiest path forward to Numeric3 should that effort prove successful (which I think it will).

Usually what's involved in porting from Numeric to numarray is just making sure that the numarray header files can be used rather than the Numeric header files. I think the style we used for matplotlib, while not fully general, is the simplest and best compromise:

    #ifdef NUMARRAY
    #include "numarray/arrayobject.h"
    #else
    #include "Numeric/arrayobject.h"
    #endif

In setup.py, you have to pass extra_compile_args=["-DNUMARRAY=1"] or similar to the Extension() constructions to build for numarray. There are more details we could discuss if you want to build for both Numeric and numarray simultaneously.

Two limitations of the numarray Numeric compatible C-API are: (1) a partially compatible array descriptor structure (PyArray_Descr) and (2) the UFunc C-API. Generally, neither of those is an issue, but for large projects (e.g. scipy) they matter.

Good luck porting. Feel free to ask questions either on the list or privately if you run into trouble.

Regards, Todd
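[To make the build half of that recipe concrete, a hypothetical setup.py fragment; the module and file names are placeholders, not from the thread:]

    from distutils.core import setup, Extension

    ext = Extension(
        "choleskysolver",                     # placeholder module name
        sources=["choleskysolver.c"],         # placeholder C source
        extra_compile_args=["-DNUMARRAY=1"],  # selects the numarray branch of the #ifdef
    )

    setup(name="choleskysolver", version="0.1", ext_modules=[ext])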
From haase at msg.ucsf.edu Mon Apr 18 09:16:15 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Mon Apr 18 09:16:15 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <200504180914.33383.haase@msg.ucsf.edu>

Hey, this _really_ is no SPAM ... ;-) (Maybe different wording next time)

Thanks, Sebastian Haase

On Saturday 16 April 2005 03:22, Florian Schulze wrote: > Hi! > > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html > > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze

From oliphant at ee.byu.edu Mon Apr 18 17:09:49 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 18 17:09:49 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42644B7C.9030907@ee.byu.edu>

I am going to release Numeric 24.0 today or tomorrow unless I hear from anybody about some changes that need to get made.

-Travis

From faltet at carabos.com Tue Apr 19 03:05:27 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 19 03:05:27 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42644B7C.9030907@ee.byu.edu> References: <42644B7C.9030907@ee.byu.edu> Message-ID: <200504191202.52097.faltet@carabos.com>

Hi,

I was curious about the newly introduced array protocol in Numeric 24.0 (as well as in current numarray CVS), and wanted to check if there is better speed during Numeric <-> numarray objects conversion. The answer is "partially" affirmative:

    >>> import numarray
    >>> import Numeric
    >>> print numarray.__version__
    1.4.0
    >>> print Numeric.__version__
    24.0
    >>> from time import time
    >>> a = numarray.arange(100*1000)
    >>> t1=time();b=Numeric.array(a);time()-t1   # numarray --> Numeric
    0.0021419525146484375   # It was 1.58109998703 with Numeric 23.8 !

So, numarray --> Numeric speed has been improved quite a lot. On the other way round, Numeric to numarray is not as efficient:

    >>> Na = Numeric.arange(100*1000)
    >>> t1=time();c=numarray.array(Na);time()-t1   # Numeric --> numarray
    0.15217900276184082   # It is much slower than numarray --> Numeric

I guess that the Numeric --> numarray case can be sped up because:

    >>> t1=time();Nb=numarray.array(buffer(Na),typecode=Na.typecode(),shape=Na.shape);time()-t1
    0.00017499923706054688   # Numeric --> numarray using the buffer protocol

So, I guess CVS numarray is still refining the array protocol. But the thing that mostly shocks me is why the array protocol still allows conversions with memory copies because, as you can see in the last example that uses the buffer protocol, a non-copy memory conversion is indeed possible for Numeric --> numarray. So the question is: Would the array protocol bring numarray <-> Numeric <-> Numeric3 conversions without memory copies, or is this more a wish on my part than an actual possibility?

Thanks and keep the nice work!

-- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

From eric at enthought.com Tue Apr 19 22:48:17 2005 From: eric at enthought.com (eric jones) Date: Tue Apr 19 22:48:17 2005 Subject: [Numpy-discussion] job openings at Enthought Message-ID: <4265ECEF.6050004@enthought.com>

Hey group,

We have a number of scientific/python related jobs open. If you have any interest, please see: http://www.enthought.com/careers.htm

thanks, eric

From cjw at sympatico.ca Wed Apr 20 00:45:21 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Wed Apr 20 00:45:21 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler Message-ID: <42660855.4090600@sympatico.ca>

I have tried:

    python setup.py install build_ext --compiler=bcpp

It seems that the distutils call uses scipy.distutils, rather than the standard, and that the scipy version is based on an older version of distutils.

Is there some way to work around this?

Colin W.

From pearu at cens.ioc.ee Wed Apr 20 12:00:34 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 20 12:00:34 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler In-Reply-To: <42660855.4090600@sympatico.ca> Message-ID:

On Wed, 20 Apr 2005, Colin J. Williams wrote: > I have tried: > > python setup.py install build_ext --compiler=bcpp > > It seems that the distutils call uses scipy.distutils, rather than the > standard, and that the scipy version is based on an older version of > distutils. > > Is there some way to work around this?

So, what problems exactly do you experience with the above command? Using scipy.distutils should not be much different compared to std distutils when building std extension modules.

Pearu

From oliphant at ee.byu.edu Wed Apr 20 12:05:30 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 20 12:05:30 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <4266A7AD.5090600@ee.byu.edu>

I've released Numeric 24.0 as a beta (2nd version) release. Right now it's just a tar file.

Please find any bugs. I'll wait a week or two and release a final version unless I hear reports of problems.

Thanks to those who have found bugs already. David Cooke has been especially active in helping fix problems. Many kudos to him.

-Travis

From jmiller at stsci.edu Thu Apr 21 08:12:30 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 21 08:12:30 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.1 Message-ID: <1114096238.4446.18.camel@jaytmiller.comcast.net>

Release Notes for numarray-1.3.1

Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files.

I. ENHANCEMENTS

None. 1.3.1 fixes the problem with gcc-3.4.3

II. BUGS FIXED / CLOSED

1152323 /usr/include/fenv.h:96: error: conflicting types for 'fegete
1185024 numarray-1.2.3 fails to compile with gcc-3.4.3
1187162 Numarray 1.3.0 installation failure

See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details.

From oliphant at ee.byu.edu Fri Apr 22 03:51:14 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 22 03:51:14 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: <4266A7AD.5090600@ee.byu.edu> Message-ID: <4268D6BD.9000100@ee.byu.edu>

Alexander Schmolck wrote:

>Travis Oliphant writes: > > >>I've released Numeric 24.0 as a beta (2nd version) release. Right now it's >>just a tar file. >> >>Please find any bugs. I'll wait a week or two and release a final version >>unless I hear reports of problems. >> >> > >I suspect some other problems I haven't tried to track down yet are due to >this:
>
> >>> a = num.array([[1],[2],[3]])
> >>> ~(a==a)
> array([[-2],
>        [-2],
>        [-2]])

What is wrong with this? ~ is bit-wise not and gives the correct answer, here.
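[A two-line illustration of the point above, an editor's sketch: Numeric comparisons return 0/1 integer arrays, and ~ is two's-complement bitwise not, so ~1 == -2. A logical negation would use Numeric's logical_not ufunc instead:]

    >>> ~1   # two's complement: ~x == -(x+1)
    -2
    >>> ~0
    -1
    >>> # logical negation of a 0/1 comparison result:
    >>> # Numeric.logical_not(a == a)  ->  array of 0s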
> >Object array comparisons still produce haphazard behaviour:
>
> >>> a = num.array(["ab", "cd", "efg"], 'O')
> >>> a == 'ab'
> 0

You are mixing Object arrays and character arrays here and expecting too much. String arrays in Numeric and their relationship with object arrays have never been too useful. You need to be explicit about how 'ab' is going to be interpreted and do a == array('ab','O') to get what you were probably expecting.

>Finally -- not necessarily a bug, but a change of behaviour that seems undocumented (I'm
>pretty sure this used to give a float array as return value):
>
> >>> num.zeros((2.0,))
> *** TypeError: an integer is required
>
>'as

I don't think this worked as you think it did (I looked at Numeric 21.3). num.zeros(2.0) works but it shouldn't. This is a bug that I'll fix. Shapes should be integers, not floats. If this was not checked before then that was a bug. It looks like it's always been checked differently for single-element tuples and scalars.

So, in short, I see only one small bug here. Thanks for testing things out.

-Travis

From stephen.walton at csun.edu Mon Apr 25 11:50:28 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Apr 25 11:50:28 2005 Subject: [Numpy-discussion] Value selections? Message-ID: <426D3BA8.6020500@csun.edu>

I'm trying out Numeric 24b2. In numarray, the following code will plot the values of an array which are not equal to 'flag':

    f = array!=flag
    plot(array[f])

What is the equivalent in Numeric 24b2?

From rkern at ucsd.edu Mon Apr 25 11:59:03 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Apr 25 11:59:03 2005 Subject: [Numpy-discussion] Value selections? In-Reply-To: <426D3BA8.6020500@csun.edu> References: <426D3BA8.6020500@csun.edu> Message-ID: <426D3D4C.5070302@ucsd.edu>

Stephen Walton wrote: > I'm trying out Numeric 24b2. In numarray, the following code will plot > the values of an array which are not equal to 'flag': > > f = array!=flag > plot(array[f]) > > What is the equivalent in Numeric 24b2?

compress(f, array) is the lowest common denominator. I'm not sure if Numeric 24 gets fancier like numarray.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
From jswhit at fastmail.fm Tue Apr 26 07:58:36 2005 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Tue Apr 26 07:58:36 2005 Subject: [Numpy-discussion] numarray problems on AIX Message-ID: <426E5637.1080305@fastmail.fm>

Hi: I'm having problems with numarray 1.3.1/Python 2.4.1 on AIX 5.2:

    Python 2.4.1 (#3, Apr 26 2005, 10:34:56) [C] on aix5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numarray
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/__init__.py", line 42, in ?
        from numarrayall import *
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarrayall.py", line 2, in ?
        from generic import *
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/generic.py", line 1116, in ?
        import numarraycore as _nc
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarraycore.py", line 1751, in ?
        import ufunc
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/ufunc.py", line 13, in ?
        import _converter
    ImportError: dynamic module does not define init function (init_converter)

it works with AIX 4 - anyone seen this before?

-Jeff

-- Jeffrey S. Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/CDC R/CDC1 Email : Jeffrey.S.Whitaker at noaa.gov 325 Broadway Office : Skaggs Research Cntr 1D-124 Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg

From faltet at carabos.com Tue Apr 26 10:45:02 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 26 10:45:02 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms Message-ID: <200504261942.46011.faltet@carabos.com>

Hi,

I'm having problems converting numarray objects into Numeric on 64-bit platforms, and I think this is numarray's fault, but I'm not completely sure.
The problem can be easily visualized in an example (I'm using numarray 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux):

    >>> Num=Numeric.array((3,),typecode='l')
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    array([3],'i')   # The conversion has finished correctly

On 64-bit platforms (AMD64, Linux):

    >>> Num=Numeric.array((3,),typecode='l')
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: typecode argument must be a valid type.

The problem is that, for 32-bit platforms, na.typecode() == 'i' as it should be, but for 64-bit platforms na.typecode() == 'N', which is not a valid type in Numeric. I guess that na.typecode() should be mapped to 'l' on 64-bit platforms so that Numeric can recognize the Int64 correctly.

Any suggestion?

-- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

From jmiller at stsci.edu Tue Apr 26 13:57:14 2005 From: jmiller at stsci.edu (Todd Miller) Date: Tue Apr 26 13:57:14 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504261942.46011.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> Message-ID: <1114548937.24120.97.camel@halloween.stsci.edu>

On Tue, 2005-04-26 at 13:42, Francesc Altet wrote: > Hi, > > I'm having problems converting numarray objects into Numeric on 64-bit > platforms, and I think this is numarray's fault, but I'm not completely > sure. > > The problem can be easily visualized in an example (I'm using numarray > 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3],'i') # The conversion has finished correctly > > On 64-bit platforms (AMD64, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: typecode argument must be a valid type. > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > valid type in Numeric. I guess that na.typecode() should be mapped to > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > correctly. > > Any suggestion?

I agree that since the typecode() method exists for backward compatibility, returning 'N' rather than 'l' on an LP64 platform can be considered a bug. However, there are two problems I see:

1. Returning 'l' doesn't handle the case of converting a numarray Int64 array on a 32-bit platform. AFAIK, there is no typecode that will work for that case. So, we're only getting a partial solution.

2. numarray uses typecodes internally to encode type signatures. There, platform-independent typecodes are useful and making this change will add confusion.

I think we may be butting up against the absolute/relative type definition problem. Comments?

Todd

From faltet at carabos.com Wed Apr 27 05:40:35 2005 From: faltet at carabos.com (Francesc Altet) Date: Wed Apr 27 05:40:35 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <1114548937.24120.97.camel@halloween.stsci.edu> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> Message-ID: <200504271432.46852.faltet@carabos.com>

A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > valid type in Numeric. I guess that na.typecode() should be mapped to > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > correctly. > > I agree that since the typecode() method exists for backward > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > considered a bug. However, there are two problems I see: > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > array on a 32-bit platform. AFAIK, there is no typecode that will work > for that case. So, we're only getting a partial solution.

One can always do a separate case for 64-bit platforms. This solution is already used in Lib/numerictypes.py

> 2. numarray uses typecodes internally to encode type signatures. There, > platform-independent typecodes are useful and making this change will > add confusion.

Well, this is the root of the problem for 'l' (long int) types, that their meaning depends on the platform. Anyway, I've tried the following patch, and everything seems to work well (i.e. it does what is intended):

    --------------------------------------------------------------
    --- Lib/numerictypes.py       Wed Apr 27 07:13:08 2005
    +++ Lib/numerictypes.py.modif Wed Apr 27 07:21:48 2005
    @@ -389,7 +389,11 @@
     # at code generation / installation time.
     from codegenerator.ufunccode import typecode
     for tname, tcode in typecode.items():
    -    typecode[ eval(tname)] = tcode
    +    if tname == "Int64" and numinclude.LP64:
    +        typecode[ eval(tname)] = 'l'
    +    else:
    +        typecode[ eval(tname)] = tcode
    +
     if numinclude.hasUInt64:
         _MaximumType = {
    ---------------------------------------------------------------

With that, we have on 64-bit platforms:

    >>> import Numeric
    >>> Num=Numeric.array((3,),typecode='l')
    >>> import numarray
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    array([3])
    >>> Numeric.array(na,typecode=na.typecode()).typecode()
    'l'

and on 32-bit:

    >>> Num=Numeric.array((3,),typecode='l')
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    array([3],'i')
    >>> Numeric.array(na,typecode=na.typecode()).typecode()
    'i'

Which should be the correct behaviour.

> I think we may be butting up against the absolute/relative type > definition problem. Comments?

That may add some confusion, but if we want to be consistent with the 'l' (long int) meaning for different platforms, I think the suggested patch (or other more elegant) is the way to go, IMHO.

Cheers,

-- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

From jmiller at stsci.edu Wed Apr 27 08:36:09 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Apr 27 08:36:09 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504271432.46852.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> <200504271432.46852.faltet@carabos.com> Message-ID: <1114615773.28309.95.camel@halloween.stsci.edu>

On Wed, 2005-04-27 at 08:32, Francesc Altet wrote: > A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > > valid type in Numeric. I guess that na.typecode() should be mapped to > > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > > correctly. > > > > I agree that since the typecode() method exists for backward > > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > > considered a bug. However, there are two problems I see: > > > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > > array on a 32-bit platform. AFAIK, there is no typecode that will work > > for that case. So, we're only getting a partial solution. > > One can always do a separate case for 64-bit platforms. This solution > is already used in Lib/numerictypes.py

True. I'm just pointing out that doing this is still "half broken". On the other hand, it is also "half fixed".

> if numinclude.hasUInt64: > _MaximumType = { > ---------------------------------------------------------------
>
> With that, we have on 64-bit platforms:
>
> >>> import Numeric
> >>> Num=Numeric.array((3,),typecode='l')
> >>> import numarray
> >>> na=numarray.array(Num,typecode=Num.typecode())
> >>> Numeric.array(na,typecode=na.typecode())
> array([3])
> >>> Numeric.array(na,typecode=na.typecode()).typecode()
> 'l'
>
> and on 32-bit:
>
> >>> Num=Numeric.array((3,),typecode='l')
> >>> na=numarray.array(Num,typecode=Num.typecode())
> >>> Numeric.array(na,typecode=na.typecode())
> array([3],'i')
> >>> Numeric.array(na,typecode=na.typecode()).typecode()
> 'i'
>
> Which should be the correct behaviour.
My point was that if you have a numarray Int64 array, there's nothing in 32-bit Numeric to convert it to. Round tripping from Numeric-to-numarray works, but not from numarray-to-Numeric. In this case, I think "half-fixed" still has some merit, I just wanted it to be clear what we're not doing.

> > I think we may be butting up against the absolute/relative type > > definition problem. Comments? > > That may add some confusion, but if we want to be consistent with the > 'l' (long int) meaning for different platforms, I think the suggested > patch (or other more elegant) is the way to go, IMHO.

I logged this on Source Forge and will get something in for numarray-1.4 so that the typecode() method gives a workable answer on LP64. Interested parties should stick to using the typecode() method rather than any of numarray's typecode related mappings.

Cheers, Todd

From simon at arrowtheory.com Thu Apr 28 17:38:08 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 28 17:38:08 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX Message-ID: <20050429103116.092907a7.simon@arrowtheory.com>

Hi,

I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink) who has managed to bomb on this little code example:

    >>> import numarray as na
    >>> import numarray.random_array as ra
    >>> a = ra.random(shape=(257,256))
    >>> b = ra.random(shape=(1,256))
    >>> na.innerproduct(a, b)

He gets a blas error:

    ldc must be >= MAX(N,1): ldc=256 N=257
    Parameter 14 to routine cblas_dgemm was incorrect
    Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0

Simon.

-- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com

From rkern at ucsd.edu Thu Apr 28 18:05:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:05:30 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <20050429103116.092907a7.simon@arrowtheory.com> References: <20050429103116.092907a7.simon@arrowtheory.com> Message-ID: <42718719.1010206@ucsd.edu>

Simon Burton wrote: > Hi, > > I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink) > who has managed to bomb on this little code example: > > >>>>import numarray as na >>>>import numarray.random_array as ra >>>>a = ra.random(shape=(257,256)) >>>>b = ra.random(shape=(1,256)) >>>>na.innerproduct(a, b) > > > He gets a blas error: > > ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine cblas_dgemm was incorrect > Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0

On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed Python with vecLib as the BLAS, I don't get an error.

I don't get a result that's sensible to me, either; I get a (257,1)-shape array with only the first and last entries non-zero. Your colleague might want to reconsider whether he wants innerproduct() or dot(), with the appropriate change of shape for b.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die."
-- Richard Harter

From rkern at ucsd.edu Thu Apr 28 18:09:53 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:09:53 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <42718719.1010206@ucsd.edu> References: <20050429103116.092907a7.simon@arrowtheory.com> <42718719.1010206@ucsd.edu> Message-ID: <427188D1.201@ucsd.edu>

Robert Kern wrote: > Simon Burton wrote: > >> Hi, >> >> I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from >> fink) >> who has managed to bomb on this little code example: >> >> >>>>> import numarray as na >>>>> import numarray.random_array as ra >>>>> a = ra.random(shape=(257,256)) >>>>> b = ra.random(shape=(1,256)) >>>>> na.innerproduct(a, b) >> >> >> >> He gets a blas error: >> >> ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine >> cblas_dgemm was incorrect >> Mac OS BLAS parameter error in cblas_dgemm, parameter #0, >> (unavailable), is 0 > > > On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed > Python with vecLib as the BLAS, I don't get an error. > > I don't get a result that's sensible to me, either; I get a > (257,1)-shape array with only the first and last entries non-zero.

Oh yes, and apparently a segfault on exit, too.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From edcjones at comcast.net Fri Apr 29 11:26:05 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:26:05 2005 Subject: [Numpy-discussion] numarray: problem with numarray.records Message-ID: <42727B35.9050401@comcast.net>

    #! /usr/bin/env python
    import numarray, numarray.strings, numarray.records

    doubles = numarray.array([1.0], 'Float64')
    strings = numarray.strings.array('abcdefgh', itemsize=8, kind=numarray.strings.RawCharArray)

    print numarray.records.array(buffer=[strings, strings])
    print
    print numarray.records.array(buffer=[doubles, doubles])
    print
    print numarray.records.array(buffer=[strings, doubles])

    """
    The output is:

    RecArray[ ('abcdefgh'), ('abcdefgh') ]

    RecArray[ (1.0, 1.0) ]

    Traceback (most recent call last):
      File "./mess.py", line 12, in ?
        print numarray.records.array(buffer=[strings, doubles])
      File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 397, in array
        byteorder=byteorder, aligned=aligned)
      File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 106, in fromrecords
        raise ValueError, "inconsistent data at row %d,field %d" % (row, col)
    ValueError: inconsistent data at row 1,field 0

    The numarray docs (11.2) say: The first argument, buffer, may be any
    one of the following: ... (5) a list of numarrays. There must be one
    such numarray for each field.

    What is going on here?
    """

From edcjones at comcast.net Fri Apr 29 11:32:07 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:32:07 2005 Subject: [Numpy-discussion] numarray: lexicographical sort Message-ID: <42727D37.8070700@comcast.net>

Suppose arr is a two dimensional numarray. Can the following be done entirely within numarray?

    alist = arr.tolist()
    alist.sort()
    arr = numarray.array(alist, arr.type())

From jmiller at stsci.edu Fri Apr 29 12:42:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 29 12:42:22 2005 Subject: [Numpy-discussion] numarray: lexicographical sort In-Reply-To: <42727D37.8070700@comcast.net> References: <42727D37.8070700@comcast.net> Message-ID: <1114803546.21036.30.camel@halloween.stsci.edu>

On Fri, 2005-04-29 at 14:30, Edward C.
Jones wrote:
> Suppose arr is a two dimensional numarray. Can the following be done
> entirely within numarray?
>
> alist = arr.tolist()
> alist.sort()
> arr = numarray.array(alist, arr.type())

I'm pretty sure the answer is no. The comparisons in numarray's sort() functions are all single element numerical comparisons. The list sort() is using a polymorphic comparison which in this case is the comparison of two lists. There's nothing like that in numarray so I don't think it's possible.

Todd
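[An editor's sketch, not from the thread: for arrays of small non-negative integers one can approximate a lexicographic row sort inside numarray by argsorting a single combined key. The radix constant below assumes all values are < 1000; general arrays still need the tolist()/sort() round trip shown above.]

    import numarray

    arr = numarray.array([[2, 7], [1, 9], [1, 3]])
    # combine the columns into one sortable key; column 0 is most significant
    key = arr[:, 0] * 1000 + arr[:, 1]
    arr = numarray.take(arr, numarray.argsort(key))
    # rows are now [[1, 3], [1, 9], [2, 7]]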