[Numpy-discussion] Introduction

Scott Gilbert xscottg at yahoo.com
Sun Apr 14 21:10:12 EDT 2002


--- Perry Greenfield <perry at stsci.edu> wrote:

*** Just skim through my first few responses.  About halfway through
writing this letter, a few things hit me.  I still want to propose some
changes, but I don't think you'll find them as intrusive...



>
> >
> > Then anyone who implemented these could work with the same C API for
> > getting the pointer to memory, shape array, stride array, and item
> > size.
> >
> Then you are talking about standardizing a C-API. But I'm still
> confused. If you write a class that implements these attributes,
> is it your C-API that uses them, or do you mean our C-API uses
> them?
>

I'm not really talking about standardizing a C-API.  I'm talking about
standardizing what that C-API would have to do.  You would have your 
C-API as part of numarray proper.  And, for the short term, I would have
my own C-API as part of what I need to get done.

Both C-APIs would use the same attributes.

Why do I want my own C-API today?  Because numarray isn't done yet, and
I can't create arrays of the types I need.  I'll need a C-API to get at
my types.  It would be great if the same C-API could get at yours too.




>
> If you have your own C-API, then the attributes are not
> relevant as an interface. If you intend to use our C-API to access
> your objects, then they are. 
>

Either C-API could access anything that looks like an NDArray.



>
> >
> > Because truthfully arrays are little more than a pointer to memory.
> >
> > That's like asking "why in the world would we presume memcpy() or
> > qsort() would know what to do with your memory?"
> >
>
> Then you misunderstand Numarray. Numarrays are far more than just
> a pointer to memory. You can get a pointer to memory from them,
> but they entail much more than that. Numarray presumes that certain
> things are possible with NumArray objects (like standard math
> operations). If you want something that doesn't make such an
> assumption, you should be using NDArray instead. NDArray makes
> no presumptions about the contents of the memory other than
> they are arranged in memory in array fashion.
>

I think I understand where you're coming from now.  

(BTW, I think some of our confusion comes from when I'm talking about
"Numarray" or "numarray" the package versus "NumArray" and 
"NDArray" the classes.)


*** Ok, I think there is light at the end of this tunnel...

I guess what I've been arguing for all along is something a lot like
an NDArray where I can specify the typecode (and possibly other things like
'endian' etc...), and that only NDArrays have a minimal set of standardized
attributes.


With this I can create extensions that will work with anything that
looks like an NDArray.  Your NDArrays from the numarray package, and
my NDArrays of crazy types.


I'm still left in the position of having to upcast an NDArray to a
full-blown NumArray if I ever want to use my NDArrays in a routine
meant solely for NumArrays.  However, this conversion isn't difficult,
and I think I can do that when needed.



Important Question:  If an NDArray had a typecode (and it was a known
string), is it possible to promote it to one of the standard NumArray
types?

Lesser Question:  If an NDArray had a known typecode, is it desirable
for numarray routines to promote the NDArray to a NumArray in the same
way that the routines promote a Python list or tuple to a NumArray on
the fly?





Ok, my new proposal (again, treat it like a suggestion):

- Do you think it would be possible to standardize the set of attributes
that it requires to be an NDArray?  NDArrays are simple and unlikely to
change.  I think _those_ really are just pointers to memory with array
accounting information.  We could agree on what exactly constitutes an
NDArray.

- Could this standard set of attributes optionally include the names for
the typecode, endian, (and maybe some other) attributes?


That doesn't mean that your NDArrays would have to have the typecode,
endian or whatever information.  It just means that when any class does
add a typecode, it adds it as a specially named attribute.


I realize that a large part of what I want is interoperability between
separate implementations of NDArrays.


Anything that has (_data, _shape, _itemsize, _type) is something I could
work with in an extension.  Some other fields are optional (_strides,
_byteoffset) because they have sensible defaults that can be calculated
from above in the common case.
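
A minimal sketch of what "sensible defaults" could look like, written in
Python rather than in either C-API.  The attribute names come from the
paragraph above; the helper names (looks_like_ndarray, default_strides,
describe) are hypothetical, and the sketch assumes that the common case
means a contiguous, row-major (C order) layout starting at byte offset 0.

```python
from types import SimpleNamespace

# Attribute names taken from the proposal above.
REQUIRED = ("_data", "_shape", "_itemsize", "_type")


def looks_like_ndarray(obj):
    """True if obj carries every required attribute (duck typing only)."""
    return all(hasattr(obj, name) for name in REQUIRED)


def default_strides(shape, itemsize):
    """Byte strides for a contiguous row-major array (an assumption)."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def describe(obj):
    """Collect the array accounting information, filling in the optional fields."""
    if not looks_like_ndarray(obj):
        raise TypeError("object does not look like an NDArray")
    strides = getattr(obj, "_strides", None)
    if strides is None:
        strides = default_strides(obj._shape, obj._itemsize)
    return {
        "shape": tuple(obj._shape),
        "itemsize": obj._itemsize,
        "type": obj._type,
        "strides": strides,
        "byteoffset": getattr(obj, "_byteoffset", 0),
    }


# Any object with the four required names qualifies.
example = SimpleNamespace(_data=bytearray(2 * 3 * 8), _shape=(2, 3),
                          _itemsize=8, _type="Float64")
info = describe(example)
```

An object that does supply _strides or _byteoffset keeps its own values;
only a missing attribute is replaced by the computed default.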

So the only difference between what you currently have and most of what
I'm proposing is that the names of NDArray attributes become standardized.


>
> If you are presenting numarray with a type it already knows about,
> why aren't you subclassing it?
>

Since I know I'll have to create types that numarray doesn't know
about, I know I'm going to have to write a new array class (it's
already written).

It would be silly of my new array class to not implement the standard
types just because numarray _does_ know about them.

I now realize that I don't have to give my class to numarray directly. 
That didn't hit me before.  I could promote/upcast it when necessary.
The upcast-in and downcast-out thing will add up to extra work and
messier code, but it is a workaround.


>
> If you present numarray an object
> with a type it doesn't know about, then that is pointless.
> Types and numarray are inextricably intertwined, and shall
> remain so.
>

Understood.  I don't want to ruin your NumArrays.


> 
> **********************************************************
> 
> What I want to see is a specific example. I'm not going to
> pay much attention to generalities because I'm still unclear
> about how you intend to do what you say you will do. Perhaps
> I'm slow, but I still don't get it.
> 

Nope, clearly it was me that was being slow.  

There is still that bit about NDArrays that I'm trying to justify, so my
example is below.


>
> (or alternatively,
> create a numarray object that uses the same buffer yours does).
>

You're right.  This hadn't occurred to me until just a little bit ago.
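
A rough sketch of the shared-buffer idea, using only the Python standard
library.  Both classes here (ScottArray, OtherArray) and the view_as
helper are hypothetical stand-ins, not numarray's real classes or API;
the point is only that a second object can be built over the same memory
without copying, provided both agree on the attribute names.

```python
import struct


class ScottArray:
    """Hypothetical stand-in for my own array class."""

    def __init__(self, data, shape, itemsize, type_name):
        self._data = data
        self._shape = shape
        self._itemsize = itemsize
        self._type = type_name


class OtherArray:
    """Hypothetical stand-in for another package's array class."""

    def __init__(self, data, shape, itemsize, type_name):
        self._data = data
        self._shape = shape
        self._itemsize = itemsize
        self._type = type_name


def view_as(src, cls):
    """Build a cls instance over src's buffer; no bytes are copied."""
    return cls(src._data, src._shape, src._itemsize, src._type)


# Four little-endian doubles in a writable buffer.
buf = bytearray(struct.pack("<4d", 1.0, 2.0, 3.0, 4.0))
mine = ScottArray(buf, (4,), 8, "Float64")
theirs = view_as(mine, OtherArray)

# A write through one object is visible through the other.
struct.pack_into("<d", theirs._data, 8, 20.0)
second = struct.unpack_from("<d", mine._data, 8)[0]
```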


>
> E.g., "I want
> complex ints and I will develop a class that will use this to
> do the following things [it doesn't have to be exhaustive or
> complete, but include just enough to illustrate the point].
> If the attributes were standardized then I would do this and that,
> and use it with your stuff like this showing you the code
> (and the behavior I expect)."
>

Here goes (somewhat hypothetical, but close to the boat I'm currently in):

Jon is our FPGA guy who makes screaming fast core files, but our FPGAs
don't do floating point.  So I have to provide his driver with ComplexInt16
data.

Jon and I write an extension module that calls his driver and reads data. 
We also write a C routine (call it "munge") that takes both ComplexInt16
data, and ComplexFloat64 data.  We try it out for testing, and pass in my
arrays in both places.  We could have used Numarray for the ComplexFloat64,
but that meant we had to use two array packages, and use two C-APIs in our
extension.  All we needed was a pointer to an array of doubles, so we stuck
with mine.

Ok, that part of development is done.  Now we present it to the application
developers.  They're happy and we're rolling.  Successful application.

Another group finds out about this and they want to use it.  They're using
numarray for a large part of their application.  In fact, they're calculating
the ComplexFloat64 half of the data that they want to pass to my "munge"
routine using numarray, and they still need to use my ComplexInt16 data to
read the FPGA.

They're going to be disappointed to find out my extension can't read
numarray data, and that they have to convert back and forth between the
two.  And as the list of routines grows, they have to keep track of whether
it is a numarray-routine, or a scottarray-routine.

It's not so bad for one simple "munge" function, but there are going to be
hundreds of functions...

I don't expect you to have much sympathy for my having to convert data back
and forth between my array types and yours, but it is an avoidable problem.



For the most part, we both agree on what parts an NDArray should have.  If
we could only agree what to name them, and that we'd stick to those names,
that would be a large part of it for me.
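
To make the payoff concrete, here is a small Python sketch of a "munge"
that never asks which package an array came from.  Everything in it is
illustrative: the real routine would be C, what munge actually computes
is invented here (an elementwise complex product), the struct formats
chosen for ComplexInt16 and ComplexFloat64 are assumptions, and
NumarrayLike is a made-up stand-in rather than a real numarray class.
It also assumes contiguous one-dimensional data.

```python
import struct

# Assumed memory layouts: (real, imag) pairs, little-endian.
FORMATS = {"ComplexInt16": "<hh", "ComplexFloat64": "<dd"}


class ScottArray:
    """Hypothetical array class from my package."""

    def __init__(self, data, shape, itemsize, type_name):
        self._data, self._shape = data, shape
        self._itemsize, self._type = itemsize, type_name


class NumarrayLike:
    """Hypothetical stand-in for an array from another package."""

    def __init__(self, data, shape, itemsize, type_name):
        self._data, self._shape = data, shape
        self._itemsize, self._type = itemsize, type_name


def iter_complex(arr):
    """Yield each element as a Python complex, using only the agreed names."""
    fmt = FORMATS[arr._type]
    offset = getattr(arr, "_byteoffset", 0)
    count = 1
    for extent in arr._shape:
        count *= extent
    for i in range(count):
        re, im = struct.unpack_from(fmt, arr._data, offset + i * arr._itemsize)
        yield complex(re, im)


def munge(ints, floats):
    """Invented behavior: multiply the two arrays element by element."""
    return [a * b for a, b in zip(iter_complex(ints), iter_complex(floats))]


fpga = ScottArray(struct.pack("<4h", 1, 2, 3, 4), (2,), 4, "ComplexInt16")
calc = NumarrayLike(struct.pack("<4d", 1.0, 0.0, 0.0, 1.0), (2,), 16,
                    "ComplexFloat64")
result = munge(fpga, calc)
```

The routine works on the ScottArray and the NumarrayLike alike because
it only touches _data, _shape, _itemsize, _type and the optional
_byteoffset.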




>
> Given this I can either show you an alternate solution or
> I can realize why you are right and we can discuss where
> to go from there. Otherwise you are wasting your time.
>


Cheers,
    -Scott








