array interface nitpicks
Just some small nitpicks in the array interface document (http://numeric.scipy.org/array_interface.html). As written:

"""
__array_shape__ (required)
Tuple showing size in each dimension. Each entry in the tuple must be a Python (long) integer. Note that these integers could be larger than the platform "int" or "long" could hold. Use Py_LONG_LONG if accessing the entries of this tuple in C.
"""

Since this is supposed to be an interface, not an implementation (duck-typing and all that), I think this is too strict: __array_shape__ should just be a sequence of integers, not necessarily a tuple. I'd suggest something like this:

'''
__array_shape__ (required)
Sequence whose elements are the size in each dimension. Each entry is an integer (a Python int or long). Note that these integers could be larger than the platform "int" or "long" could hold (a Python int is a C long). It is up to the calling code to handle this appropriately, either by raising an error when overflow is possible, or by using Py_LONG_LONG as the C type for the shapes.
'''

This is clearer about the user's responsibility -- note that Numeric is taking the first approach (error), as the dimensions in PyArrayObject are ints.

Similar comments apply to __array_strides__. I'd reword it along these lines:

'''
__array_strides__ (optional)
Sequence of strides which provides the number of bytes needed to jump to the next array element in the corresponding dimension. Each entry must be an integer (a Python int or long). As with __array_shape__, the values may be larger than can be represented by a C "int" or "long"; the calling code should handle this appropriately, either by raising an error, or by using Py_LONG_LONG in C.

Default is a strides tuple which implies a C-style contiguous memory buffer. In this model, the last dimension of the array varies the fastest. For example, the default __array_strides__ tuple for an object whose array entries are 8 bytes long and whose __array_shape__ is (10, 20, 30) would be (4800, 240, 8).

Default: C-style contiguous
'''

I'm mostly worried about the use of Python longs; it shouldn't be necessary in almost all cases, and adds extra complications (in normal usage, you don't see Python longs all that much).

-- |>|\/|<
David M. Cooke  http://arbutus.physics.mcmaster.ca/dmc/
cookedm@physics.mcmaster.ca
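The default-strides rule described above can be sketched as a small helper (the function name is illustrative, not part of the proposed interface):

```python
def default_strides(shape, itemsize):
    """Byte strides implied when __array_strides__ is absent:
    C-style contiguous, last dimension varying fastest."""
    strides, acc = [], itemsize
    for dim in reversed(shape):
        strides.insert(0, acc)
        acc *= dim
    return tuple(strides)

# The document's example: 8-byte elements, shape (10, 20, 30)
assert default_strides((10, 20, 30), 8) == (4800, 240, 8)
```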
Hi all, (but mostly Travis),

I've taken a look at http://numeric.scipy.org/array_interface.html to try and see how I would use this with wxPython. I have a few questions, and a little code I'd like you to look at to see if I understand how this works.

Here's a first stab at how I might use this for the wxPython DrawPointsList method. The method takes a sequence of length-2 sequences of numbers, and draws a point at each point described by the coordinates in the data:

[(x,y), (x2,y2), (x3,y3), ...] (or an NX2 NumPy array of Ints)

Here's what I have:

def DrawPointList(self, points, pens=None):
    ...
    # some checking code on the pens
    ...
    if (hasattr(points, '__array_shape__') and
        hasattr(points, '__array_typestr__') and
        len(points.__array_shape__) == 2 and
        points.__array_shape__[1] == 2 and
        points.__array_typestr__ == 'i4'):
        # this means we have a compliant array
        # return the array protocol version
        return self._DrawPointArray(points.__array_data__, pens, [])
        # This needs to be written now!
    else:
        # return the generic python sequence version
        return self._DrawPointList(points, pens, [])

Then we'll need a function (in C++):

_DrawPointArray(points.__array_data__, pens, [])

that takes a buffer object and does the drawing.

My questions:

1) Is this what you had in mind for how to use this?

2) As __array_strides__ is optional, I'd kind of like to have a __contiguous__ flag that I could just check, rather than checking for the existence of strides, then calculating what the strides should be, then checking them.

3) A number of the attributes are optional, but will always be there with SciPy arrays (I assume). Have you documented them anywhere?

4) A wxWidgets wxPoint is defined as such:

class WXDLLEXPORT wxPoint
{
public:
    int x, y;

etc. As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int)?

5) Why is __array_data__ optional? Isn't that the whole point of this?

6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it stands, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?

7) An alternative to the above: a __simple__ flag that means the data is a simple, contiguous C array of a single type. That's the most common use, and it would be nice to just check that flag and not have to take all the other options into account.

Thanks,
-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
7600 Sand Point Way NE, Seattle, WA 98115
(206) 526-6959 voice / (206) 526-6329 fax / (206) 526-6317 main reception
Chris.Barker@noaa.gov
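Chris's compliance check could be wrapped in a small consumer-side helper. A minimal sketch, assuming the attribute names from the interface document; the helper name and the FakePoints stand-in are hypothetical:

```python
def is_simple_points_array(obj):
    """Hypothetical consumer-side check: an N x 2 array of 4-byte ints
    exporting the array interface, with a data attribute to hand to C."""
    shape = getattr(obj, '__array_shape__', None)
    typestr = getattr(obj, '__array_typestr__', None)
    if shape is None or typestr is None:
        return False
    return (len(shape) == 2 and shape[1] == 2
            # tolerate a '<', '>' or '|' byte-order prefix on the typestr
            and typestr.lstrip('<>|') == 'i4'
            and hasattr(obj, '__array_data__'))

class FakePoints:
    # minimal stand-in for a protocol-compliant producer
    __array_shape__ = (3, 2)
    __array_typestr__ = '<i4'
    __array_data__ = b'\x00' * 24

assert is_simple_points_array(FakePoints())
assert not is_simple_points_array([(1, 2), (3, 4)])  # plain sequence: use the generic path
```

A real consumer would also have to verify the byte order matches the platform before handing the buffer to C; the check above deliberately ignores that.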
Chris Barker wrote:
Hi all, (but mostly Travis),
I've taken a look at:
http://numeric.scipy.org/array_interface.html
to try and see how I would use this with wxPython. I have a few questions, and a little code I'd like you to look at to see if I understand how this works.
Great, fantastic!!!
Here's a first stab at how I might use this for the wxPython DrawPointsList method. The method takes a sequence of length-2 sequences of numbers, and draws a point at each point described by the coordinates in the data:
[(x,y), (x2,y2), (x3,y3), ...] (or an NX2 NumPy array of Ints)
Here's what I have:
def DrawPointList(self, points, pens=None):
    ...
    # some checking code on the pens
    ...
    if (hasattr(points, '__array_shape__') and
        hasattr(points, '__array_typestr__') and
        len(points.__array_shape__) == 2 and
        points.__array_shape__[1] == 2 and
        points.__array_typestr__ == 'i4'):
        # this means we have a compliant array
        # return the array protocol version
You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally). A more generic interface would handle multiple integer types if possible (but this is a good start...)
        return self._DrawPointArray(points.__array_data__, pens, [])
        # This needs to be written now!
    else:
        # return the generic python sequence version
        return self._DrawPointList(points, pens, [])
Then we'll need a function (in C++): _DrawPointArray(points.__array_data__, pens,[]) That takes a buffer object, and does the drawing.
My questions:
1) Is this what you had in mind for how to use this?
Yes, pretty much.
2) As __array_strides__ is optional, I'd kind of like to have a __contiguous__ flag that I could just check, rather than checking for the existence of strides, then calculating what the strides should be, then checking them.
I don't want to add too much. The other approach is to establish a set of helper functions in Python to check this sort of thing: Thus, if you can't handle a general array you check: ndarray.iscontiguous(obj) where obj exports the array interface. But, it could really go either way. What do others think? I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.
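A sketch of what such an ndarray.iscontiguous(obj) helper might look like, under the strides-default rule Travis just adopted. Reading the itemsize straight off the typestr (e.g. '<f8' -> 8) is a simplifying assumption, not the full typestr grammar:

```python
def iscontiguous(obj):
    """Sketch of the proposed iscontiguous(obj) helper: an object is
    C-contiguous if __array_strides__ is absent/None, or equals the
    strides implied by its shape and item size."""
    strides = getattr(obj, '__array_strides__', None)
    if strides is None:
        return True  # per the revised interface: no strides => C-contiguous
    shape = obj.__array_shape__
    itemsize = int(obj.__array_typestr__.lstrip('<>|')[1:])
    expected, acc = [], itemsize
    for dim in reversed(shape):
        expected.insert(0, acc)
        acc *= dim
    return tuple(strides) == tuple(expected)

class Contig:
    __array_shape__ = (10, 20, 30)
    __array_typestr__ = '<f8'
    __array_strides__ = (4800, 240, 8)

class Transposed:
    __array_shape__ = (10, 20, 30)
    __array_typestr__ = '<f8'
    __array_strides__ = (8, 80, 1600)  # Fortran-style, not C-contiguous

assert iscontiguous(Contig())
assert not iscontiguous(Transposed())
```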
3) A number of the attributes are optional, but will always be there with SciPy arrays..(I assume) have you documented them anywhere?
No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__, for example, and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.
4) a wxWidgets wxPoint is defined as such:
class WXDLLEXPORT wxPoint { public: int x, y;
etc.
As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int)?
Ah, yes... here is where we need some standard Python functions to help establish the array interface. Sometimes you want to match a particular C type, other times you want to match a particular bit width. So, what do you do? I had considered having an additional interface called ctypestr but decided against it for fear of creep. I think in general we need to have in Python some constants to make this conversion easy, e.g. ndarray.cint (gives 'iX' on the correct platform). For now, I would check

(__array_typestr__ == 'i%d' % array.array('i', [0]).itemsize)

On most platforms these days an int is 4 bytes, but the above is just to make sure.
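Travis's platform-int check uses only the standard library; a sketch:

```python
import array
import struct

# Build the typestr matching the platform C "int" instead of hard-coding 'i4'.
int_size = array.array('i', [0]).itemsize   # effectively sizeof(int) on this platform
cint_typestr = 'i%d' % int_size

# The struct module must agree on the native size of a C int.
assert int_size == struct.calcsize('i')
```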
5) Why is: __array_data__ optional? Isn't that the whole point of this?
Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object. But really all that is needed is something that exposes the buffer interface: remember the difference between the buffer object and the buffer interface. So, the correct consumer usage for grabbing the data is

data = getattr(obj, '__array_data__', obj)

Then, in C you use the buffer protocol to get a pointer to memory. For example, the function:

int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)

Of course this approach has the 32-bit limit until we get this changed in Python.
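The consumer-side fallback Travis describes can be sketched in Python. In the Python of this thread one would wrap the result in buffer(); memoryview stands in for the buffer interface here:

```python
def get_data(obj):
    """Sketch: fall back to the object itself when __array_data__ is
    absent, then go through the buffer interface."""
    data = getattr(obj, '__array_data__', obj)
    return memoryview(data)  # raises TypeError if nothing exposes a buffer

class WithData:
    # hypothetical producer that exports __array_data__
    __array_data__ = b'\x01\x02\x03\x04'

assert get_data(WithData()).tobytes() == b'\x01\x02\x03\x04'
assert get_data(b'\x05\x06').tobytes() == b'\x05\x06'  # bytes expose the buffer interface
```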
6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. This way I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?
A consumer has to check for most of the optional stuff if they want to support all types of arrays. Again a simple: getattr(obj, '__array_offset__', 0) works fine.
7) An alternative to the above: a __simple__ flag that means the data is a simple, contiguous C array of a single type. That's the most common use, and it would be nice to just check that flag and not have to take all the other options into account.
I think if __array_strides__ returns None (and if an object doesn't expose it you can assume it) it is probably good enough. -Travis
Travis Oliphant wrote:
You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally).
Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.
A more generic interface would handle multiple integer types if possible
I'd like to support doubles as well...
(but this is a good start...)
Right. I want to get _something_ working, before I try to make it universal!
I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.
You're welcome. I like that too.
No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__, for example, and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.
I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.
e.g. ndarray.cint (gives 'iX' on the correct platform). For now, I would check (__array_typestr__ == 'i%d' % array.array('i',[0]).itemsize)
I can see that that would work, but it does feel like a hack. Besides, I might be doing this in C++ anyway, so it would probably be easier to use sizeof().
But, on most platforms these days an int is 4 bytes, but the about would be just to make sure.
Right. Making that assumption will just lead to weird bugs way down the line. Of course, I wouldn't be surprised if wxWidgets and/or Python makes that assumption in other places anyway!
5) Why is: __array_data__ optional? Isn't that the whole point of this?
Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object.
Couldn't it be required, and return a reference to itself if that works? Maybe I'm just being lazy, but it feels clunky and prone to errors to keep having to check if an attribute exists, then use it (or not).
So, the correct consumer usage for grabbing the data is
data = getattr(obj, '__array_data__', obj)
Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.
int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)
I'm starting to get this.
Of course this approach has the 32-bit limit until we get this changed in Python.
That's the least of my worries!
6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. This way I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?
A consumer has to check for most of the optional stuff if they want to support all types of arrays.
That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single-type elements, zero offset), but I have to check all that stuff to make sure that I have a simple array. The simplest arrays are the most common case; they should be as easy as possible to support.
Again a simple:
getattr(obj, '__array_offset__', 0)
works fine.
not too bad. Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid....
7) An alternative to the above: a __simple__ flag that means the data is a simple, contiguous C array of a single type. That's the most common use, and it would be nice to just check that flag and not have to take all the other options into account.
I think if __array_strides__ returns None (and if an object doesn't expose it you can assume it) it is probably good enough.
That and __array_typestr__ Travis Oliphant wrote:
At http://numeric.scipy.org/array_interface.py
you will find the start of a set of helper functions for the array interface that can make it more easy to deal with.
Ah! This may well address my concerns. Good idea. Thanks for all your work on this, Travis. By the way, a quote from Robin Dunn about this: "Sweet!" Thought you might appreciate that.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
7600 Sand Point Way NE, Seattle, WA 98115
(206) 526-6959 voice / (206) 526-6329 fax / (206) 526-6317 main reception
Chris.Barker@noaa.gov
"Chris Barker"
Travis Oliphant wrote:
You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally).
Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.
I'll second this. Pulling out more Python Zen: Explicit is better than implicit.
A more generic interface would handle multiple integer types if possible
I'd like to support doubles as well...
(but this is a good start...)
Right. I want to get _something_ working, before I try to make it universal!
I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.
You're welcome. I like that too.
No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it would could be useful for very complicated array classes.
I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.
Here's a summary:

Attribute            required by array-like object    required to be checked
__array_shape__      yes                              yes
__array_typestr__    yes                              yes
__array_descr__      no                               no
__array_data__       no                               yes
__array_strides__    no                               yes
__array_mask__       no                               no?
__array_offset__     no                               yes

In the "required to be checked" column I'm assuming a user of the array that's interested in looking at all of the elements, so we have to consider all possible situations where forgetting to consider an attribute could lead to invalid memory accesses. __array_strides__ and __array_offset__ in particular could be troublesome if forgotten.

The __array_mask__ element is difficult: for most applications, you should check it, and raise an error if it exists and is not None, unless you can handle missing elements.

It's certainly not required that all users of an array object need to understand all array types! Since we have to check a bunch of them anyway, I think that's a good enough reason for having them all exist. There are suitable defaults defined in the protocol document (__array_strides__ in particular) that make it easy to add them in simple cases.
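David's table suggests a consumer-side helper that collects the attributes and applies the protocol's defaults for the optional ones. A hypothetical sketch (the function name and dict layout are illustrative):

```python
def normalized(obj):
    """Sketch: gather the interface attributes a consumer must consider,
    applying the protocol's defaults for the optional ones."""
    return {
        'shape': tuple(obj.__array_shape__),      # required
        'typestr': obj.__array_typestr__,         # required
        'data': getattr(obj, '__array_data__', obj),       # default: the object itself
        'strides': getattr(obj, '__array_strides__', None),  # None => C-contiguous
        'offset': getattr(obj, '__array_offset__', 0),       # default: zero bytes
        'mask': getattr(obj, '__array_mask__', None),        # None => no missing values
    }

class Simple:
    # minimal producer exporting only the required attributes plus data
    __array_shape__ = (2, 2)
    __array_typestr__ = '<i4'
    __array_data__ = b'\x00' * 16

info = normalized(Simple())
assert info['offset'] == 0 and info['strides'] is None and info['mask'] is None
```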
So, the correct consumer usage for grabbing the data is data = getattr(obj, '__array_data__', obj)
Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.
You'd want something like

adata = PyObject_GetAttrString(array_obj, "__array_data__");
if (!adata) {
    /* no __array_data__ attribute: fall back to the object itself */
    PyErr_Clear();
    adata = array_obj;
    Py_INCREF(adata);  /* so ownership matches the GetAttrString case */
}
int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)
I'm starting to get this.
Of course this approach has the 32-bit limit until we get this changed in Python.
That's the least of my worries!
6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. This way I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?

A consumer has to check for most of the optional stuff if they want to support all types of arrays.
That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single-type elements, zero offset), but I have to check all that stuff to make sure that I have a simple array. The simplest arrays are the most common case; they should be as easy as possible to support.
Again a simple: getattr(obj, '__array_offset__', 0) works fine.
not too bad.
Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid....
This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.

I'd suggest adding to __array_data__: if __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not be. (Put it this way: I understand the sequence protocol well, but not the buffer one :-) That would also be a good argument for it existing, I think.

Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.

-- |>|\/|<
David M. Cooke  http://arbutus.physics.mcmaster.ca/dmc/
cookedm@physics.mcmaster.ca
--- "David M. Cooke"
Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.
I'll second this. Pulling out more Python Zen: Explicit is better than implicit.
I'll third.
This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.
I'd suggest adding to __array_data__: If __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not. (Put it this way: I understand the sequence protocol well, but not the buffer one :-)
That would also be a good argument for it existing, I think.
Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.
I like this, although I think having __array_data__ return None is confusing. I think __array_version__ (or __array_protocol__?) is the better choice. How about have it optional and default to 1? If it's present and greater than 1 then it means there is something new going on... Cheers, -Scott
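Scott's optional-but-default-1 scheme might look like this. Note that __array_version__ is a proposal in this thread, not part of the published interface document:

```python
def check_version(obj):
    """Sketch of the proposed (hypothetical) __array_version__ check:
    absent means version 1; anything newer is not understood here."""
    version = getattr(obj, '__array_version__', 1)
    if version != 1:
        raise TypeError("unsupported array interface version: %r" % (version,))
    return version

class OldStyle:
    pass  # exports no version attribute: treated as version 1

assert check_version(OldStyle()) == 1
```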
Scott Gilbert wrote:
--- "David M. Cooke"
wrote: Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.
I'll second this. Pulling out more Python Zen: Explicit is better than implicit.
I'll third.
O.K. It's done....
Travis Oliphant wrote:
Scott Gilbert wrote:
--- "David M. Cooke"
wrote: Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.
I'll second this. Pulling out more Python Zen: Explicit is better than implicit.
I'll third.
O.K. It's done....
Here's a bit of weirdness which has prevented me from using '<' or '>'
in the past with the struct module. I'm not guru enough to know what's
going on, but it has prevented me from being explicit rather than implicit.
In [1]:import struct
In [2]:from numarray.ieeespecial import nan
In [3]:nan
Out[3]:nan
In [4]:struct.pack('
--- Chris Barker
I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.
Everyone seems to think that an offset is so weird. I haven't looked at the internals of Numeric/scipy.base in a while, so maybe it doesn't apply there. However, if you subscript an array and return a view to the data, you need an offset, or you need to create a new buffer that encodes the offset for you.

A = reshape(arange(9), (3, 3))

    0, 1, 2
    3, 4, 5
    6, 7, 8

B = A[2]    # create a view into A

    6, 7, 8    # shared with the data above

Unless you're going to create a new buffer (which I guess is what Numeric is doing), the offset for B would be 6 in this very simple case. I think specifying the offset is much more elegant than creating a new buffer object with a hidden offset that refers to the old buffer object. I guess all I'm saying is that I wouldn't assume the offset is zero...
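In the protocol's terms, where __array_offset__ is a byte count, Scott's offset of 6 elements becomes 6 * itemsize bytes. A pure-arithmetic sketch for a C-contiguous (3, 3) array of 8-byte elements:

```python
# Byte offset of the view B = A[2] into a C-contiguous (3, 3) array.
itemsize = 8
shape = (3, 3)
strides = (shape[1] * itemsize, itemsize)  # C-contiguous: (24, 8)

row = 2
offset_bytes = row * strides[0]  # skip two full rows to reach B's data

# 6 elements into the buffer, i.e. 6 * itemsize bytes
assert offset_bytes == 6 * itemsize == 48
```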
Couldn't it be required, and return a reference to itself if that works?
Maybe I'm just being lazy, but it feels clunky and prone to errors to keep having to check if an attribute exists, then use it (or not).
The problem is that you aren't being lazy enough. :-) The fact that a lot of these attributes are optional should be hidden in helper functions like those in Travis's array_interface.py module, or a C/C++ include file (with inline functions). In a short while, you shouldn't have to check any __array_metadata__ attributes directly. There should even be a helper function for getting the array elements. It wouldn't be a horrible mistake to have all the attributes be mandatory, but it doesn't get array consumers any benefit that they can't get from a well-written helper library, and it does add some burden to array producers. Cheers, -Scott
Scott Gilbert wrote:
I think __array_version__ (or __array_protocol__?) is the better choice. How about have it optional and default to 1? If it's present and greater than 1 then it means there is something new going on...
Again, I'm uncomfortable with something that I have to check being optional. If it is, we're encouraging people not to check it, and that's a recipe for bugs later on down the road.
Everyone seems to think that an offset is so weird. I haven't looked at the internals of Numeric/scipy.base in a while so maybe it doesn't apply there. However, if you subscript an array and return a view to the data, you need an offset or you need to create a new buffer that encodes the offset for you.
I guess all I'm saying is that I wouldn't assume the offset is zero...
Good point. All the more reason to have the offset be mandatory.
The fact that a lot of these attributes are optional should be hidden in helper functions like those in Travis's array_interface.py module, or a C/C++ include file (with inline functions).
Yes, if there is a C/C++ version of all these helper functions, I'll be a lot happier. And you're right, the same information should not be encoded in two places, so my "iscontiguous" attribute should be a helper function or maybe a method.
In a short while, you shouldn't have to check any __array_metadata__ attributes directly. There should even be a helper function for getting the array elements.
Cool. How would that work? A C++ iterator? I'm thinking not, as this is all C, no?
It wouldn't be a horrible mistake to have all the attributes be mandatory, but it doesn't get array consumers any benefit that they can't get from a well-written helper library, and it does add some burden to array producers.
Hardly any. I'm assuming that there will be a base_array class that can be used as a base class or mixin, so it wouldn't be any work at all to have a full set of attributes with defaults. It would take up a little bit of memory. I'm assuming that the whole point of this is to support large datasets, but maybe that isn't a valid assumption; after all, small array support has turned out to be very important for Numeric. As a rule of thumb, I think there will be more consumers of arrays than producers, so I'd rather make it easy on the consumers than the producers, if we need to make such a trade-off. Maybe I'm biased, because I'm a consumer.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
7600 Sand Point Way NE, Seattle, WA 98115
(206) 526-6959 voice / (206) 526-6329 fax / (206) 526-6317 main reception
Chris.Barker@noaa.gov
--- Travis Oliphant
2) As __array_strides__ is optional, I'd kind of like to have a __contiguous__ flag that I could just check, rather than checking for the existence of strides, then calculating what the strides should be, then checking them.
I don't want to add too much. The other approach is to establish a set of helper functions in Python to check this sort of thing: Thus, if you can't handle a general array you check:
ndarray.iscontiguous(obj)
where obj exports the array interface.
But, it could really go either way. What do others think?
I think this should definitely be done in the helper functions. Having extra attributes encode redundant information is a recipe for trouble. Cheers, -Scott
Hi Chris, Travis, ... Great conversation you've started. I have two questions at the moment. I do love the idea that an abstraction can bring the different but similar num* worlds together. Which SourceForge CVS repository will the interface (and an implementation) show up in first? My guess is numpy/numeric3 -- I see Travis has been updating it while I sleep.
def DrawPointList(self, points, pens=None):
    ...
    # some checking code on the pens
    ...
    if (hasattr(points, '__array_shape__') and
        hasattr(points, '__array_typestr__') and
        len(points.__array_shape__) == 2 and
        points.__array_shape__[1] == 2 and
        points.__array_typestr__ == 'i4'):
        # this means we have a compliant array
        # return the array protocol version
        return self._DrawPointArray(points.__array_data__, pens, [])
        # This needs to be written now!
This means that whenever you have some complex multivalued multidimensional structure with the data you want to plot, you have to reshape it into the above 'compliant' array before passing it on. I'm a newbie, but is this reshape something where the data has to be copied and take up memory twice? If not, then great: you would painlessly reshape into something that had a different set of strides that just accessed the data that complied in the big blob of data. If the reshape is expensive, then maybe we need the array abstraction, and then a second 'thing' that describes which parts of the array to use for the sequence of 2-tuples (the x,y's of a scatter plot, or whatever). I do think we can accept more than just i4 for a datatype, especially since a last-minute cast to i4 is inexpensive for almost every data type.
    else:
        # return the generic python sequence version
        return self._DrawPointList(points, pens, [])
Then we'll need a function (in C++): _DrawPointArray(points.__array_data__, pens,[])
Looks great. -Jim
James Carroll wrote:
def DrawPointList(self, points, pens=None):
    ...
    # some checking code on the pens
    ...
    if (hasattr(points, '__array_shape__') and
        hasattr(points, '__array_typestr__') and
        len(points.__array_shape__) == 2 and
        points.__array_shape__[1] == 2 and
        points.__array_typestr__ == 'i4'):
        # this means we have a compliant array
        # return the array protocol version
        return self._DrawPointArray(points.__array_data__, pens, [])
        # This needs to be written now!
This means that whenever you have some complex multivalued multidimensional structure with the data you want to plot, you have to reshape it into the above 'compliant' array before passing it on. I'm a newbie, but is this reshape something where the data has to be copied and take up memory twice?
Probably. It depends on two things:

1) What structure the data is in at the moment
2) Whether we write the code to handle more "complex" arrangements of data: discontiguous arrays, for instance.

But the idea is to require a data structure that makes sense for the data. For example, a natural way to store a whole set of coordinates is to use an NX2 NumPy array of doubles. This is exactly the data structure that I want the above function to accept. If the points are somehow a subset of a larger array, then they will be in a discontiguous array, and I'm not sure if I want to bother to try to handle that. You can always use the generic sequence interface to access the data, but that will be a lot slower. We're interfacing with a static language here; we can get optimum performance only by specifying a particular data structure.
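The reshape question can be illustrated without copying anything: for a contiguous buffer, reshaping only rewrites the shape/strides metadata, because both shapes address the same bytes in the same order. A pure-Python sketch (the helper names are illustrative):

```python
import itertools

def c_strides(shape, itemsize):
    # byte strides for a C-contiguous array of the given shape
    strides, acc = [], itemsize
    for dim in reversed(shape):
        strides.insert(0, acc)
        acc *= dim
    return tuple(strides)

def byte_offsets(shape, strides):
    # byte offset of every element, visited in C (row-major) order
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in itertools.product(*map(range, shape))]

# A flat 6-element view and a (2, 3) view of the same 8-byte-element buffer
# touch exactly the same bytes in the same order, so no copy is needed.
flat = byte_offsets((6,), c_strides((6,), 8))
grid = byte_offsets((2, 3), c_strides((2, 3), 8))
assert flat == grid  # both are [0, 8, 16, 24, 32, 40]
```

Discontiguous views (subsets of a larger array) break this equivalence, which is why handling them takes extra code or a copy.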
If not, then great: you would painlessly reshape into something that had a different set of strides that just accessed the data that complied in the big blob of data. If the reshape is expensive, then maybe we need the array abstraction, and then a second 'thing' that describes which parts of the array to use for the sequence of 2-tuples (the x,y's of a scatter plot, or whatever).
The proposed array interface does provide a certain level of abstraction; that's what

__array_shape__
__array_typestr__
__array_descr__
__array_strides__
__array_offset__

are all about. We could certainly write the wxPy_LIST_helper functions to handle a larger variety of options than the simple contiguous C array, but I want to start with the simple case, and I'm not sure directly handling the more complex cases is worth it. I'm imagining that the user will need to do something like:

dc.DrawPointList(asarray(points, Int))

It's easier to use the utility functions that Numeric provides than to re-write similar code in wxPython.
I do think we can accept more than just i4 for a datatype, especially since a last-minute cast to i4 is inexpensive for almost every data type.
Sure, but we're interfacing with a static language, so for each data type supported, we need to cast the data pointer to the right type, then have code to convert it to the type needed by wx. It's not a big deal, but I'd rather keep it simple. I do want to support at least doubles and ints. Users can use Numeric's astype() method to convert if need be. I've noticed that there is a wxRealPoint class that uses doubles, but it doesn't look like it can be used as input to any of the wxDC methods. Too bad.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
7600 Sand Point Way NE, Seattle, WA 98115
(206) 526-6959 voice / (206) 526-6329 fax / (206) 526-6317 main reception
Chris.Barker@noaa.gov
participants (6)
- Andrew Straw
- Chris Barker
- cookedm@physics.mcmaster.ca
- James Carroll
- Scott Gilbert
- Travis Oliphant