Mailman 3 Numarray header PEP - NumPy-Discussion


Numarray header PEP

Todd Miller

June 28, 2004

12:42 p.m.

Perry and I have written a draft PEP for the inclusion of some of the numarray headers within the Python distribution. The headers enable extension writers to provide better support for numarray without adding any additional dependencies (from a user's perspective) to their packages. For users who have numarray, an "improved extension" could provide better performance, and for those without numarray, the extension could work normally using the Sequence protocol.

The PEP is attached. It is formatted using the docutils package, which can be used to generate HTML or PDF. Comments and corrections would be appreciated.

Regards,
Todd
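A minimal sketch of the "improved extension" behaviour described above, written in plain C so it compiles without Python or numarray installed. A fast path is taken only when an optional array API has been imported and the argument is one of its arrays; otherwise the code falls back to item-by-item access, the analogue of the Sequence protocol. `Object`, `ArrayAPI`, `fake_import_array_api()` and `sum_values()` are hypothetical stand-ins, not names from numarray or the Python C API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Python object. Every object supports
 * item-by-item access (the Sequence protocol analogue); arrays also
 * expose a contiguous buffer that a fast path can read directly. */
typedef struct Object {
    int is_array;            /* 1 for arrays from the optional package */
    const double *data;      /* contiguous buffer (arrays only) */
    size_t length;
    double (*get_item)(const struct Object *self, size_t i);
} Object;

/* Hypothetical API table: stays NULL unless the package was imported. */
typedef struct {
    double (*sum_contiguous)(const double *data, size_t n);
} ArrayAPI;

static const ArrayAPI *array_api = NULL;

static double sum_contiguous_impl(const double *data, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

static const ArrayAPI real_array_api = { sum_contiguous_impl };

/* Stand-in for import_array(): fills the table only if the package is
 * installed; otherwise the extension keeps working without it. */
void fake_import_array_api(int package_installed)
{
    array_api = package_installed ? &real_array_api : NULL;
}

/* Item access for array objects, used by the generic path. */
double array_get_item(const Object *self, size_t i)
{
    return self->data[i];
}

/* Same answer either way; only the code path (and speed) differs. */
double sum_values(const Object *obj)
{
    assert(obj != NULL);
    if (array_api != NULL && obj->is_array)      /* fast path */
        return array_api->sum_contiguous(obj->data, obj->length);

    double total = 0.0;                          /* sequence fallback */
    for (size_t i = 0; i < obj->length; i++)
        total += obj->get_item(obj, i);
    return total;
}
```

With the package missing, `sum_values()` still returns the same result through the item-by-item loop; only the loop that runs changes.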

Attachments:

header_pep.txt (text/plain — 9.6 KB)


Gerard Vermeulen

June 2004

10:46 a.m.

> The PEP is attached. It is formatted using the docutils package which can be used to generate HTML or PDF. Comments and corrections would be appreciated.

PyQwt is a Python extension which can be conditionally compiled against Numeric and/or numarray (both, one of them, or none).

Your PEP excludes importing Numeric and numarray in the same C extension. All you need to do is to hide the macros PyArray_Present(), PyArray_isArray() and import_array() inside a few functions with package-specific names, so that the following becomes possible:

    #include <Numeric/meta.h>
    /* Defines the functions (not macros)
     *     int Numeric_Present(void);
     *     int Numeric_isArray(PyObject *obj);
     *     void import_numeric(void);
     * to hide the Numeric C-API stuff in a small Numeric/meta.c file.
     */
    #include <numarray/meta.h>
    /* Defines the functions (not macros)
     *     int Numarray_Present(void);
     *     int Numarray_isArray(PyObject *obj);
     *     void import_numarray(void);
     * to hide the numarray C-API stuff in a small numarray/meta.c file.
     */

    PyObject *
    some_array_returning_function(PyObject *m, PyObject *args)
    {
        int param;
        PyObject *result;

        if (!PyArg_ParseTuple(args, "i", &param))
            return NULL;

        if (Numeric_Present()) {
            result = numeric_returning_function(param);
        } else if (Numarray_Present()) {
            result = numarray_returning_function(param);
        } else {
            result = list_returning_function(param);
        }
        return result;
    }

    PyObject *
    some_array_accepting_function(PyObject *m, PyObject *args)
    {
        PyObject *sequence, *result;

        if (!PyArg_ParseTuple(args, "O", &sequence))
            return NULL;

        if (Numeric_isArray(sequence)) {
            result = numeric_input_function(sequence);
        } else if (Numarray_isArray(sequence)) {
            result = numarray_input_function(sequence);
        } else {
            result = sequence_input_function(sequence);
        }
        return result;
    }

    /* The code for the numeric_whatever_functions and for the
     * numarray_whatever_functions should be in separate source files.
     */

    PyMODINIT_FUNC
    initfoo(void)
    {
        PyObject *m = Py_InitModule3("foo", _foo_functions, _foo__doc__);
        if (m == NULL)
            return;
        import_numeric();
        import_numarray();
    }

Regards
-- Gerard
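The core of this suggestion is that each package's `PyArray_Present()` macro gets expanded inside an ordinary function with a package-specific name, so the two macro definitions never meet in the module's own file. The sketch below squeezes the two `meta.c` files into one translation unit, with `#undef` between them, purely so it compiles on its own; the macro bodies, the dummy tables and the `fake_import_*()` helpers are hypothetical stand-ins for what the real Numeric and numarray headers provide.

```c
#include <assert.h>
#include <stddef.h>

/* ---- stands in for Numeric/meta.c -------------------------------- */
static int numeric_table;               /* dummy API table contents */
static void *numeric_api = NULL;        /* NULL until "imported" */
#define PyArray_Present() (numeric_api != NULL)   /* "Numeric" macro */

int Numeric_Present(void) { return PyArray_Present(); }

void fake_import_numeric(int installed)
{
    assert(installed == 0 || installed == 1);
    numeric_api = installed ? (void *)&numeric_table : NULL;
}
#undef PyArray_Present    /* a separate .c file would make this implicit */

/* ---- stands in for numarray/meta.c ------------------------------- */
static int numarray_table;
static void *numarray_api = NULL;
#define PyArray_Present() (numarray_api != NULL)  /* same name, other table */

int Numarray_Present(void) { return PyArray_Present(); }

void fake_import_numarray(int installed)
{
    assert(installed == 0 || installed == 1);
    numarray_api = installed ? (void *)&numarray_table : NULL;
}
#undef PyArray_Present

/* ---- stands in for the module file: only prefixed names visible -- */
/* Bit 0: Numeric present, bit 1: numarray present. */
int which_packages(void)
{
    return Numeric_Present() | (Numarray_Present() << 1);
}
```

The module file never sees a `PyArray_Present` macro at all, which is what lets it ask about both packages in one place.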

Todd Miller

12:10 p.m.

On Tue, 2004-06-29 at 13:44, Gerard Vermeulen wrote:

>> The PEP is attached. It is formatted using the docutils package which can be used to generate HTML or PDF. Comments and corrections would be appreciated.

> PyQwt is a Python extension which can be conditionally compiled against Numeric and/or numarray (both, one of them or none).

Well that's cool! I'll have to keep the PyQwt guys in mind as potential first users.


> Your PEP excludes importing of Numeric and numarray in the same C-extension.

This is true but I don't understand your solution so I'll blather on below.


> All you need to do is to hide the macros PyArray_Present(), PyArray_isArray() and import_array() inside a few functions with package-specific names, so that the following becomes possible:
>
>     #include <Numeric/meta.h>
>     /* Defines the functions (not macros)
>      *     int Numeric_Present(void);
>      *     int Numeric_isArray(PyObject *obj);
>      *     void import_numeric(void);
>      * to hide the Numeric C-API stuff in a small Numeric/meta.c file.
>      */
>     #include <numarray/meta.h>
>     /* Defines the functions (not macros)
>      *     int Numarray_Present(void);
>      *     int Numarray_isArray(PyObject *obj);
>      *     void import_numarray(void);
>      * to hide the numarray C-API stuff in a small numarray/meta.c file.
>      */

I may have come up with the wrong scheme for Present() and isArray(). With my scheme, they *have* to be macros because the API functions are unsafe: when numarray or Numeric is not present, the API function table pointers are NULL, so calling through them results in either a fatal error or a segfault.

There is an additional problem: the "same functions" need to be called through different API pointers depending on whether numarray or Numeric is being supported. Redefinition of typedefs and enumerations (or perhaps conditional compilation short-circuited re-definitions) may also present a problem with compiling (or worse, running).

I certainly like the idea of supporting both in the same extension module, but don't see how to get there other than with separate compilation units. With separate .c files, I'm not aware of a problem other than lots of global symbols. I haven't demoed that yet, so I am interested if someone has made it work.

Thanks very much for commenting on the PEP.

Regards,
Todd
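The safety point above can be shown without Python: if module code calls through an API table pointer that was never filled in, it dereferences NULL, so the presence test has to check the pointer before touching the table. A minimal sketch, assuming a hypothetical `Example_API` table; the real headers use a table of untyped pointers filled by `import_array()`, and a struct of function pointers is used here only to keep the casts out. `Object`, `fake_import_example()` and `describe()` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int type_id; } Object;     /* hypothetical object */

/* Hypothetical API table, filled in by a successful import. */
typedef struct {
    int (*isArray)(const Object *op);
} ExampleAPI;

static const ExampleAPI *Example_API = NULL;   /* NULL: not imported */

/* Guarded macros: when the table is missing they evaluate to 0 instead
 * of calling through a NULL pointer (which would crash). */
#define Example_Present()   (Example_API != NULL)
#define Example_isArray(op) (Example_Present() && Example_API->isArray(op))

static int is_array_impl(const Object *op)
{
    return op->type_id == 1;                 /* 1 marks an "array" */
}

static const ExampleAPI example_api_table = { is_array_impl };

/* Stand-in for a successful import_array(). */
void fake_import_example(void)
{
    Example_API = &example_api_table;
}

/* Module-level code: safe to call whether or not the import worked. */
int describe(const Object *op)
{
    assert(op != NULL);
    return Example_isArray(op);
}
```

The `&&` short-circuit is the whole trick: the table is only dereferenced after the NULL check has passed.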

--
Todd Miller
Space Telescope Science Institute
3700 San Martin Drive
Baltimore MD, 21218
(410) 338-4576

gerard.vermeulen＠grenoble.cnrs.fr

2:33 p.m.

On 29 Jun 2004 15:09:43 -0400, Todd Miller wrote


> I may have come up with the wrong scheme for the Present() and isArray(). With my scheme, they *have* to be macros because the API functions are unsafe: when numarray or Numeric is not present, the API function table pointers are NULL so calling through them results in either a fatal error or a segfault.

The macros can be hidden from the module file scope by wrapping them in a function (see attached demo).

> There is an additional problem that the "same functions" need to be called through different API pointers depending on whether numarray or Numeric is being supported. Redefinition of typedefs and enumerations (or perhaps conditional compilation short-circuited re-definitions) may also present a problem with compiling (or worse, running).

Tested and works.

> I certainly like the idea of supporting both in the same extension module, but don't see how to get there, other than with separate compilation units. With separate .c files, I'm not aware of a problem other than lots of global symbols. I haven't demoed that yet so I am interested if someone has made it work.

Yes, you cannot mix the numarray API and the Numeric API in the same .c file, but you can nevertheless hide the macros in small functions so that the macros don't pollute.

Here you have a little extension module 'pep' which tries to import Numeric and numarray. It has a method 'whatsThere' that prints the names of the Numeric/numarray extension modules it has imported. It also has a method 'whatsThis' that prints whether an object is a Numeric array, a numarray array or something else:

    Python 2.3.3 (#2, Feb 17 2004, 11:45:40)
    [GCC 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pep
    >>> pep.whatsThere()
    Numeric!
    Numarray!
    >>> import Numeric
    >>> a = Numeric.arange(10)
    >>> pep.whatsThis(a)
    Numeric array
    >>> import numarray
    >>> a = numarray.arange(10)
    >>> pep.whatsThis(a)
    Numarray array
    >>> a = 10
    >>> pep.whatsThis(a)
    Something else
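Gerard says later in the thread that `pep` tells the two kinds of array apart by their type objects. A plain-C sketch of that classification, with hypothetical `TypeObject`/`Object` structs standing in for `PyTypeObject`/`PyObject` and made-up type objects; in this sketch a NULL type-object pointer means the package was not imported. A real check might also need to accept subclasses, which exact pointer comparison does not.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *name; } TypeObject;      /* ~ PyTypeObject */
typedef struct { const TypeObject *ob_type; } Object; /* ~ PyObject */

typedef enum { KIND_NUMERIC, KIND_NUMARRAY, KIND_OTHER } Kind;

/* Made-up type objects standing in for each package's array type. */
static const TypeObject fake_numeric_type  = { "fake Numeric array type" };
static const TypeObject fake_numarray_type = { "fake numarray array type" };

/* Set only for packages that were actually imported. */
static const TypeObject *numeric_array_type  = NULL;
static const TypeObject *numarray_array_type = NULL;

void fake_import(int have_numeric, int have_numarray)
{
    numeric_array_type  = have_numeric  ? &fake_numeric_type  : NULL;
    numarray_array_type = have_numarray ? &fake_numarray_type : NULL;
}

/* Classify by type-object identity (exact match only). */
Kind whats_this(const Object *op)
{
    assert(op != NULL);
    if (numeric_array_type != NULL && op->ob_type == numeric_array_type)
        return KIND_NUMERIC;
    if (numarray_array_type != NULL && op->ob_type == numarray_array_type)
        return KIND_NUMARRAY;
    return KIND_OTHER;
}

/* The text pep.whatsThis() prints for each case. */
const char *kind_name(Kind k)
{
    switch (k) {
    case KIND_NUMERIC:  return "Numeric array";
    case KIND_NUMARRAY: return "Numarray array";
    default:            return "Something else";
    }
}
```

If Numeric was not imported, even an object carrying the Numeric type falls through to "Something else", matching the idea that each package is only consulted when present.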

The MANIFEST reads:

    pepmodule.c
    setup.py
    Numeric/meta.c
    Numeric/meta.h
    numarray/meta.c
    numarray/meta.h

My initial idea was to add the meta.* files to the Numeric/numarray include files, but I have not yet tried to solve the problem of the PyArray_API/PY_ARRAY_UNIQUE_SYMBOL defines in the same way (I am sure it can be done).

Regards
-- Gerard
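On the PY_ARRAY_UNIQUE_SYMBOL question: in Numeric's header (a convention NumPy later kept), defining PY_ARRAY_UNIQUE_SYMBOL makes the header use that name for the global that holds the C-API table, so several .c files of one extension can share a single table. The sketch below shows only the preprocessor renaming that would keep a Numeric table and a numarray table apart; `DECLARE_API_TABLE()`, `Numeric_PyArray_API` and `Numarray_PyArray_API` are hypothetical names, and the "header" is simulated rather than copied from either package.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated header text: declares the API table under whatever name
 * PyArray_API expands to at the point of use. */
#define DECLARE_API_TABLE() void **PyArray_API = NULL

/* Numeric part: a PY_ARRAY_UNIQUE_SYMBOL-style rename. */
#define PyArray_API Numeric_PyArray_API
DECLARE_API_TABLE();
#undef PyArray_API

/* numarray part: identical header text, different global symbol. */
#define PyArray_API Numarray_PyArray_API
DECLARE_API_TABLE();
#undef PyArray_API

/* Dummy tables so the sketch can "import" both packages. */
static void *numeric_entries[1];
static void *numarray_entries[1];

void fake_import_both(void)
{
    Numeric_PyArray_API = numeric_entries;
    Numarray_PyArray_API = numarray_entries;
    assert(Numeric_PyArray_API != Numarray_PyArray_API);
}
```

Because the preprocessor rescans the expansion of `DECLARE_API_TABLE()`, the same header text yields two distinct globals, one per package, instead of a clash on a single `PyArray_API`.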

Todd Miller

2:43 p.m.

Awesome! Thanks Gerard. This certainly sounds like the way to do it. I'll take a closer look tomorrow.

Regards,
Todd

Todd Miller

2:55 p.m.

On Tue, 2004-06-29 at 17:32, gerard.vermeulen@grenoble.cnrs.fr wrote:


> The macros can be hidden from the module file scope by wrapping them in a function (see attached demo)

Your demo is very clear... nice!


> Yes, you cannot mix the numarray API and Numeric API in the same .c file, but nevertheless you can hide the macros in small functions so that the macros don't pollute.

So... you use the "meta" code to provide package-specific ordinary (not-macro-fied) functions to keep the different versions of the Present() and isArray() macros from conflicting.

It would be nice to have a standard approach for using the same "extension enhancement code" for both numarray and Numeric. The PEP should really be expanded to provide an example of dual support for one complete and real function, guts and all, so people can see the process end-to-end: something like a simple arrayprint. That process needs to be refined to remove as much tedium and duplication of effort as possible. The idea is to come as close as possible to providing one implementation that supports both array packages. I think it's important to illustrate how to partition the extension module into separate compilation units which correctly navigate the dual-implementation minefield in the easiest possible way.

It would also be nice to add some logic to the meta-functions so that the choice of array package is configurable. We did something like that for the matplotlib plotting software at the Python level with the "numerix" layer, an idea I think we copied from Chaco. The kind of dispatch I think might be good to support configurability looks like this:

    PyObject *
    whatsThis(PyObject *dummy, PyObject *args)
    {
        PyObject *result = NULL, *what = NULL;

        if (!PyArg_ParseTuple(args, "O", &what))
            return NULL;
        switch (PyArray_Which(what)) {
        case USE_NUMERIC:
            result = Numeric_whatsThis(what);
            break;
        case USE_NUMARRAY:
            result = Numarray_whatsThis(what);
            break;
        case USE_SEQUENCE:
            result = Sequence_whatsThis(what);
            break;
        default:
            PyErr_SetString(PyExc_TypeError, "unsupported argument");
            break;
        }
        return result;
    }

In the above, I'm picturing a separate .c file for Numeric_whatsThis and for Numarray_whatsThis. It would be nice to streamline that to one .c and a process which somehow (simply) produces both functions.

Or, ideally, the above would be done more like this:

    PyObject *
    whatsThis(PyObject *dummy, PyObject *args)
    {
        PyObject *result = NULL, *what = NULL;

        if (!PyArg_ParseTuple(args, "O", &what))
            return NULL;
        switch (Numerix_Which(what)) {
        case USE_NUMERIX:
            result = Numerix_whatsThis(what);
            break;
        case USE_SEQUENCE:
            result = Sequence_whatsThis(what);
            break;
        default:
            PyErr_SetString(PyExc_TypeError, "unsupported argument");
            break;
        }
        return result;
    }

Here, a common Numerix implementation supports both numarray and Numeric from a single simple .c. The extension module would do "#include numerix/arrayobject.h" and "import_numerix()" and otherwise just call PyArray_* functions.

The current stumbling block is that numarray is not binary compatible with Numeric... so numerix in C falls apart. I haven't analyzed every symbol and struct to see if it is really feasible... but it seems like it is *almost* feasible, at least for typical usage.

So, in a nutshell, I think the dual implementation support you demoed is important, and we should work up an example and kick it around to make sure it's the best way we can think of doing it. Then we should add a section to the PEP describing dual support as well.

Regards,
Todd
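One way to get "one .c that produces both functions" without needing binary compatibility is to write the package-specific body once and compile it twice under different prefixes. In a real build that could be a small template file included from a Numeric part and a numarray part, each with its own headers; here a hypothetical `DEFINE_WHATS_THIS(prefix, tag, label)` macro plays that role so the sketch fits in one file. The dispatcher uses real `case` labels and an `enabled` bitmask for the configurability described above. `USE_*` is borrowed from the sketch in the thread; `ENABLE_*`, `which()` and `Object` are illustrative assumptions, not an existing API.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { USE_NUMERIC, USE_NUMARRAY, USE_SEQUENCE } Which;

/* Configuration bits: which array packages the extension may use. */
enum { ENABLE_NUMERIC = 1, ENABLE_NUMARRAY = 2 };

/* Hypothetical object that records which package (if any) made it. */
typedef struct { Which origin; } Object;

/* The body is written once and stamped out per package. In a real
 * build this would be a template compiled against each package. */
#define DEFINE_WHATS_THIS(prefix, tag, label)              \
    Which prefix##_whatsThis(const Object *op)             \
    {                                                      \
        (void)op;  /* package-specific work goes here */   \
        printf("%s\n", label);                             \
        return tag;                                        \
    }

DEFINE_WHATS_THIS(Numeric, USE_NUMERIC, "Numeric array")
DEFINE_WHATS_THIS(Numarray, USE_NUMARRAY, "Numarray array")
DEFINE_WHATS_THIS(Sequence, USE_SEQUENCE, "Something else")

/* Pick an implementation; a disabled package falls back to the
 * Sequence path, since its arrays are still sequences. */
Which which(const Object *op, unsigned enabled)
{
    if (op->origin == USE_NUMERIC && (enabled & ENABLE_NUMERIC))
        return USE_NUMERIC;
    if (op->origin == USE_NUMARRAY && (enabled & ENABLE_NUMARRAY))
        return USE_NUMARRAY;
    return USE_SEQUENCE;
}

/* Dispatch with real `case` labels; returns which handler ran. */
Which whatsThis(const Object *op, unsigned enabled)
{
    assert(op != NULL);
    switch (which(op, enabled)) {
    case USE_NUMERIC:
        return Numeric_whatsThis(op);
    case USE_NUMARRAY:
        return Numarray_whatsThis(op);
    default:
        return Sequence_whatsThis(op);
    }
}
```

Each generated function is still compiled for exactly one package, so nothing relies on numarray and Numeric sharing a binary layout; only the source text is shared.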

gerard.vermeulen＠grenoble.cnrs.fr

11:34 p.m.

On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote


> Here, a common Numerix implementation supports both numarray and Numeric from a single simple .c. The extension module would do "#include numerix/arrayobject.h" and "import_numerix()" and otherwise just call PyArray_* functions.
>
> The current stumbling block is that numarray is not binary compatible with Numeric... so numerix in C falls apart. I haven't analyzed every symbol and struct to see if it is really feasible... but it seems like it is *almost* feasible, at least for typical usage.
>
> So, in a nutshell, I think the dual implementation support you demoed is important and we should work up an example and kick it around to make sure it's the best way we can think of doing it. Then we should add a section to the PEP describing dual support as well.

I would never apply numarray code to Numeric arrays, or the reverse. It looks dangerous, and I do not know if it is possible. The first thing that comes to mind is that numarray and Numeric arrays refer to different type objects (this is what my pep module uses to differentiate them). So, even if numarray and Numeric were binary compatible, any 'alien' code referring to the 'Python-standard part' of the type objects may lead to surprises. A PEP proposing hacks will raise eyebrows at least.

Secondly, most people use Numeric *or* numarray and not both.

So, I prefer: Numeric In => Numeric Out, or Numarray In => Numarray Out (NINO). Of course, Numeric or numarray output can be a user option if NINO does not apply. (Explicit safe conversion between Numeric and numarray is possible if really needed.)

I'll try to flesh out the demo with real functions in the way you indicated (going as far as I consider safe). The problem of coding the Numeric (or numarray) functions in more than a single source file also has to be addressed. It may take 2 weeks because I am off to a conference next week.

Regards
-- Gerard
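The NINO rule is a policy about the output type, so it can be stated in a few lines. A sketch with a hypothetical tagged `Array` struct standing in for a Numeric array, a numarray array or a plain list: when there is an array input, the result inherits its kind; only when there is no array input (a range, say) does the user's option decide. `Kind`, `output_kind()`, `scale()` and `make_range()` are illustrative names, and the fixed `MAX_ITEMS` capacity is just to keep the sketch free of allocation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 8

typedef enum { KIND_NUMERIC, KIND_NUMARRAY, KIND_LIST } Kind;

/* Hypothetical tagged container for the three kinds of object. */
typedef struct {
    Kind kind;
    size_t n;
    double data[MAX_ITEMS];
} Array;

/* NINO: an input array decides the output kind; only when there is
 * no array input does the user's option apply. */
Kind output_kind(const Array *input, Kind user_choice)
{
    return input != NULL ? input->kind : user_choice;
}

/* Array in, array of the same kind out. */
Array scale(const Array *in, double factor)
{
    assert(in != NULL && in->n <= MAX_ITEMS);
    Array out = {0};
    out.kind = output_kind(in, KIND_LIST);
    out.n = in->n;
    for (size_t i = 0; i < in->n; i++)
        out.data[i] = in->data[i] * factor;
    return out;
}

/* No array input, so the user option decides the output kind. */
Array make_range(size_t n, Kind user_choice)
{
    assert(n <= MAX_ITEMS);
    Array out = {0};
    out.kind = output_kind(NULL, user_choice);
    out.n = n;
    for (size_t i = 0; i < n; i++)
        out.data[i] = (double)i;
    return out;
}
```

Keeping the decision in one helper means every function in the extension applies the same rule, and the user option only comes into play where NINO has nothing to say.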

Sebastian Haase

July 2004

9:06 a.m.


Hi all,

first, I would like to state that I don't understand much of this discussion; so the only comment I wanted to make is that IF this were possible, to make (C/C++) code that can live with both Numeric and numarray, then I think it would be used more and more - think: transition phase !! (e.g. someone could start making the FFTW part of scipy numarray friendly without having to switch everything at once [hint ;-)] )

These were just my 2 cents.

Cheers,
Sebastian Haase

Colin J. Williams

12:59 p.m.

Sebastian Haase wrote:

...

On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen@grenoble.cnrs.fr wrote:

...
On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote

...
So... you use the "meta" code to provide package specific ordinary (not-macro-fied) functions to keep the different versions of the Present() and isArray() macros from conflicting.

It would be nice to have a standard approach for using the same "extension enhancement code" for both numarray and Numeric. The PEP should really be expanded to provide an example of dual support for one complete and real function, guts and all, so people can see the process end-to-end; Something like a simple arrayprint. That process needs to be refined to remove as much tedium and duplication of effort as possible. The idea is to make it as close to providing one implementation to support both array packages as possible. I think it's important to illustrate how to partition the extension module into separate compilation units which correctly navigate the dual implementation mine field in the easiest possible way.

It would also be nice to add some logic to the meta-functions so that which array package gets used is configurable. We did something like that for the matplotlib plotting software at the Python level with the "numerix" layer, an idea I think we copied from Chaco. The kind of dispatch I think might be good to support configurability looks like this:

PyObject * whatsThis(PyObject *dummy, PyObject *args) { PyObject *result, *what = NULL; if (!PyArg_ParseTuple(args, "O", &what)) return 0; switch(PyArray_Which(what)) { USE_NUMERIC: result = Numeric_whatsThis(what); break; USE_NUMARRAY: result = Numarray_whatsThis(what); break; USE_SEQUENCE: result = Sequence_whatsThis(what); break; } Py_INCREF(Py_None); return Py_None; }

In the above, I'm picturing a separate .c file for Numeric_whatsThis and for Numarray_whatsThis. It would be nice to streamline that to one .c and a process which somehow (simply) produces both functions.

Or, ideally, the above would be done more like this:

PyObject * whatsThis(PyObject *dummy, PyObject *args) { PyObject *result, *what = NULL; if (!PyArg_ParseTuple(args, "O", &what)) return 0; switch(Numerix_Which(what)) { USE_NUMERIX: result = Numerix_whatsThis(what); break; USE_SEQUENCE: result = Sequence_whatsThis(what); break; } Py_INCREF(Py_None); return Py_None; }

Here, a common Numerix implementation supports both numarray and Numeric from a single simple .c. The extension module would do "#include numerix/arrayobject.h" and "import_numerix()" and otherwise just call PyArray_* functions.

The current stumbling block is that numarray is not binary compatible with Numeric... so numerix in C falls apart. I haven't analyzed every symbol and struct to see if it is really feasible... but it seems like it is *almost* feasible, at least for typical usage.

So, in a nutshell, I think the dual implementation support you demoed is important and we should work up an example and kick it around to make sure it's the best way we can think of doing it. Then we should add a section to the PEP describing dual support as well.

I would never apply numarray code to Numeric arrays and the inverse. It looks dangerous and I do not know if it is possible. The first thing coming to mind is that numarray and Numeric arrays refer to different type objects (this is what my pep module uses to differentiate them). So, even if numarray and Numeric are binary compatible, any 'alien' code referring the the 'Python-standard part' of the type objects may lead to surprises. A PEP proposing hacks will raise eyebrows at least.

Secondly, most people use Numeric *or* numarray and not both.

So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out (NINO). Of course, Numeric or numarray output can be a user option if NINO does not apply (explicit, safe conversion between Numeric and numarray is possible if really needed).
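A minimal sketch of how Gerard's two points might combine: classify an argument by which type object it points to (never by peeking at package-specific struct fields), and let NINO take the output kind from that classification. `TypeObject`, `Object`, the two type objects and `classify` are hypothetical stand-ins for Python's type objects and object header; a real extension would compare against the type objects exported by each package's C API, and might use `PyObject_TypeCheck` so that subclasses are accepted.

```c
/* Stand-in for a Python type object; only its address matters here. */
typedef struct { const char *name; } TypeObject;

/* Stand-in for a Python object header: every object points at its type. */
typedef struct { const TypeObject *ob_type; } Object;

/* Hypothetical type objects, one per array package. */
static const TypeObject NumericArrayType  = { "Numeric.array" };
static const TypeObject NumarrayArrayType = { "numarray.NumArray" };

typedef enum { KIND_NUMERIC, KIND_NUMARRAY, KIND_SEQUENCE } ArrayKind;

/* Classify by type-object identity only; anything unrecognized is
 * handled through the generic sequence path. */
ArrayKind classify(const Object *obj)
{
    if (obj->ob_type == &NumericArrayType)
        return KIND_NUMERIC;
    if (obj->ob_type == &NumarrayArrayType)
        return KIND_NUMARRAY;
    return KIND_SEQUENCE;
}

/* NINO: the result is produced by the same package as the input. */
ArrayKind nino_output_kind(const Object *input)
{
    return classify(input);
}
```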

I'll try to flesh out the demo with real functions in the way you indicated (going as far as I consider safe).

The problem of coding the Numeric (or numarray) functions in more than a single source file also has to be addressed.

It may take 2 weeks because I am off to a conference next week.

Regards -- Gerard

Hi all, first, I would like to state that I don't understand much of this discussion; so the only comment I wanted to make is that IF this were possible, to make (C/C++) code that can live with both Numeric and numarray, then I think it would be used more and more. Think: transition phase!! (e.g. someone could start making the FFTW part of scipy numarray-friendly without having to switch everything at once [hint ;-)])

These were just my 2 cents. Cheers, Sebastian Haase

I feel lower on the understanding tree with respect to what is being proposed in the draft PEP, but would still like to offer my 2 cents' worth. I get the feeling that numarray is being bent out of shape to fit Numeric.

It was my understanding that Numeric had certain weaknesses which made it unacceptable as a Python component and that numarray was intended to provide the same or better functionality within a pythonic framework.

numarray has not achieved the expected performance level to date, but progress is being made and I believe that, for larger arrays, numarray has been shown to be superior to Numeric; please correct me if I'm wrong here.

The shock came for me when Todd Miller said: "I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore."

Wasn't it the intent of numarray to work towards the full use of the Python class structure to provide the benefits which it offers?

The Python class has two constructors and one destructor. The constructors are __init__ and __new__; the latter only provides the shell of an instance, which later has to be initialized. In version 0.9, which I use, there is no __new__, but there is a new function with functionality similar to that intended for __new__. Thus, with this change, numarray appears to be moving further away from being pythonic.

Colin W

gerard.vermeulen＠grenoble.cnrs.fr

1:40 p.m.

On Thu, 01 Jul 2004 15:58:11 -0400, Colin J. Williams wrote

...

Sebastian Haase wrote:

...
On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen@grenoble.cnrs.fr wrote:

...
On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote

...
So... you use the "meta" code to provide package specific ordinary (not-macro-fied) functions to keep the different versions of the Present() and isArray() macros from conflicting.

It would be nice to have a standard approach for using the same "extension enhancement code" for both numarray and Numeric. The PEP should really be expanded to provide an example of dual support for one complete and real function, guts and all, so people can see the process end-to-end, something like a simple arrayprint. That process needs to be refined to remove as much tedium and duplication of effort as possible. The idea is to make it as close to providing one implementation to support both array packages as possible. I think it's important to illustrate how to partition the extension module into separate compilation units which correctly navigate the dual implementation minefield in the easiest possible way.

It would also be nice to add some logic to the meta-functions so that which array package gets used is configurable. We did something like that for the matplotlib plotting software at the Python level with the "numerix" layer, an idea I think we copied from Chaco. The kind of dispatch I think might be good to support configurability looks like this:

...

I feel lower on the understanding tree with respect to what is being proposed in the draft PEP, but would still like to offer my 2 cents worth. I get the feeling that numarray is being bent out of shape to fit Numeric.

What we are discussing are methods to make it possible to import Numeric and numarray in the same extension module. This can be done by separating the colliding APIs of Numeric and numarray into separate *.c files. To achieve this, no changes to Numeric and numarray themselves are necessary. In fact, the author of the C extension can do this himself, but since it is not obvious, we are discussing the best methods and would like to provide the necessary glue code. It will make life easier for extension writers and facilitate the transition to numarray.

Try to look at the problem from the other side: I am using Numeric (since my life depends on SciPy) but have written an extension that can also import numarray (hoping to get more users). I will never use the methods proposed in the draft PEP, because they exclude importing Numeric.
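A minimal single-file sketch of that separation, assuming stand-in API pointers: each small "meta" section sees only one package's version of the `PyArray_Present()` macro named earlier in the thread and re-exports it as an ordinary, uniquely named function. The `#define`/`#undef` pairs simulate what separate numeric_meta.c and numarray_meta.c files would achieve; `numeric_api`, `numarray_api`, the macro bodies and `fake_import` are hypothetical, not the real Numeric or numarray definitions.

```c
#include <stddef.h>

/* Hypothetical C-API pointers that each package's import step would
 * fill in; NULL means "package not importable". */
static void *numeric_api  = NULL;
static void *numarray_api = NULL;

/* ---- what a small numeric_meta.c might contain ---------------------- */
/* Both packages spell this macro the same way, so it must never be
 * visible in a file that also sees the other package's version. */
#define PyArray_Present() (numeric_api != NULL)
int Numeric_Present(void) { return PyArray_Present(); }
#undef PyArray_Present

/* ---- what a small numarray_meta.c might contain --------------------- */
#define PyArray_Present() (numarray_api != NULL)
int Numarray_Present(void) { return PyArray_Present(); }
#undef PyArray_Present

/* Test hook standing in for a successful or failed import_array(). */
void fake_import(int numeric_ok, int numarray_ok)
{
    static int token;
    numeric_api  = numeric_ok  ? (void *)&token : NULL;
    numarray_api = numarray_ok ? (void *)&token : NULL;
}
```

Only `Numeric_Present()` and `Numarray_Present()` need to be declared in the extension's main .c file, so neither package's header has to be included there.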

...

It was my understanding that Numeric had certain weakness which made it unacceptable as a Python component and that numarray was intended to provide the same or better functionality within a pythonic framework.

numarray has not achieved the expected performance level to date, but progress is being made and I believe that, for larger arrays, numarray has been shown to be superior to Numeric; please correct me if I'm wrong here.

I think you are correct. I don't know why the __init__ has disappeared, but I don't think it is because of the PEP and certainly not because of the thread.

...

The shock came for me when Todd Miller said:

"I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore."

Wasn't it the intent of numarray to work towards the full use of the Python class structure to provide the benefits which it offers?

The Python class has two constructors and one destructor.

The constructors are __init__ and __new__; the latter only provides the shell of an instance, which later has to be initialized. In version 0.9, which I use, there is no __new__, but there is a new function with functionality similar to that intended for __new__. Thus, with this change, numarray appears to be moving further away from being pythonic.

Gerard

Todd Miller

1:46 p.m.

On Thu, 2004-07-01 at 15:58, Colin J. Williams wrote:

...

I feel lower on the understanding tree with respect to what is being proposed in the draft PEP, but would still like to offer my 2 cents worth. I get the feeling that numarray is being bent out of shape to fit Numeric.

Yes and no. The numarray team has over time realized the importance of backward compatibility with the dominant array package, Numeric. A lot of people use Numeric now. We're trying to make it as easy as possible to use numarray.

...

It was my understanding that Numeric had certain weakness which made it unacceptable as a Python component and that numarray was intended to provide the same or better functionality within a pythonic framework.

My understanding is that until there is a consensus on an array package, neither numarray nor Numeric is going into the Python core.

...

numarray has not achieved the expected performance level to date, but progress is being made and I believe that, for larger arrays, numarray has been shown to be superior to Numeric; please correct me if I'm wrong here.

I think that's a fair summary.

...

The shock came for me when Todd Miller said: "I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore."

Wasn't it the intent of numarray to work towards the full use of the Python class structure to provide the benefits which it offers?

Ack. I wasn't trying to start a panic. The __init__ still exists, as does __new__, they're just in C. Sorry if I was unclear.

...

The Python class has two constructors and one destructor.

We're mostly on the same page.

...

The constructors are __init__ and __new__; the latter only provides the shell of an instance, which later has to be initialized. In version 0.9, which I use, there is no __new__,

It's there, but it's not very useful:

    >>> import numarray
    >>> numarray.NumArray.__new__
    <built-in method __new__ of type object at 0x402fc860>
    >>> a = numarray.NumArray.__new__(numarray.NumArray)
    >>> a.info()
    class: <class 'numarray.numarraycore.NumArray'>
    shape: ()
    strides: ()
    byteoffset: 0
    bytestride: 0
    itemsize: 0
    aligned: 1
    contiguous: 1
    data: None
    byteorder: little
    byteswap: 0
    type: Any

I don't, however, recommend doing this.

...

but there is a new function which has a functionality similar to that intended for __new__. Thus, with this change, numarray appears to be moving further away from being pythonic.

Nope. I'm talking about moving toward better speed with no change in functionality at the Python level.

I also think maybe we've gotten list threads crossed here: the "Numarray header PEP" thread is independent of (but admittedly related to) the "Speeding up wxPython/numarray" thread.

The Numarray header PEP is about making it easy for packages to write C extensions which *optionally* support numarray (and now Numeric as well). One aspect of the PEP is getting headers included in the Python core so that extensions can be compiled even when numarray is not installed. The other aspect will be illustrating a good technique for supporting both numarray and Numeric, optionally and with choice, at the same time. Such an extension would still run where numarray, Numeric, both, or neither is installed. Gerard V. has already done some integration of numarray and Numeric with PyQwt, so he has a few good ideas on how to do the "good technique" aspect of the PEP.

The "Speeding up wxPython/numarray" thread is about improving the performance of a 50000-point wxPython drawlines, which is 10x slower with numarray than with Numeric. Tim H. and Chris B. have nailed this down (mostly) to the numarray sequence protocol and destructor, __del__.

Regards, Todd
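A minimal sketch of the "runs with numarray, Numeric, both, or neither" behaviour, under stated assumptions: each optional package is probed once at module init, a failed import only switches that code path off, and array-producing functions pick an output kind from what was found. `ImportFn`, `Availability`, `probe_packages`, `output_kind` and the stub importers are hypothetical; a real init function would call each package's import routine and clear the resulting ImportError (for example with `PyErr_Clear()`) instead of using stubs. The Numeric, then numarray, then list order is an arbitrary choice for illustration.

```c
#include <stddef.h>

/* Hypothetical importer: returns 0 on success, -1 if the package is
 * missing.  A real one would also clear the ImportError on failure. */
typedef int (*ImportFn)(void);

typedef struct {
    int have_numeric;
    int have_numarray;
} Availability;

typedef enum { OUT_NUMERIC, OUT_NUMARRAY, OUT_LIST } OutputKind;

/* Probe every optional package; a failure only disables that path.
 * A NULL importer is treated as "support not compiled in". */
Availability probe_packages(ImportFn import_numeric, ImportFn import_numarray)
{
    Availability av;
    av.have_numeric  = import_numeric  != NULL && import_numeric()  == 0;
    av.have_numarray = import_numarray != NULL && import_numarray() == 0;
    return av;
}

/* Choose what an array-producing function returns. */
OutputKind output_kind(Availability av)
{
    if (av.have_numeric)
        return OUT_NUMERIC;
    if (av.have_numarray)
        return OUT_NUMARRAY;
    return OUT_LIST;                 /* plain Python list fallback */
}

/* Stub importers, only for trying the sketch without Python. */
int stub_import_ok(void)      { return 0; }
int stub_import_missing(void) { return -1; }
```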

Perry Greenfield

1:57 p.m.

Colin J. Williams wrote:

...

I feel lower on the understanding tree with respect to what is being proposed in the draft PEP, but would still like to offer my 2 cents worth. I get the feeling that numarray is being bent out of shape to fit Numeric.

Todd and Gerard address this point well.

...

It was my understanding that Numeric had certain weakness which made it unacceptable as a Python component and that numarray was intended to provide the same or better functionality within a pythonic framework.

Let me reiterate what our motivations were. We wanted to use an array package for our software, and Numeric had enough shortcomings that we needed some changes in behavior (e.g., type coercion for scalars), changes in performance (particularly with regard to memory usage), and enhancements in capabilities (e.g., memory mapping, record arrays, etc.). It was the opinion of some (Paul Dubois, for example) that a rewrite was in order in any case since the code was not that maintainable (not everyone felt this way, though at the time that wasn't as clear).

At the same time there was some hope that Numeric could be accepted into the standard Python distribution. That's something we thought would be good (but wasn't the highest priority for us), and I've come to believe that perhaps a better solution in that regard is what this PEP is trying to address. In any case, Guido made it clear that he would not accept Numeric in its (then) current form.

That it be written mostly in Python was something suggested by Guido, and we started off that way, mainly because it would get us going much faster than writing it all in C. We definitely understood that it would also have the consequence of making small-array performance worse. We said as much when we started; it wasn't as clear as it is now that many users objected to performance a factor of a few slower (as it turned out, a mostly Python-based implementation was more than an order of magnitude slower for small arrays).

...

numarray has not achieved the expected performance level to date, but progress is being made and I believe that, for larger arrays, numarray has been shown to be superior to Numeric; please correct me if I'm wrong here.

We never expected numarray to ever reach the performance level for small arrays that Numeric has. If it were within a factor of two I would be thrilled (it's more like a factor of 3 or 4 currently for simple ufuncs). I still don't think it ever will be as fast for small arrays. The focus all along was on handling large arrays, which I think it does quite well, with regard to both memory and speed. Yes, there are some functions and operations that may be much slower. Mainly they need to be called out so they can be improved. Generally we only notice performance issues that affect our software; others need to point out remaining large discrepancies. I'm still of the opinion that if small-array performance is really important, a very different approach should be used, with a completely different implementation. I would think that improvements of an order of magnitude over what Numeric does now are possible. But since that isn't important to us (STScI), don't expect us to work on that :-)

...

The shock came for me when Todd Miller said:

"I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore."

Wasn't it the intent of numarray to work towards the full use of the Python class structure to provide the benefits which it offers?

The Python class has two constructors and one destructor.

The constructors are __init__ and __new__; the latter only provides the shell of an instance, which later has to be initialized. In version 0.9, which I use, there is no __new__, but there is a new function with functionality similar to that intended for __new__. Thus, with this change, numarray appears to be moving further away from being pythonic.

I'll agree that optimization is driving the underlying implementation to one that is more complex, and that is the drawback (no surprise there). There's Pythonic in use and Pythonic in implementation. We are certainly receptive to better ideas for the implementation, but I doubt that a heavily Python-based implementation is ever going to be competitive for small arrays (unless something like psyco becomes universal, but I think there are a whole mess of problems to be solved for that kind of approach to work well generically).

Perry

Todd Miller

9:44 a.m.

On Thu, 2004-07-01 at 02:33, gerard.vermeulen@grenoble.cnrs.fr wrote:

...

On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote

...
I would never apply numarray code to Numeric arrays and the inverse. It looks dangerous and I do not know if it is possible.

I think that's definitely the marching orders for now... but you gotta admit, it would be nice.

...

The first thing coming to mind is that numarray and Numeric arrays refer to different type objects (this is what my pep module uses to differentiate them). So, even if numarray and Numeric are binary compatible, any 'alien' code referring to the 'Python-standard part' of the type objects may lead to surprises. A PEP proposing hacks will raise eyebrows at least.

I'm a little surprised it took someone to talk me out of it... I'll just concede that this was probably a bad idea.

...

Secondly, most people use Numeric *or* numarray and not both.

A class of question which will arise for developers is this: "X works with Numeric, but X doesn't work with numarray." The reverse also happens occasionally. For this reason, being able to choose would be nice for developers.

...

So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out (NINO). Of course, Numeric or numarray output can be a user option if NINO does not apply.

When I first heard it, I thought NINO was a good idea, with the limitation that it doesn't apply when a function produces an array without consuming any. But... there is another problem with NINO that Perry Greenfield pointed out: with multiple arguments, there can be a mix of array types. For this reason, it makes sense to be able to coerce all the inputs to a particular array package. This form might look more like:

    switch (PyArray_Which(/* no parameter at all! */)) {
    case USE_NUMERIC:
        result = Numeric_doit(a1, a2, a3);
        break;
    case USE_NUMARRAY:
        result = Numarray_doit(a1, a2, a3);
        break;
    case USE_SEQUENCE:
        result = Sequence_doit(a1, a2, a3);
        break;
    }

One last thing: I think it would be useful to be able to drive the code into sequence mode with arrays. This would enable easy benchmarking of the performance improvement.
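A minimal sketch of the argument-independent choice described above: the package is picked from a configured preference and from what is installed, never from the arguments, so every input can then be coerced to that one package. `Preference`, `array_which` and its parameters are hypothetical (only the `USE_*` names come from the thread), and how the preference gets set (a module attribute, an environment variable, or something else) is deliberately left open. `FORCE_SEQUENCE` models the "sequence mode with arrays" idea for benchmarking.

```c
typedef enum { USE_NUMERIC, USE_NUMARRAY, USE_SEQUENCE } ArrayChoice;
typedef enum { PREFER_NUMERIC, PREFER_NUMARRAY, FORCE_SEQUENCE } Preference;

/* Decide once which package all inputs will be coerced to.  The
 * preferred package wins when present; otherwise fall back to the
 * other package, then to the generic sequence code. */
ArrayChoice array_which(Preference pref, int have_numeric, int have_numarray)
{
    switch (pref) {
    case FORCE_SEQUENCE:
        return USE_SEQUENCE;        /* benchmarking baseline */
    case PREFER_NUMERIC:
        if (have_numeric)  return USE_NUMERIC;
        if (have_numarray) return USE_NUMARRAY;
        return USE_SEQUENCE;
    case PREFER_NUMARRAY:
        if (have_numarray) return USE_NUMARRAY;
        if (have_numeric)  return USE_NUMERIC;
        return USE_SEQUENCE;
    }
    return USE_SEQUENCE;            /* unreachable for valid input */
}
```

Keeping the decision in one function means the NINO case can still be layered on top: a caller that wants NINO would classify its first array argument and pass the matching preference.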

...

(explicit safe conversion between Numeric and numarray is possible if really needed).

I'll try to flesh out the demo with real functions in the way you indicated (going as far as I consider safe).

The problem of coding the Numeric (or numarray) functions in more than a single source file also has to be addressed.

It may take 2 weeks because I am off to a conference next week.

Excellent. See you in a couple weeks. Regards, Todd

Gerard Vermeulen

11:39 a.m.

On 01 Jul 2004 12:43:31 -0400 Todd Miller <jmiller@stsci.edu> wrote:

...

A class of question which will arise for developers is this: "X works with Numeric, but X doesn't work with numarray." The reverse also happens occasionally. For this reason, being able to choose would be nice for developers.

...
So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out (NINO). Of course, Numeric or numarray output can be a user option if NINO does not apply.

When I first heard it, I thought NINO was a good idea, with the limitation that it doesn't apply when a function produces an array without consuming any. But... there is another problem with NINO that Perry Greenfield pointed out: with multiple arguments, there can be a mix of array types. For this reason, it makes sense to be able to coerce all the inputs to a particular array package. This form might look more like:

    switch (PyArray_Which(/* no parameter at all! */)) {
    case USE_NUMERIC:
        result = Numeric_doit(a1, a2, a3);
        break;
    case USE_NUMARRAY:
        result = Numarray_doit(a1, a2, a3);
        break;
    case USE_SEQUENCE:
        result = Sequence_doit(a1, a2, a3);
        break;
    }

One last thing: I think it would be useful to be able to drive the code into sequence mode with arrays. This would enable easy benchmarking of the performance improvement.

...
(explicit safe conversion between Numeric and numarray is possible if really needed).

Yeah, when I wrote 'if really needed', I was hoping to shift the responsibility of coercion (or conversion) to the Python programmer (my lazy side telling me that it can be done in pure Python). You talked me into doing it in C :-) Regards -- Gerard

Chris Barker

1:18 p.m.

New subject: How to read data from text files fast?

Hi all,

I'm looking for a way to read data from ASCII text files quickly. I've found that using the standard Python idioms like:

    data = []
    for i in range(N):
        data.append(map(float, file.readline().split()))
    data = array(data, Float)

can be pretty slow. What I'd like is something like Matlab's fscanf:

    data = fscanf(file, "%g", [M,N] )

I may have the syntax a little wrong, but the gist is there. What Matlab does is keep recycling the format string until the desired number of elements have been read. It is quite flexible, and ends up being pretty fast.

Has anyone written something like this for Numeric (or numarray, but I'd prefer Numeric at this point)? I was surprised not to find something like this in SciPy; maybe I didn't look hard enough. If no one has done this, I guess I'll get started on it....

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
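A minimal C sketch of the "keep applying one numeric conversion until N values have been read" behaviour Chris describes, working on an in-memory string so it is easy to try; a file-based version could call `fscanf(fp, "%lg", &x)` in the same loop. `scan_doubles` is a hypothetical name, it uses `strtod` in place of a scanf-family call, and treating commas as separators (in addition to whitespace) is an assumption, not Matlab's exact rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse up to max_n doubles from text into out, re-applying the same
 * numeric conversion until max_n values are read or the next token is
 * not a number.  Returns how many values were stored. */
size_t scan_doubles(const char *text, double *out, size_t max_n)
{
    size_t count = 0;

    while (count < max_n) {
        char *end;
        double value;

        /* Assumed separators: whitespace and commas. */
        while (*text == ',' || isspace((unsigned char)*text))
            ++text;

        value = strtod(text, &end);
        if (end == text)            /* no number here: stop, like a failed %g */
            break;
        out[count++] = value;
        text = end;                 /* continue right after the number */
    }
    return count;
}
```

Usage: `scan_doubles("1.5, 2\n3e1,4", v, 4)` fills `v` with 1.5, 2, 30 and 4 and returns 4; a stray word such as `x` stops the scan early, and the return value tells the caller how many values were actually found.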

Fernando Perez

1:28 p.m.

New subject: How to read data from text files fast?

Chris Barker wrote:

...

Hi all,

I'm looking for a way to read data from ascii text files quickly. I've found that using the standard python idioms like:

    data = []
    for i in range(N):
        data.append(map(float, file.readline().split()))
    data = array(data, Float)

can be pretty slow. What I'd like is something like Matlab's fscanf:

data = fscanf(file, "%g", [M,N] )

I may have the syntax a little wrong, but the gist is there. What Matlab does is keep recycling the format string until the desired number of elements have been read.

It is quite flexible, and ends up being pretty fast.

Has anyone written something like this for Numeric (or numarray, but I'd prefer Numeric at this point)?

I was surprised not to find something like this in SciPy; maybe I didn't look hard enough.

scipy.io.read_array? I haven't timed it, because it's been 'fast enough' for my needs.

For reading binary data files, I have this little utility which is basically a wrapper around Numeric.fromstring (N below is Numeric imported 'as N'). Note that it can read binary .gz files directly, a _huge_ gain for very sparse files representing 3d arrays (I can read, in no time at all, a 400k gz file which blows up to ~60MB when unzipped, while reading the unzipped file is very slow):

    import gzip
    import Numeric as N

    def read_bin(fname,dims,typecode,recast_type=None,offset=0,verbose=0):
        """Read in a binary data file.

        Does NOT check for endianness issues.

        Inputs:
        fname - can be .gz
        dims (nx1,nx2,...,nxd)
        typecode
        recast_type
        offset=0: # of bytes to skip in file *from the beginning* before data starts
        """

        # config parameters
        item_size = N.zeros(1,typecode).itemsize()  # size in bytes
        data_size = N.product(N.array(dims))*item_size

        # read in data; open in binary mode so no bytes get translated
        if fname.endswith('.gz'):
            data_file = gzip.open(fname, 'rb')
        else:
            data_file = file(fname, 'rb')
        data_file.seek(offset)
        data = N.fromstring(data_file.read(data_size),typecode)
        data_file.close()
        data.shape = dims
        if verbose:
            #print 'Read',data_size/item_size,'data points. Shape:',dims
            print 'Read',N.size(data),'data points. Shape:',dims
        if recast_type is not None:
            data = data.astype(recast_type)
        return data

HTH,

f
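For comparison with the C route discussed in this thread, a minimal C sketch of the same size arithmetic read_bin performs (element count is the product of `dims`; skip `offset` bytes; read the raw block), assuming native-endian doubles and a plain, uncompressed stream. `read_raw_doubles` is a hypothetical name, gzip support is left out, and the caller must supply room for the full product of `dims`.

```c
#include <stddef.h>
#include <stdio.h>

/* Read a raw block of native-endian doubles shaped dims[0] x ... x
 * dims[ndim-1], starting offset bytes into fp.  Like read_bin, it does
 * not check endianness.  out must hold product(dims) doubles.
 * Returns the number of doubles actually read. */
size_t read_raw_doubles(FILE *fp, long offset,
                        const size_t *dims, size_t ndim, double *out)
{
    size_t count = 1;

    for (size_t i = 0; i < ndim; ++i)
        count *= dims[i];           /* element count = product(dims) */

    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;                   /* could not skip the header */
    return fread(out, sizeof(double), count, fp);
}
```

A short read (return value below the product of `dims`) signals a truncated file, which read_bin would only notice later when setting `data.shape`.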

Chris Barker

10:58 a.m.

New subject: How to read data from text files fast?

Thanks to Fernando Perez and Travis Oliphant for pointing me to:

...

scipy.io.read_array

In testing, I've found that it's very slow (for my needs), though quite nifty in other ways, so I'm sure I'll find a use for it in the future. Travis Oliphant wrote:

...

Alternatively, we could move some of the Python code in read_array to C to improve the speed.

That was beyond me, so I wrote a very simple module in C that does what I want, and it is much faster than read_array or a straight Python version. It has two functions:

    FileScan(file)
        """
        Reads all the values in the rest of the ascii file, and produces a
        Numeric vector full of Floats (C doubles).

        All text in the file that is not part of a floating point number is
        skipped over.
        """

    FileScanN(file, N)
        """
        Reads N values in the ascii file, and produces a Numeric vector of
        length N full of Floats (C doubles).

        Raises an exception if there are fewer than N numbers in the file.

        All text in the file that is not part of a floating point number is
        skipped over.

        After reading N numbers, the file is left before the next
        non-whitespace character in the file. This will often leave the file
        at the start of the next line, after scanning a line full of numbers.
        """

I implemented them separately because I wasn't sure how to deal with optional arguments in a C function. They could easily be wrapped in a Python function if you wanted one interface.

FileScan was much more complex, as I had to deal with all the dynamic memory allocation. I probably took a more complex approach to this than I had to, but it was an exercise for me, being a newbie at C. I also decided not to specify a shape for the resulting array, always returning a rank-1 array, as that made the code easier, and you can always set A.shape afterward. This could be put in a Python wrapper as well.

It has the obvious limitation that it only does doubles. I'd like to add longs as well, but probably won't have a need for anything else. The way memory is these days, it seems just as easy to read the long ones and convert afterward if you want.

Here is a quick benchmark (see below), run with a file that is 63,000 lines long, with two comma-delimited numbers on each line, on a 1GHz P4 under Linux.
    Reading with read_array
    it took 16.351712 seconds to read the file with read_array
    Reading with Standard Python methods
    it took 2.832078 seconds to read the file with standard Python methods
    Reading with FileScan
    it took 0.444431 seconds to read the file with FileScan
    Reading with FileScanN
    it took 0.407875 seconds to read the file with FileScanN

As you can see, read_array is painfully slow for this kind of thing, straight Python is OK, and FileScan is pretty darn fast.

I've enclosed the C code and setup.py. If anyone wants to take a look and use it, or give suggestions or bug fixes or whatever, that would be great. In particular, I don't think I've structured the code very well, and there could be memory leaks, which I have not tested carefully for. Tested only on Linux with Python 2.3.3, Numeric 23.1. If someone wants to port it to numarray, that would be great too.

-Chris

The benchmark:

    import time
    from Numeric import array, product
    import FileScanner

    def test6():
        """ Testing various IO options """
        from scipy.io import array_import
        filename = "JunkBig.txt"

        file = open(filename)
        print "Reading with read_array"
        start = time.time()
        A = array_import.read_array(file, ",")
        print "it took %f seconds to read the file with read_array" % (time.time() - start)
        file.close()

        file = open(filename)
        print "Reading with Standard Python methods"
        start = time.time()
        A = []
        for line in file:
            A.append(map(float, line.strip().split(",")))
        A = array(A)
        print "it took %f seconds to read the file with standard Python methods" % (time.time() - start)
        file.close()

        file = open(filename)
        print "Reading with FileScan"
        start = time.time()
        A = FileScanner.FileScan(file)
        A.shape = (-1, 2)
        print "it took %f seconds to read the file with FileScan" % (time.time() - start)
        file.close()

        file = open(filename)
        print "Reading with FileScanN"
        start = time.time()
        A = FileScanner.FileScanN(file, product(A.shape))
        A.shape = (-1, 2)
        print "it took %f seconds to read the file with FileScanN" % (time.time() - start)
        file.close()

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

    #include "Python.h"
    #include <ctype.h>
    #include <Numeric/arrayobject.h>

    // NOTE: these buffer sizes were picked very arbitrarily, and have
    // remarkably little impact on performance on my system.
    #define BUFFERSIZE1 1024
    #define BUFFERSIZE2 64

    // Read up to NNums doubles into array, skipping any non-numeric text.
    // Returns the number of values actually read.
    static int filescan(FILE *infile, int NNums, double *array){
        double N;
        int i, j;
        int c;
        for (i=0; i<NNums; i++){
            // fscanf returns 0 on a non-number: drop one character and retry
            while ( (j = fscanf(infile, "%lg", &N)) == 0 ){
                c = fgetc(infile);
            }
            if (j == EOF) {
                return(i);
            }
            array[i] = N;
        }
        // Go to the end of any whitespace:
        while ( isspace(c = fgetc(infile)) ){
        }
        if (c != EOF){
            // not EOF, rewind the file one byte.
            fseek(infile, -1, SEEK_CUR);
        }
        return(i);
    }

    static char doc_FileScanN[] =
    "FileScanN(file, N)\n\n"
    "Reads N values in the ascii file, and produces a Numeric vector of\n"
    "length N full of Floats (C doubles).\n\n"
    "Raises an exception if there are fewer than N numbers in the file.\n\n"
    "All text in the file that is not part of a floating point number is\n"
    "skipped over.\n\n"
    "After reading N numbers, the file is left before the next non-whitespace\n"
    "character in the file. This will often leave the file at the start of\n"
    "the next line, after scanning a line full of numbers.\n"
    ;

    static PyObject *
    FileScanner_FileScanN(PyObject *self, PyObject *args)
    {
        PyFileObject *File;
        PyArrayObject *Array;
        int length;
        double *Data;
        int i;

        if (!PyArg_ParseTuple(args, "O!i", &PyFile_Type, &File, &length) ) {
            return NULL;
        }
        Data = calloc(length, sizeof(double) );
        if (Data == NULL && length > 0) {
            return PyErr_NoMemory();
        }
        if ((i = filescan(PyFile_AsFile( (PyObject*)File ), length, Data)) < length){
            PyErr_SetString (PyExc_ValueError,
                             "End of File reached before all numbers found");
            free(Data);
            return NULL;
        }
        Array = (PyArrayObject *) PyArray_FromDims(1, &length, PyArray_DOUBLE);
        if (Array == NULL) {
            free(Data);
            return NULL;
        }
        for (i = 0; i < length; i++){
            *(double *)(Array->data + (i * Array->strides[0])) = Data[i];
        }
        free(Data);
        return PyArray_Return(Array);
    }

    static char doc_FileScan[] =
    "FileScan(file)\n\n"
    "Reads all the values in rest of the open ascii file: file, and produces\n"
    "a Numeric vector full of Floats (C doubles).\n\n"
    "All text in the file that is not part of a floating point number is\n"
    "skipped over.\n\n"
    ;

    static PyObject *
    FileScanner_FileScan(PyObject *self, PyObject *args)
    {
        FILE *infile;
        char *DataPtr;
        PyFileObject *File;
        PyArrayObject *Array;
        double **P_array;      // table of pointers to blocks of BUFFERSIZE1 doubles
        double **Old_P_array;
        int i, j, k;
        int ScanCount = 0;
        int BufferSize = BUFFERSIZE2;
        int OldBufferSize = 0;
        int StartOfBuffer = 0;
        int NumBuffers = 0;

        if (!PyArg_ParseTuple(args, "O!", &PyFile_Type, &File) ) {
            return NULL;
        }
        infile = PyFile_AsFile( (PyObject*)File );
        P_array = (double**) calloc(BufferSize, sizeof(double*) );
        while (1) {
            for (j=StartOfBuffer; j < BufferSize; j++){
                P_array[j] = (double*) calloc(BUFFERSIZE1, sizeof(double));
                NumBuffers++;
                i = filescan(infile, BUFFERSIZE1, P_array[j]);
                ScanCount += i;
                if (i == 0){
                    break;
                }
            }
            if (i == 0) {
                break;
            }
            // Need more memory: grow the pointer table by BUFFERSIZE2 slots
            OldBufferSize = BufferSize;
            BufferSize += BUFFERSIZE2;
            StartOfBuffer += BUFFERSIZE2;
            Old_P_array = P_array;
            P_array = (double**) calloc(BufferSize, sizeof(double*) );
            for (j=0; j < OldBufferSize; j++){
                P_array[j] = Old_P_array[j];
            }
            free(Old_P_array);
        }
        // copy all the data to a PyArray
        Array = (PyArrayObject *) PyArray_FromDims(1, &ScanCount, PyArray_DOUBLE);
        if (Array != NULL) {
            i = 0;
            DataPtr = Array->data;
            for (j=0; j<NumBuffers; j++){
                for (k=0; k<BUFFERSIZE1 && i < ScanCount; k++){
                    *(double *)DataPtr = P_array[j][k];
                    DataPtr += Array->strides[0];
                    i++;
                }
            }
        }
        // free all the memory
        for (j=0; j<NumBuffers; j++){
            free(P_array[j]);
        }
        free(P_array);
        if (Array == NULL) {
            return NULL;
        }
        return PyArray_Return(Array);
    }

    static PyMethodDef FileScannerMethods[] = {
        {"FileScanN", FileScanner_FileScanN, METH_VARARGS, doc_FileScanN},
        {"FileScan", FileScanner_FileScan, METH_VARARGS, doc_FileScan},
        {NULL, NULL} /* Sentinel */
    };

    void initFileScanner(void){
        (void) Py_InitModule("FileScanner", FileScannerMethods);
        import_array();
    }

    #!/usr/bin/env python2.3
    from distutils.core import setup, Extension

    module1 = Extension('FileScanner', sources = ['FileScan_module.c'])

    setup (name = 'FileScan',
           version = '0.9',
           description = 'This is general purpose module for',
           ext_modules = [module1])
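FileScan and FileScanN can be mirrored in a few lines of modern Python 3, which may help when reviewing the C code. This is a sketch, not a drop-in replacement: it treats any match of a simple float regular expression as a number (an approximation of what fscanf's "%lg" accepts), and it reads line by line, so the file position afterwards differs from FileScanN's. It keeps the C layout of fixed-size blocks (the role of P_array and BUFFERSIZE1) that are copied into one contiguous array at the end; the names file_scan, FLOAT_RE and BLOCK are hypothetical.

```python
import io
import re
from array import array

# Approximates the tokens fscanf("%lg") accepts; inf, nan and hex floats are ignored.
FLOAT_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

BLOCK = 1024  # values per block, playing the role of BUFFERSIZE1


def _flatten(blocks, count):
    """Copy the blocks into one contiguous array and drop the unused tail."""
    out = array("d")
    for b in blocks:
        out.extend(b)
    del out[count:]
    return out


def file_scan(text_file, n=None, block=BLOCK):
    """Sketch of FileScan (n is None) and FileScanN (n given): collect the
    floating point numbers in text_file, skipping all other text."""
    if n is not None and n <= 0:
        return array("d")
    blocks = []   # list of fixed-size blocks, like P_array in the C code
    count = 0     # values stored so far, like ScanCount
    for line in text_file:
        for token in FLOAT_RE.findall(line):
            if count % block == 0:
                blocks.append(array("d", [0.0]) * block)  # start a new zeroed block
            blocks[-1][count % block] = float(token)
            count += 1
            if count == n:
                return _flatten(blocks, count)
    if n is not None:
        raise ValueError("End of File reached before all numbers found")
    return _flatten(blocks, count)


sample = "x=1.5, y=-2e3\nnext: .25 and 7\n"
all_values = file_scan(io.StringIO(sample))        # like FileScan
first_two = file_scan(io.StringIO(sample), n=2)    # like FileScanN
```

In Python, array.extend on a single array would already grow geometrically, so the block table is only there to match the C structure.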

Fernando.Perez＠colorado.edu

12:25 p.m.

New subject: How to read data from text files fast?

Quoting Chris Barker <Chris.Barker@noaa.gov>:

...

Thanks to Fernando Perez and Travis Oliphant for pointing me to:

...
scipy.io.read_array

In testing, I've found that it's very slow (for my needs), though quite nifty in other ways, so I'm sure I'll find a use for it in the future.

Just a quick note Travis sent to me privately: he suggested using io.numpyio.fread instead of Numeric.fromstring() for speed reasons. I don't know if it will help in your case, I just mention it in case it helps. Cheers, F

Chris Barker

12:41 p.m.

New subject: How to read data from text files fast?

Fernando.Perez@colorado.edu wrote:

Just a quick note Travis sent to me privately: he suggested using

...

io.numpyio.fread instead of Numeric.fromstring() for speed reasons. I don't know if it will help in your case, I just mention it in case it helps.

Thanks, but those are for binary files, which I have to do sometimes, so I'll keep it in mind. However, my problem at hand is text files, and my solution is working nicely, though I'd love a pair of more experienced eyes on the code....

-Chris

gerard.vermeulen＠grenoble.cnrs.fr

2:25 p.m.

New subject: Follow-up Numarray header PEP

Hi Todd,

This is a follow-up on the 'header pep' discussion.

The attachment numnum-0.1.tar.gz contains the sources for the extension modules pep and numnum. At least on my systems, both modules behave as described in the 'numarray header PEP' when the extension modules implementing the C-API are not present (a situation not foreseen by the import_array() macros of Numeric and especially numarray). IMO, my solution is 'bona fide', but requires further testing.

The pep module shows how to handle the colliding C-APIs of the Numeric and numarray extension modules and how to implement automagical conversion between Numeric and numarray arrays.

For a technical reason explained in the README, the hard work of doing the conversion between Numeric and numarray arrays has been delegated to the numnum module. The numnum module is useful when one needs to convert from one array type to the other to use an extension module which only exists for the other type (e.g. combining numarray's image processing extensions with pygame's Numeric interface):

    Python 2.3+ (#1, Jan 7 2004, 09:17:35)
    [GCC 3.3.1 (SuSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numnum; import Numeric as np; import numarray as na
    >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
    >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
    >>> print type(np1); np1; type(np2); np2
    <type 'array'>
    array([[1, 2], [3, 4]])
    <type 'array'>
    array([[1, 2, 3], [4, 5, 6]],'i')
    >>> print type(na1); na1; type(na2); na2
    <class 'numarray.numarraycore.NumArray'>
    array([[1, 2], [3, 4]])
    <class 'numarray.numarraycore.NumArray'>
    array([[1, 2, 3], [4, 5, 6]])

The pep module shows how to implement array processing functions which use the Numeric, numarray or Sequence C-API:

    static PyObject *
    wysiwyg(PyObject *dummy, PyObject *args)
    {
        PyObject *seq1, *seq2;
        PyObject *result;

        if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
            return NULL;

        switch(API) {
        case NumericAPI:
        {
            PyObject *np1 = NN_API->toNP(seq1);
            PyObject *np2 = NN_API->toNP(seq2);
            result = np_wysiwyg(np1, np2);
            Py_XDECREF(np1);
            Py_XDECREF(np2);
            break;
        }
        case NumarrayAPI:
        {
            PyObject *na1 = NN_API->toNA(seq1);
            PyObject *na2 = NN_API->toNA(seq2);
            result = na_wysiwyg(na1, na2);
            Py_XDECREF(na1);
            Py_XDECREF(na2);
            break;
        }
        case SequenceAPI:
            result = seq_wysiwyg(seq1, seq2);
            break;
        default:
            PyErr_SetString(PyExc_RuntimeError, "Should never happen");
            return 0;
        }

        return result;
    }

See the README for an example session using the pep module showing that it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().

Notes:

- it is straightforward to adapt pep and numnum so that the conversion functions are linked into pep instead of imported.

- numnum is still 'proof of concept'. I am thinking about methods to make those techniques safer if the numarray (and Numeric?) header files never make it into the Python headers (or to make it safer to use those techniques with Python < 2.4). In particular, it would be helpful if the numerical C-APIs exported an API version number, similar to the versioning scheme of shared libraries -- see the libtool->versioning info pages.

I am considering three possibilities to release a more polished version of numnum (3rd party extension writers may prefer to link rather than import numnum's functionality):

1. release it from PyQwt's project page
2. register an independent numnum project at SourceForge
3. hand numnum over to the Numerical Python project (frees me from worrying about API changes).

Regards
-- 
Gerard Vermeulen
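The switch(API) in wysiwyg depends on choosing, once, which array package is usable and falling back to the Sequence protocol otherwise. A minimal Python 3 analogy of that selection is sketched below; detect_api, seq_add, IMPLEMENTATIONS and elementwise_add are hypothetical names, and a failed import is simply caught, which is roughly what "import_array(); PyErr_Clear();" does at the C level.

```python
import importlib


def detect_api(candidates=("Numeric", "numarray")):
    """Return the name of the first importable package in `candidates`,
    or "Sequence" if none can be imported."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            continue        # analogous to clearing the error after a failed import_array()
        return name
    return "Sequence"


def seq_add(a, b):
    """Hypothetical Sequence-protocol fallback: element-wise sum."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [x + y for x, y in zip(a, b)]


# Hypothetical dispatch table; array-specific implementations would be
# registered under "Numeric" or "numarray" when those packages exist.
IMPLEMENTATIONS = {"Sequence": seq_add}

# Chosen once at import time, like the API variable in the pep module.
API = detect_api()


def elementwise_add(a, b):
    impl = IMPLEMENTATIONS.get(API, seq_add)
    return impl(a, b)
```

Choosing the backend at import time keeps the per-call cost to a dictionary lookup, mirroring the single switch statement in C.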

Todd Miller

5:49 a.m.

New subject: Follow-up Numarray header PEP

On Sun, 2004-07-18 at 17:24, gerard.vermeulen@grenoble.cnrs.fr wrote:

...

Hi Todd,

This is a follow-up on the 'header pep' discussion.

Great! I was afraid you were going to disappear back into the ether. Sorry I didn't respond to this yesterday... I saw it but accidentally marked it as "read" and then forgot about it as the day went on.

...

The attachment numnum-0.1.tar.gz contains the sources for the extension modules pep and numnum. At least on my systems, both modules behave as described in the 'numarray header PEP' when the extension modules implementing the C-API are not present (a situation not foreseen by the macros import_array() of Numeric and especially numarray).

For numarray, this was *definitely* foreseen at some point, so I'm wondering what doesn't work now...

...

IMO, my solution is 'bona fide', but requires further testing.

I'll look it over today or tomorrow and comment more then.

...

The pep module shows how to handle the colliding C-APIs of the Numeric and numarray extension modules and how to implement automagical conversion between Numeric and numarray arrays.

Nice; the conversion code sounds like a good addition to me.

...

For a technical reason explained in the README, the hard work of doing the conversion between Numeric and numarray arrays has been delegated to the numnum module. The numnum module is useful when one needs to convert from one array type to the other to use an extension module which only exists for the other type (eg. combining numarray's image processing extensions with pygame's Numeric interface):

    Python 2.3+ (#1, Jan 7 2004, 09:17:35)
    [GCC 3.3.1 (SuSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numnum; import Numeric as np; import numarray as na
    >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
    >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
    >>> print type(np1); np1; type(np2); np2
    <type 'array'>
    array([[1, 2], [3, 4]])
    <type 'array'>
    array([[1, 2, 3], [4, 5, 6]],'i')
    >>> print type(na1); na1; type(na2); na2
    <class 'numarray.numarraycore.NumArray'>
    array([[1, 2], [3, 4]])
    <class 'numarray.numarraycore.NumArray'>
    array([[1, 2, 3], [4, 5, 6]])

The pep module shows how to implement array processing functions which use the Numeric, numarray or Sequence C-API:

    static PyObject *
    wysiwyg(PyObject *dummy, PyObject *args)
    {
        PyObject *seq1, *seq2;
        PyObject *result;

        if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
            return NULL;

        switch(API) {

We'll definitely need to cover API in the PEP. There is a design choice here which needs to be discussed some and any resulting consensus documented. I haven't looked at the attachment yet.

...

        case NumericAPI:
        {
            PyObject *np1 = NN_API->toNP(seq1);
            PyObject *np2 = NN_API->toNP(seq2);
            result = np_wysiwyg(np1, np2);
            Py_XDECREF(np1);
            Py_XDECREF(np2);
            break;
        }
        case NumarrayAPI:
        {
            PyObject *na1 = NN_API->toNA(seq1);
            PyObject *na2 = NN_API->toNA(seq2);
            result = na_wysiwyg(na1, na2);
            Py_XDECREF(na1);
            Py_XDECREF(na2);
            break;
        }
        case SequenceAPI:
            result = seq_wysiwyg(seq1, seq2);
            break;
        default:
            PyErr_SetString(PyExc_RuntimeError, "Should never happen");
            return 0;
        }

        return result;
    }

See the README for an example session using the pep module showing that it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().

Notes:

- it is straightforward to adapt pep and numnum so that the conversion functions are linked into pep instead of imported.

- numnum is still 'proof of concept'. I am thinking about methods to make those techniques safer if the numarray (and Numeric?) header files never make it into the Python headers (or to make it safer to use those techniques with Python < 2.4). In particular, it would be helpful if the numerical C-APIs exported an API version number, similar to the versioning scheme of shared libraries -- see the libtool->versioning info pages.

I've thought about this a few times; there's certainly a need for it in numarray anyway... and I'm always one release too late. Thanks for the tip on libtool->versioning.
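To make the libtool suggestion concrete, here is a minimal Python 3 sketch of libtool's current:revision:age scheme as an array C-API might expose it. Exporting such a triple from Numeric or numarray is hypothetical, as are the names ApiVersion and next_version; the compatibility test and update rules are the ones libtool documents: an API with current:revision:age implements interfaces current-age through current; a source-only change bumps revision; any interface change bumps current and resets revision; additions bump age, while removals or changes reset age to 0.

```python
from typing import NamedTuple


class ApiVersion(NamedTuple):
    """libtool-style interface version current:revision:age."""
    current: int    # newest interface number implemented
    revision: int   # implementation revision of that interface
    age: int        # how many older interfaces are still supported

    def supports(self, wanted: int) -> bool:
        """True if a client built against interface `wanted` can use this API."""
        return self.current - self.age <= wanted <= self.current


def next_version(v: ApiVersion, added: bool = False,
                 removed_or_changed: bool = False) -> ApiVersion:
    """Apply libtool's update rules for a new release."""
    if not (added or removed_or_changed):
        return ApiVersion(v.current, v.revision + 1, v.age)   # source-only change
    age = 0 if removed_or_changed else v.age + 1
    return ApiVersion(v.current + 1, 0, age)
```

An extension would record the interface number it was compiled against and, at import time, refuse to use an API whose supports() check fails instead of crashing later.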

...

I am considering three possibilities to release a more polished version of numnum (3rd party extension writers may prefer to link rather than import numnum's functionality):

1. release it from PyQwt's project page
2. register an independent numnum project at SourceForge
3. hand numnum over to the Numerical Python project (frees me from worrying about API changes).

Regards -- Gerard Vermeulen

(3) sounds best to me, for the same reason that numarray is a part of the numpy project and because numnum is a Numeric/numarray tool. There is a small issue of sub-project organization (separate bug tracking, etc.), but I figure if SF can handle Python, it can handle Numeric, numarray, and probably a number of other packages as well. Something like numnum should not be a problem, and to promote it, it would be good to keep it where people can find it without having to look too hard.

For now, I'm again marking your post as "unread" and will revisit it later this week. In the meantime, thanks very much for your efforts with numnum and the PEP.

Regards,
Todd

Todd Miller

10:30 a.m.

New subject: Follow-up Numarray header PEP

Hi Gerard,

I finally got to your numnum stuff today... awesome work! You've got lots of good suggestions. Here are some comments:

1. Thanks for catching the early return problem with numarray's import_array(). It's not just bad, it's wrong. It'll be fixed for 1.1.

2. That said, I think expanding the macros in-line in numnum is a mistake. It seems to me that "import_array(); PyErr_Clear();" or something like it ought to be enough... after numarray-1.1 anyway.

3. I think there's a problem in numnum.toNP() because of numarray's array "behavior" issues. A test needs to be done to ensure that the incoming array is not byteswapped or misaligned; if it is, the easy fix is to make a numarray copy of the array before copying it to Numeric.

4. Kudos for the LP64 stuff. numconfig is a thorn in the side of the PEP, so I'll put your techniques into numarray for 1.1. HAS_FLOAT128 is not currently used, so it might be time to ditch it. Anyway, thanks!

5. PyArray_Present() and isArray() are superfluous *now*. I was planning to add them to Numeric.

6. The LGPL may be a problem for us and is probably an issue if we ever try to get numnum into the Python distribution. It would be better to release numnum under the modified BSD license, same as numarray.

7. Your API struct was very clean. Eventually I'll regenerate numarray like that.

8. I logged your comments and bug reports on Source Forge and eventually they'll get fixed.

A to Z the numnum/pep code is beautiful. Next stop, header PEP update.

Regards,
Todd

On Sun, 2004-07-18 at 17:24, gerard.vermeulen@grenoble.cnrs.fr wrote:

...


gerard.vermeulen＠grenoble.cnrs.fr

10:49 p.m.

New subject: Follow-up Numarray header PEP

Hi Todd,

Attached is a new version of numnum (including 'topbot', an alternative implementation of numnum). The README contains some additional comments with respect to numarray and Numeric (new comments are preceded by '+', old comments by '-'). There were still some other bugs in numnum, too.

On 23 Jul 2004 13:28:47 -0400, Todd Miller wrote:

...

I finally got to your numnum stuff today... awesome work! You've got lots of good suggestions. Here are some comments:

1. Thanks for catching the early return problem with numarray's import_array(). It's not just bad, it's wrong. It'll be fixed for 1.1.

2. That said, I think expanding the macros in-line in numnum is a mistake. It seems to me that "import_array(); PyErr_Clear();" or something like it ought to be enough... after numarray-1.1 anyway.

Indeed, but I am spoiled by C++ and was falling back on gcc -E for debugging.

...

3. I think there's a problem in numnum.toNP() because of numarray's array "behavior" issues. A test needs to be done to ensure that the incoming array is not byteswapped or misaligned; if it is, the easy fix is to make a numarray copy of the array before copying it to Numeric.

Done, but what would be the best function to do this? And the documentation could insist a little more on the possibility of ill-behaved arrays (see README).
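The "test first, copy only if ill-behaved" rule from point 3 can be sketched in Python 3 terms with the standard array and memoryview types. This is an analogy, not numarray's or numnum's API: is_well_behaved, native_copy and as_native are hypothetical names, the byte order of the source is passed in explicitly, and alignment is approximated by checking that the byte offset is a multiple of the item size, because pure Python cannot see real memory addresses.

```python
import struct
import sys
from array import array


def is_well_behaved(byteorder, offset, itemsize):
    """Proxy check: native byte order and an item-aligned byte offset."""
    return byteorder == sys.byteorder and offset % itemsize == 0


def native_copy(buf, typecode, byteorder, offset=0):
    """Copy the items of `buf` starting at byte `offset` into a fresh array
    in native byte order. The byteswap touches only the copy."""
    itemsize = array(typecode).itemsize
    count = (len(buf) - offset) // itemsize
    out = array(typecode)
    out.frombytes(bytes(buf[offset:offset + count * itemsize]))
    if byteorder != sys.byteorder:
        out.byteswap()
    return out


def as_native(buf, typecode, byteorder, offset=0):
    """Zero-copy view when the data is well behaved, otherwise a fixed copy."""
    itemsize = array(typecode).itemsize
    if is_well_behaved(byteorder, offset, itemsize):
        end = offset + (len(buf) - offset) // itemsize * itemsize
        return memoryview(buf)[offset:end].cast(typecode)
    return native_copy(buf, typecode, byteorder, offset)


# Usage: three big-endian doubles stored one byte past the start of a buffer.
payload = b"\x00" + struct.pack(">3d", 1.0, 2.5, -4.0)
demo = as_native(payload, "d", "big", offset=1)
```

Because the offset here is 1, the check fails on every host and the data goes through the copying path, which also yields native byte order.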

...

4. Kudos for the LP64 stuff. numconfig is a thorn in the side of the PEP, so I'll put your techniques into numarray for 1.1. HAS_FLOAT128 is not currently used, so it might be time to ditch it. Anyway, thanks!

There is a difference between the PEP header files and internal numarray usage. I find in my CVS working copy:

    [packer@slow numarray]$ grep HAS_FLOAT */*
    Src/_ndarraymodule.c:#if HAS_FLOAT128

and

    [packer@slow numarray]$ grep HAS_UINT64 */*
    Src/buffer.ch: #if HAS_UINT64
    Src/buffer.ch: #if HAS_UINT64
    Src/buffer.ch: #if HAS_UINT64
    Src/buffer.ch: #if HAS_UINT64
    Src/buffer.ch: #if HAS_UINT64
    Src/libnumarraymodule.c: #if HAS_UINT64
    Src/libnumarraymodule.c: #if HAS_UINT64
    Src/libnumarraymodule.c: #if HAS_UINT64
    Src/libnumarraymodule.c: #if HAS_UINT64
    Src/libnumarraymodule.c: #if HAS_UINT64

but that is not true for the header files (more important for the PEP):

    [packer@slow Include]$ grep HAS_UINT64 */*
    [packer@slow Include]$ grep HAS_FLOAT128 */*
    numarray/arraybase.h:#if HAS_FLOAT128

...

5. PyArray_Present() and isArray() are superfluous *now*. I was planning to add them to Numeric.

6. The LGPL may be a problem for us and is probably an issue if we ever try to get numnum into the Python distribution. It would be better to release numnum under the modified BSD license, same as numarray.

Done, with certain regrets because I believe in (L)GPL. The minutes of the last board meeting of the PSF tipped the scale (http://www.python.org/psf/records/board/minutes-2004-06-18.html).

What remains to be done is showing how to add numnum's functionality to a 3rd party extension by linking numnum's object files into the extension instead of importing numnum's C-API (numnum should not become another dependency).

Gerard

...

7. Your API struct was very clean. Eventually I'll regenerate numarray like that.

8. I logged your comments and bug reports on Source Forge and eventually they'll get fixed.

A to Z the numnum/pep code is beautiful. Next stop, header PEP update.

Regards, Todd

On Sun, 2004-07-18 at 17:24, gerard.vermeulen@grenoble.cnrs.fr wrote:

...


22 comments

9 participants

participants (9)

Chris Barker
Colin J. Williams
Fernando Perez
Fernando.Perez＠colorado.edu
Gerard Vermeulen
gerard.vermeulen＠grenoble.cnrs.fr
Perry Greenfield
Sebastian Haase
Todd Miller
