[C++-sig] Need strategic advice designing Boost/Python program

Mon Apr 30 23:28:42 CEST 2007

> Also, thank you for your explanation of <extract>.  It seems like I
> will still need to use extract, because I want the Python interface to
> use arrays in a Python-like manner.  On Ralf's suggestion, I will look
> at scitbx to see how he handles it.

It all centers on the idea of storing repetitive data in C++
reference-counted arrays, which are wrapped with a central facility (a huge
template, flex_wrapper.h). Custom data are stored in specific C++ classes
which are wrapped with Boost.Python. A simple example is scitbx/histogram.h,
where the repetitive data are in an array of floating-point values, and the
custom data are in the histogram class.

The interesting point after a few years of working experience is this: most
people are well served by the existing array types (element types double,
size_t, std::complex<double>, etc.). It is fairly rare that someone needs
other array types. I'm also trying to keep the number of wrapped array types
low because the compile-time overhead and the resulting .so file sizes are
quite significant. This consideration leads to designs where we have two
independent arrays of types A and B, instead of one array of a type C which
composes A and B. That's often a compromise but, again from working
experience, most of the time it really doesn't matter all that much, and you
can hide it, e.g. behind a thin Python layer. If it becomes a problem, you
can still use that huge template and, in a couple of lines, define a new
array type with element type C, with a complete Python interface modeled
after Python's builtin list.
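
As a rough illustration of hiding two independent arrays behind a thin Python
layer, here is a minimal sketch. It is not scitbx code: the standard-library
array module stands in for the wrapped array types, and the names Site and
SiteList are hypothetical, made up for this example only.

```python
from array import array
from collections import namedtuple

# Hypothetical composite element type "C", made of an "A" part (a double)
# and a "B" part (an unsigned integer).
Site = namedtuple("Site", ["weight", "index"])


class SiteList:
    """Thin layer presenting two parallel arrays as one list of Site records."""

    def __init__(self):
        # Stand-ins for two array types that are already wrapped.
        self.weights = array("d")  # array of A
        self.indices = array("L")  # array of B

    def append(self, site):
        # Split the composite record across the two underlying arrays.
        self.weights.append(site.weight)
        self.indices.append(site.index)

    def __len__(self):
        return len(self.weights)

    def __getitem__(self, i):
        # Integer indices only; recombine the two parts on access.
        return Site(self.weights[i], self.indices[i])


sites = SiteList()
sites.append(Site(0.5, 3))
sites.append(Site(1.5, 7))
```

Code that needs bulk access can still reach sites.weights and sites.indices
directly, while everything else sees something that behaves like a list of
records.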

So when I approach a new problem, I typically try to fit it into the system
by designing a C++ class for the core calculation, reusing our existing
Python-exposed C++ arrays (flex arrays) as inputs. If this solution is too
raw for general consumption, I stick a Python layer on top that makes it look
truly pythonic.
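
The split between a raw core class and a friendlier Python layer could look
roughly like the sketch below. Everything in it is an assumption for
illustration: HistogramCore is a pure-Python stand-in for what would be a
wrapped C++ class, array('d') plays the role of an existing array-of-double
type, and none of the names or the slot logic are taken from
scitbx/histogram.h.

```python
from array import array


class HistogramCore:
    """Stand-in for a wrapped C++ class: takes a typed array, no conveniences."""

    def __init__(self, data, n_slots):
        # 'data' is expected to be a non-empty array('d').
        self.data_min = min(data)
        self.data_max = max(data)
        self.slot_width = (self.data_max - self.data_min) / n_slots
        self.slots = array("L", [0] * n_slots)  # one counter per slot
        for x in data:
            if self.slot_width > 0:
                i = int((x - self.data_min) / self.slot_width)
            else:
                i = 0  # all values are identical
            # The maximum value is counted in the last slot.
            self.slots[min(i, n_slots - 1)] += 1


def histogram(values, n_slots=10):
    """Pythonic layer: accepts any iterable of numbers and returns a list
    of (low, high, count) tuples."""
    core = HistogramCore(array("d", values), n_slots)
    return [
        (core.data_min + i * core.slot_width,
         core.data_min + (i + 1) * core.slot_width,
         count)
        for i, count in enumerate(core.slots)
    ]


result = histogram([0.0, 1.0, 2.0, 3.0, 4.0], n_slots=2)
```

The core class only deals with typed arrays and plain numbers, which is the
part that would map naturally onto C++; the conversion from arbitrary Python
input and the shaping of the result live entirely in the small function on
top.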

It is very similar to the idea behind Numeric/Numpy/numarray, except that the
scitbx array types are C++ arrays with a "nice" std::vector-like interface
and fully automatic lifetime management, and that you can easily make arrays
of user-defined types if you feel it is necessary.

Not everything fits into this scheme of using C++ as the "vector unit" and
Python as the slower but more versatile "scalar unit", but we have gotten a
long way with this approach. The only time it didn't work out for me was when
I had to handle a very deeply nested hierarchical data structure (six
levels!), with two-way parent-child references. But numerical data usually
come as arrays anyway, so often it is a no-brainer to design the C++ class
processing the data.

> Also--Ralf, thanks for the information about scitbx, and I may have
> more questions once I've studied it.  I'll take a look at it and try to
> learn from its design, but currently the server 
> http://cci.lbl.gov/ seems to be unreachable.

Sorry, that was an accident. It is back online now. (The unusual instability
is because we just moved to new hardware.)

Ralf
