[C++-sig] Weave Scipy Inline C++

eric jones eric at enthought.com
Mon Sep 16 13:57:08 CEST 2002


> > >   object f(boost::python::object x)
> > >   {
> > >     x[1] = "hello";
> > >     return x(1,2,3,4,5,6,x); // __call__
> > >     ...
> >
> > This is definitely visually cleaner, and I like it better.  Maybe a
> > few overloads in SCXX would make
> >
> > x[1] = "hello";
> >
> > work though for ints, floats, strings, etc.  I'll look at this and
> > report back...  Yep, worked fine.
> >
> > Also, I understand the first line, but not what the second is doing.
> > Is x like a UserList that has a __call__ method defined?
> 
> Hypothetically, yes.
> 
> > Note also that boost has more work to do in this case than weave does.
> > Boost::python::object can be pretty much anything I'm guessing.  When
> > we get to the 'x[1] = "hello";' line in weave, the code is explicitly
> > compiled again each time for a list or a tuple or a dict.
> 
> That sounds like a lot more work than what Boost.Python is doing!

But the work is at the Python level, not at the C++ level.  Python
determines the object types and generates the correct C++ code.  This
means that the C++ code can use simple constructs.

> 
> > The following
> > happens at the command line:
> >
> > >>> a = [1]
> > >>> weave.inline('a[0] = "hello";',['a'])
> > <compile a function that takes 'a' as a list>
> > >>> a[0]
> > 'hello'
> > >>> a = {}
> > >>> weave.inline('a[0] = "hello";',['a'])
> > <compile a function that takes 'a' as a dict>
> >
> > So I'm guessing the cleverness you probably had to go through to get
> > things working in C++ is handled by the dynamic typing mechanism in
> > weave.
> 
> Wow, that could result in a huge amount of bloat.

The behavior is much like that of templates (vector<int> and
vector<float>) in C++ code.  Right?

Also, how would you propose handling dynamic typing for code that is
generated on the fly?  Everything is done at run time in weave.  Nothing
is compiled until the inline() call is made for the first time with a
given argument type.  

Also, if the code is the following:

a = zeros(1000000, Float64)
<fill in a with some values>
code = """
       double sum = 0;
       for(int i = 0; i < N; i++)
           sum += a[i];
       return_val = PyFloat_FromDouble(sum);
       """
N = shape(a)[0]
sum = weave.inline(code,['a','N'])

I want the code generated to convert 'a' to a double* so the compiler
can generate optimized code for this case, instead of trying to use
polymorphism.

In practice, I've never noticed a problem with weave's approach to
dynamic typing.  If inline is used *a lot*, your program will certainly
grow, and on small machines, this could be a problem.  But, the dll/so
files are generally quite small (20-30 KB) for code that uses standard
types, so the biggest problem on memory limited machines is probably the
compilation process.

> What if you /want/ some runtime polymorphism in your C++ code?

As far as I know, nothing in weave precludes using polymorphism in
places you want to.  I'm not sure what the benefit is though on the data
conversion step, as you already know the data type of the object coming
in.  You might as well generate code explicitly for that type.

> Furthermore, without special knowledge of the internals of every type
> that might be passed, you can't generate code that's any more optimized
> for T1 than for T2:
> 
>     >>> class T1(dict): pass
>     >>> class T2(dict): pass
>     >>> a = T1()
>     >>> weave.inline('a[0] = "hello";', ['a'])
>     >>> a = T2()
>     >>> weave.inline('a[0] = "hello";', ['a'])

weave converts instances it doesn't know about to PyObject*.  So, to
work with these instances, you'd have to use the Python C API.  The
above code would not work.  And, I think it would take quite a bit more
than just using boost to get this code to work.  It is possible to
inspect Python objects on the fly and try to generate classes for them,
but it would be very tricky to get right.  And, since Python objects
can change on the fly, correct and fast are probably mutually exclusive.
Every method call and attribute access would have to call back into
Python.  On the other hand, if the class is defined in C++, wrapped with
boost or SWIG, and weave is told about it, the code can be very fast.
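The converter-selection idea -- known types get a specific converter,
everything else falls back to a raw PyObject* -- can be sketched in a
few lines of Python.  The converter names and lookup table below are
hypothetical illustrations, not weave's real classes:

```python
# Hypothetical sketch of weave-style converter selection: an exact-type
# lookup with a generic PyObject* fallback for anything unrecognized.
CONVERTERS = {
    list: "py_to_list",      # would emit code using a list wrapper
    dict: "py_to_dict",
    str:  "py_to_string",
}

def pick_converter(obj):
    # Exact-type lookup: unknown classes (including dict subclasses the
    # code generator has never seen) drop through to the raw PyObject* path.
    return CONVERTERS.get(type(obj), "raw PyObject*")

class T1(dict):
    pass

pick_converter({})      # a plain dict gets the dict converter
pick_converter(T1())    # a subclass falls back to raw PyObject* / C API
```

This is why the T1/T2 example above ends up on the C API path: the
lookup is by exact type, not by isinstance.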

> 
> We can generate any number of distinct types for which x[0] = "hello"
> is a valid expression...
> 
> > >
> > > not
> > >
> > >     x[Py::Int(1)] = Py::Str("hello");
> > >     // ??? what does __call__ look like?
> >
> > Currently I just use the Python API for calling functions -- although
> > SCXX does have a callable class that could be used.  Also, nothing
> > special is done to convert instances that flow from Python into C++
> > unless a special converter has been written for them (such as for
> > wxPython objects).  Things weave doesn't explicitly know about are
> > left as PyObject* which can be manipulated in C code.
> 
> Boost.Python is designed with the idea in mind that users never touch
> a PyObject*.

Much of weave is too, but in places where it isn't able to figure out
types, instead of keeling over, it exposes the raw PyObject and lets the
user deal with it.  This seems a reasonable approach...

> 
> > > or whatever. Getting this code to work everywhere was one of the
> > > harder porting jobs I've ever faced. Compilers seem to have lots of
> > > bugs in the areas I was exercising.
> >
> > The porting comment scares me.  Is it still ticklish?
> 
> Not too ticklish with modern compilers (though one recent release had a
> codegen bug this stimulated). The big problem is that there are lots of
> old compilers out there that people still use. VC6, for example.
> 
> > C++ bugs pop up in areas where they shouldn't -- even in the same
> > compiler but on separate platforms.  There is currently some very
> > silly code in weave explicitly to work around exception handling bugs
> > on Mandrake with gcc.  Since spending a couple of days on this single
> > issue, I've tried to avoid anything potentially fragile (hence the
> > move to SCXX).  CXX compile issues also pushed me in that direction.
> > The compilers I really care about are gcc 2.95.x (mingw and linux),
> > gcc 3.x, MSVC, MIPSPro, SunCC, DEC, and xlc.  How is boost doing on
> > this set?
> 
> Fine on gcc 2.95.x, 3.1, 3.2, msvc6/7, MIPSPro, Dec CXX 6.5 and a whole
> bunch of others.
> I haven't tested xlc recently. I anticipate some issues with SunCC.
> 
> > Weave isn't tested in all these places yet, but needs to run on all
> > of them eventually (and shouldn't have a problem now).
> 
> No conforming code "should have a problem". But you know how in theory
> there's no difference between theory and practice...

Right.  But limiting ourselves in the amount of fancy stuff we use
mitigates the problem.  The C++ tools that weave still uses in its basic
configuration are exceptions, std::string, and std::complex.  The rest
is dirt simple C++ code.  So if the listed items give a compiler
problems, weave probably won't work.  I think these are fairly minimal
requirements.

> 
> > > However, you may still be right that it's not an appropriate
> > > solution for weave.
> >
> > I think boost would work fine -- maybe even better.  I really like
> > that the boost project is active -- SCXX and CXX aren't very.  The
> > beauty of SCXX is it takes about 20 minutes to understand its entire
> > code base.  The worries I have with boost are:
> >
> > 1) How much of boost do I have to carry around for the simple
> > functionality I mentioned?
> 
> Boost.Python depends on quite a few of the other boost libraries:
> 
>     type_traits
>     bind
>     function
>     mpl - currently in prerelease
>     smart_ptr
> 
> possibly a few others. These are all in header files.
> 
> > 2) How persnickety is compiling the code on new platforms?
> 
> Fairly persnickety, unless you're a C++ generic/metaprogramming expert.
> 
> > 3) Would people have to understand much of boost to use this
> > functionality?
> 
> They wouldn't have to understand much of Boost as a whole. They'd only
> need to understand the components of Boost.Python that they're using.
> There is also a template called extract<> which would be useful to
> know about.
> 
> > 4) How ugly are the error reports from the compiler when code is
> > malformed?  Blitz++ reports are incomprehensible to anyone except
> > template gurus.
> 
> You get a long instantiation backtrace as usual. However, we've applied
> some tricks which cause error messages to contain a plain English
> description of what the user did wrong in many cases.
> 
> > 5) How steep is my learning curve on the code? (I know, only I can
> > answer that by looking at it for a while which I haven't yet.)
> 
> I have no idea how to answer that.
> 
> > Note that I'm really looking for the prettiest and fastest solution
> > *with the least possible headaches*.  For weave, least headaches
> > trumps pretty and fast in a major way.
> 
> Then use the Python "C" API.

Did you say that while throwing your hands in the air? :-)  Seriously,
boost is very powerful but also fairly hefty.  It looks to me like I
would need at least 30K lines of code (probably more) to include all of
boost::python and its dependencies.  Probably only a subset of it is
needed for weave, but I have no idea what that subset is.  Also, the
list above is not for the faint of heart.

On the other hand, SCXX is extremely lightweight and seems to fit the
bill.  It actually reduces headaches compared to the C API because, like
boost, it handles the ref-count issues and makes the code prettier.

> 
> > I've even considered moving weave back to generating pure C code to
> > make sure it works everywhere and leaving the user to wrestle with
> > refcounts and the Python API.  I think C++ is getting far enough
> > along though that this shouldn't be necessary (and allows the
> > "pretty").  Note though, that I was extremely disappointed with
> > CXX's speed when manipulating lists, etc.  It was *much* slower than
> > calling the raw Python API.  For computationally intense stuff on
> > lists, etc., you had to revert to API calls.  I haven't benchmarked
> > SCXX yet, but I'm betting the story is the same.  Most things I care
> > about are in Numeric arrays, but that isn't true for everyone else.
> 
> Boost.Python's object wrappers are not generally designed for maximum
> speed. For example, the list class knows that it might hold a class
> derived from list, so it does PyList_CheckExact() before calling any
> PyList_ functions directly. If it's been subclassed it goes through the
> general Python API. Also, none of the operators such as [] have been
> set up to use the PyList_xxx functions in this way. It could be done;
> it's just a lot of work.

OK.

> 
> > One other thought is that once we understand each other's
> > technologies better, we may see other places where synergy is
> > beneficial.
> 
> Hopefully, yes.

I still feel there is a disconnect in the discussion.  I'm working to
get blitz installed on my system in hopes of getting more versed in how
it might mesh with and improve weave.  

> 
> > >
> > > > If you need the other 99.7% of boost's capabilities, then you
> > > > probably need to be using boost instead of weave anyhow.  They
> > > > serve different purposes.  Weave is generally suited for
> > > > lightweight wrapping and speeding up computational kernels with
> > > > minimum hassle -- especially in numeric codes where Numeric
> > > > isn't fast enough.
> > > >
> > > > Oh, and I'm happy to except patches that allow for boost type
> > ^^^^^^ err... accept :-|
> > > > converters in weave (they should, after all, be easy to write).
> > > > Then you can use boost instead of SCXX.
> > >
> > > What did you have in mind?
> >
> > The code for a new type converter class that handles translating
> > Python code to C++ is rather trivial after the latest factoring of
> > weave.  Here is an example of a weave expression and the underlying
> > C++ code that is generated on the fly:
> >
> > #python
> > >>> a = {}
> > >>> weave.inline('a["hello"] = 1;',['a'])
> >
> > # underlying ext func
> > static PyObject* compiled_func(PyObject*self, PyObject* args)
> > {
> >     PyObject *return_val = NULL;
> >     int exception_occured = 0;
> >     PyObject *py__locals = NULL;
> >     PyObject *py__globals = NULL;
> >     PyObject *py_a;
> >     py_a = NULL;
> >
> >     if(!PyArg_ParseTuple(args,"OO:compiled_func",
> >                          &py__locals,&py__globals))
> >         return NULL;
> 
> Isn't there a way to check for dict/int args right here?

You can, but it made the code generation hairier.  It was much easier to
have the converter objects inject the code in the code template.  It is
also in preparation for an alternative method of accessing the
variables.  In the future, we're just going to access the variables in
the calling stack frame directly.  This approach coupled with some ugly
things that Pat is doing now will reduce the overhead of calling
inline() substantially.  Right now, you pay a little for getting the
stack frame, the locals, and globals at the Python level as well as an
extra function call.  The newer approach makes inline() calls equivalent
to the cost of about 16 integer adds, which makes inlining even fairly
small snippets beneficial.
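The "access the variables in the calling stack frame directly" idea can
be sketched at the Python level with sys._getframe.  This is only a
minimal illustration of the lookup (the function name here is made up,
and the planned weave machinery would do this far more cheaply in C):

```python
import sys

def inline_lookup(names):
    """Fetch variables by name from the *caller's* frame, the way an
    inline() call could avoid being handed locals()/globals() explicitly."""
    frame = sys._getframe(1)            # one level up: the caller's frame
    found = {}
    for name in names:
        if name in frame.f_locals:      # locals shadow globals, as in Python
            found[name] = frame.f_locals[name]
        elif name in frame.f_globals:
            found[name] = frame.f_globals[name]
        else:
            raise NameError(name)
    return found

def caller():
    a = [1, 2, 3]
    N = len(a)
    return inline_lookup(['a', 'N'])    # sees the caller's a and N

caller()   # -> {'a': [1, 2, 3], 'N': 3}
```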

> 
> >     try
> >     {
> >         PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
> >         PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
> >         /* argument conversion code */
> >         py_a = get_variable("a",raw_locals,raw_globals);
> >         PWODict a = convert_to_dict(py_a,"a");
> >         /* inline code */
> >         a["hello"] = 1;
> >     }
> >     catch(...)
> >     {
> >         return_val =  NULL;
> >         exception_occured = 1;
> >     }
> >     /* cleanup code */
> >     if(!return_val && !exception_occured)
> >     {
> >         Py_INCREF(Py_None);
> >         return_val = Py_None;
> >     }
> >     return return_val;
> > }
> >
> > So the line that has to change is:
> >
> >         PWODict a = convert_to_dict(py_a,"a");
> >
> > and the convert_to_dict function -- but it is automatically generated
> > by the converter class (although you could customize it if needed).
> 
> That would look something like this in Boost.Python:
> 
>     handle<> py_a = borrowed(get_variable("a", raw_locals, raw_globals));
>     dict x = extract<dict>(object(py_a));
> 
> You could generate slightly more efficient code using some of the
> implementation details of Boost.Python, but I'd rather not expose
> those to anyone.

So that part is simple.  It looks like getting boost converters into
weave is really a matter of getting the support code (headers,
libraries, etc.) straightened out so that distutils links the code
correctly.
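Wiring the boost headers into the build might look roughly like the
distutils-style Extension below.  The module name, source file, and
include/library paths are placeholders I've invented purely to
illustrate the shape of the configuration -- real paths depend on where
boost is installed:

```python
from setuptools import Extension

# Hypothetical build configuration for a weave-generated module that uses
# Boost.Python converters.  All paths and names here are placeholders.
ext = Extension(
    "compiled_module",
    sources=["compiled_module.cpp"],
    include_dirs=["/usr/local/include/boost"],  # header-only boost deps
    libraries=["boost_python"],                 # only if the built lib is needed
    language="c++",
)
```

weave would then hand an object like this to distutils, which takes care
of invoking the compiler and linker with the right flags.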

eric





