[C++-sig] profiling C++ extensions

David Abrahams david.abrahams at rcn.com
Sun Mar 3 05:39:48 CET 2002


----- Original Message -----
From: "greg Landrum" <greglandrum at mindspring.com>


> At 04:38 PM 3/2/2002, David Abrahams wrote:
>
> >I don't have a reason to be pessimistic, if that helps, but Ralf
> >Kunstleve is the Boost.Python pickle guy, and he knows a bit more than
> >I. Also I think he's planning to address pickling for Boost.Python V2
> >in the near future.
> >
> >I wonder what your CXX pickling interface looks like as compared to
> >your Boost interface? How many python functions get called with the
> >Boost version?
>
> I haven't looked into this (at the moment I don't have pickling set up
> with the CXX wrapper, so I can't compare timings).
>
> I guess maybe I wasn't super clear in my last message because of my
> focus on pickling/depickling.  The timing information I gave (.9 seconds
> with CXX, 6.7 with Boost) was for simple object instantiation, not
> pickling/depickling.

Oh. In that case, the structure of these instances may be quite
significant. If you look at
http://www.boost.org/libs/python/doc/data_structures.txt you can see
that each Boost.Python extension instance contains both a Python
attribute dictionary and a vector with at least one element; that
element points to an additional dynamically-allocated object which in
turn embeds your exported C++ object. That's four dynamic allocations
per object. If your CXX object does not have an attribute dictionary,
it can take as few as one dynamic allocation per object. Since you're
instantiating 1e5 of these, I imagine it could make a difference that
the CXX case reuses the same memory pool over and over, whereas in the
Boost.Python case some of the allocations go through operator new and
some through Python's allocator. As plausible as this sounds, though,
this sort of speculation is always dangerous; profiling is always
better.
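
To make the contrast concrete, here's a rough sketch of a "traditional"
extension type that embeds the wrapped C++ object directly in the
Python instance, so each instantiation costs a single allocation. This
is illustrative only -- the Payload/PayloadObject names and the factory
function are made up, not the actual CXX machinery:

    #include <Python.h>

    // Hypothetical C++ payload standing in for the wrapped class.
    struct Payload
    {
        double x, y;
    };

    // The payload lives inside the Python object itself, so the
    // PyObject_New call below is the only allocation per instance.
    struct PayloadObject
    {
        PyObject_HEAD
        Payload payload;
    };

    // Payload is trivially destructible here; a real wrapper would run
    // the C++ destructor before freeing the Python object.
    static void payload_dealloc(PyObject* self)
    {
        PyObject_Del(self);
    }

    static PyTypeObject PayloadType = {
        PyObject_HEAD_INIT(0)
        0,                      /* ob_size */
        "example.Payload",      /* tp_name */
        sizeof(PayloadObject),  /* tp_basicsize */
        0,                      /* tp_itemsize */
        payload_dealloc,        /* tp_dealloc */
        /* remaining slots left zero-initialized for brevity */
    };

    // Factory: one dynamic allocation per instance. A Boost.Python
    // instance, by contrast, also allocates the attribute dictionary,
    // the vector's storage, and the separate holder for the C++ object.
    static PyObject* make_payload(PyObject*, PyObject*)
    {
        PayloadObject* obj = PyObject_New(PayloadObject, &PayloadType);
        if (obj)
            obj->payload.x = obj->payload.y = 0.0;
        return (PyObject*)obj;
    }

Giving up the attribute dictionary costs you per-instance flexibility,
of course, but with 1e5 instances the difference in allocator traffic
adds up.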

> >I guess actual profile info would be more helpful. Have you tried
> >Intel's profiling tools?
> >http://www.intel.com/software/products/vtune/index.htm?iid=ipp_home+soft_vtune&
>
> That's a good idea.  I'll see what I can find.  I'll also spend a bit
> more time and see if I can figure out the huge timing differences in
> the CXX and boost object instantiations.

If you find out that it is in fact an allocation issue, you might be
interested in using Boost.Python v2 instead. For classes exposed in the
normal way, it already needs one fewer allocation. Furthermore, it can
work with "traditional" extension types of the kind generated by CXX,
so you could easily avoid the extra allocations by creating an ordinary
extension type. I intend at some point to make it possible to expose
C++ classes as fully-subclassable new-style classes that use only a
single dynamic allocation (and consequently have no attribute
dictionary), and I would be happy to work with you if you want to
attack that problem sooner.
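
In case it helps, exposing a class "in the normal way" with v2 looks
roughly like this (the Particle class and the module name are just
placeholders, and the exact spelling may still shift while v2 is under
development):

    #include <boost/python.hpp>

    // Hypothetical C++ class standing in for whatever you're wrapping.
    struct Particle
    {
        explicit Particle(double mass) : mass(mass) {}
        double mass;
    };

    BOOST_PYTHON_MODULE(particles)
    {
        using namespace boost::python;

        // v2 builds the wrapper instance and the holder for the C++
        // object itself; this is the path that already saves one
        // allocation relative to v1.
        class_<Particle>("Particle", init<double>())
            .def_readwrite("mass", &Particle::mass)
            ;
    }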

-Dave