[C++-sig] extractors

Mon Jul 15 09:52:59 CEST 2002

Having just got finished with some massive code massage to the Python<->C++
converter interface, I am finally ready to embark on implementing the
extractor interface which can be used to extract C++ types from Python
objects.

We have discussed two main use cases for extraction:

1. In user code which needs to retrieve C++ objects from their
corresponding Python objects:

  a. Users may simply extract a value directly:

    void f(object x)
    {
        int y = extract<int>(x); // retrieve an int from x
    }

  b. Users may also want to explicitly check for convertibility:

    int g(object x)
    {
        extractor<int> get_int(x);
        if (get_int)
            return get_int();
        else
            return 0;
    }

2. In the implementation of sequence from_python converters (e.g. Python
tuple/list -> std::vector), as in Ralf's example at
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/cctbx/cctbx/include/cctbx/bpl_utils.h?rev=1.2.2.14&content-type=text/vnd.viewcvs-markup.
These are
still a messy phenomenon, and I'm not sure that it makes sense to lump them
into the same facility. First we need to review the requirements. As I
understand it:

    a. We decided it would be acceptable to support overload resolution
only on "sequence-ness" and not on the types of the sequence elements. In
other words, if I wrap two functions:

    void f(std::vector<int>);
    void f(std::list<char*>);

 and I register the Python iterable -> C++ sequence converters for both
argument types, I can write:

    >>> f([1, 2, 3])

 but it's an equally good match for both functions. Of course, if I wrap
std::vector<int> as a Python class int_vec, then:

    >>> f(int_vec(1, 2, 3))

matches the first overload exactly.

    b. That means we can detect iterables via the presence of an __iter__
or a __getitem__ attribute... and we don't have to follow the practice used
in Ralf's example of traversing the sequence twice in order to detect the
overload. As far as I can tell after raising the issue on Python-dev
(http://mail.python.org/pipermail/python-dev/2002-July/026200.html), this
is the more-natural fit to Guido's idea of Python iterables as a
single-pass concept.

    c. It also looks to me like any other approach would destroy the
information in a single-pass iterable (e.g. a file) while deciding to
reject an overload -- we'd have to touch all the elements but the next
argument might not match. That seems unacceptable to me.

    d. Ultimately, that means that if overload resolution succeeds, but the
iterable contains the wrong kind of elements, the selected function itself
will appear (from the Python side) to throw an exception immediately. That
seems acceptable to me.
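
Points b through d can be illustrated from the Python side. This is only a rough sketch, not the converter implementation: looks_iterable, probe_by_traversal and to_int_vector are hypothetical names invented for the example, and to_int_vector merely stands in for the iterable -> std::vector<int> conversion. It shows that an attribute check leaves a single-pass iterable intact, while a probe that traverses the elements leaves nothing for the real conversion.

```python
def looks_iterable(obj):
    """Cheap "sequence-ness" test: inspects attributes only, never iterates."""
    return hasattr(obj, "__iter__") or hasattr(obj, "__getitem__")


def probe_by_traversal(obj, element_type):
    """The rejected alternative: checks every element, consuming single-pass input."""
    return all(isinstance(item, element_type) for item in obj)


def to_int_vector(obj):
    """Hypothetical stand-in for the iterable -> std::vector<int> conversion."""
    if not looks_iterable(obj):
        raise TypeError("not an iterable")
    result = []
    for item in obj:
        if not isinstance(item, int):
            # Raised only after the overload has been selected (point d).
            raise TypeError("element is not an int")
        result.append(item)
    return result


single_pass = (n for n in [1, 2, 3])      # a generator is single-pass, like a file
assert looks_iterable(single_pass)        # the attribute check leaves it untouched
assert to_int_vector(single_pass) == [1, 2, 3]

consumed = (n for n in [1, 2, 3])
assert probe_by_traversal(consumed, int)  # the probe itself eats every element
assert to_int_vector(consumed) == []      # nothing is left for the real conversion
```

Note that the attribute check also accepts strings, since they have both __iter__ and __getitem__; whether that is desirable is a separate question.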

To fulfill the requirements above, we only need to supply the form used in
1a.

However, return value converters present a minor problem. Recall that when
returning a pointer or reference type from a Python function, e.g.:

    call<Foo*>(py_func, arg1, arg2...)

The Foo object must be contained within the Python result object. Since the
Python result object will have its reference count decremented after
py_func returns, we must throw an exception instead of returning if the
result object has a reference count <= 1, to keep from returning a dangling
pointer.
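
The lifetime problem can be seen from pure Python. The sketch below is an analogy that assumes CPython-style reference counting; Foo, make_fresh, return_kept and survives_release are hypothetical names, and a weak reference plays the role of the raw Foo* that call<Foo*> would hand back.

```python
import gc
import weakref


class Foo:
    """Hypothetical stand-in for a wrapped C++ class."""


_kept = Foo()  # referenced from module scope, so it outlives any call


def make_fresh():
    return Foo()   # the only reference to this object is the call result


def return_kept():
    return _kept   # the result shares an object that is referenced elsewhere


def survives_release(py_func):
    """Report whether py_func's result is still alive after the result is released."""
    result = py_func()
    probe = weakref.ref(result)   # observes the object without keeping it alive
    del result                    # mimics the decrement after py_func returns
    gc.collect()                  # not needed on CPython; harmless elsewhere
    return probe() is not None


assert survives_release(return_kept)      # a Foo* into this object stays valid
assert not survives_release(make_fresh)   # a Foo* into this object would dangle
```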

Now consider:

    call<std::vector<Foo*> >(py_func, arg1, arg2...)
         ^^^^^^^^^^^^^^^^^

If py_func returns a list of wrapped Foo objects, we have a similar
problem. However, now it goes one level deeper: if the list has only one
reference and any element has only one reference, we should throw an
exception. Worse, you can't really tell whether the pointer will dangle; if
the returned iterable is really a list iterator, the list itself may well
have enough references to keep it alive through the return process.

Because return_from_python (cf. above) and the arg_from_python (for
unwrapping C++ wrapped function arguments) converters use the same
registered conversion mechanism, they are subject to the same constraints
at the inner level. We have two choices:

1. Provide no protection against dangling pointer sequence elements
2. Prohibit this mechanism from converting sequences of raw pointers (and
references, if such a thing is possible)

Taking Dave Hawkes' excellent advice to not give away the store, I'm
inclined to do #2 first and see how it plays out. Note that means our
iterable converter would have to use a different mechanism than the
user-level extract<> call after all, since I don't want to prohibit people
from writing

    Foo* p = extract<Foo*>(o);

Thoughts?
Dave

+---------------------------------------------------------------+
                  David Abrahams
      C++ Booster (http://www.boost.org)               O__  ==
      Pythonista (http://www.python.org)              c/ /'_ ==
  resume: http://users.rcn.com/abrahams/resume.html  (*) \(*) ==
          email: david.abrahams at rcn.com
+---------------------------------------------------------------+