[C++-sig] Implementation of proper overload resolution
Dane Springmeyer
blake at hailmail.net
Thu Dec 17 20:11:50 CET 2009
Troy,
Really __impressive__ write-up and work on this. I'd definitely be
interested in build instructions and your work getting into future
releases.
Cheers,
Dane
On Dec 17, 2009, at 8:08 AM, Troy D. Straszheim wrote:
>
> Here's what I've got on overloading. This turned out to be a lot more
> work, and this mail much longer, than I'd hoped. Would appreciate a
> readthrough and comments/questions, I've done about all I can do.
> First
> a review of the problems, then a walkthrough of an implementation I
> have
> that fixes them.
>
> 1.a) Problem: Ambiguous calls from python to overloaded functions
>
> Boost.python's 'overloading' is and has always been broken w.r.t. is
> ambiguity. The library currently has no notion, given some tuple of
> arguments, of one function being a better match than another. It
> knows
> only "this function matches" or not. What the user observes is the
> following: when a call to a c++ function is made from python e.g.
>
>>>> m.f(1)
>
> boost.python looks through the overloads for f, in the reverse order
> of
> registration, and calls the first one that is callable. For
> instance if
> the registrations are
>
> void f_int(int);
> void f_bool(bool);
>
> BOOST_PYTHON_MODULE(m)
> {
> def("f", f_int);
> def("f", f_bool);
> }
>
> then f_bool will execute this call of f(1), since the python 'int'
> converts to c++ bool. If the overloads were registered in the reverse
> order, f_int would be executed.
>
> In this case the c++ overloads in question are potentially
> distinguishable from one another, as python has distinct types bool
> and
> int. So these aren't "necessarily" ambiguous.
>
> 1.b) Problem: "Necessarily" ambiguous overloads
>
> There are some overload sets that *are* necessarily ambiguous. This
> set, for instance,
>
> void f_double(double);
> void f_float(float);
>
> BOOST_PYTHON_MODULE(m)
> {
> def("f", f_double);
> def("f", f_float);
> }
>
> can never be unambiguous, since python has no 'float' type. I.e.
>
>>>> f(x)
>
> will call f_float for any value of x where the call succeeds at all.
>
> 1.c) Problem: Multiple values for keyword argument
>
> At the same place in the boost.python code where this 'first-match'
> overload resolution is done, the user error 'multiple values for
> keyword
> argument' is not checked. Neal Becker recently pointed this out.
> With
> boost.python it looks like this:
>
> int addem(int x, int y, int z) { return x*100 + y*10 + z; }
>
> BOOST_PYTHON_MODULE(M)
> {
> def("addem", &addem, (arg("x"), arg("y"), arg("z")));
> }
>
>>>> from M import addem
>>>> addem(1, 8, 2, x=4)
> Traceback (most recent call last):
> ...
> ArgumentError: Python argument types in
> M.addem(int, int, int)
> did not match C++ signature:
> addem(int x, int y, int z)
>
> That error message is very confusing... f(int,int,int) doesn't match
> f(int,int,int)? The pure python version reports something more
> sensible:
>
>>>> def g(x,y,z, **kwargs):
> ... print x,y,z,kwargs
> ...
>>>> g(1,2,3,x=4)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: g() got multiple values for keyword argument 'x'
>
>
> 2.) An implemented solution
>
> I've got something implemented. Here's what it does.
>
> 2.a) Solution: multiple values for keyword
>
> The easiest case to catch is the last [1.c]. It is also orthogonal to
> the others. If a registered function has keyword arguments, check for
> multiple keyword values, and raise an error if so:
>
>>>> from M import addem
>>>> addem(1,2,3,x=1)
> Boost.Python.TypeError: M.addem() got multiple values for keyword
> argument 'x'
>
> 2.b) Solution: "necessarily" ambiguous overloads
>
> The next easiest thing to catch is case [1.b], "necessarily" ambiguous
> registrations. It proceeds as follows: at import time, as each new
> overload V for function F is registered, compare V to the existing
> overloads EO for F. If V has the same signature as something in EO,
> raise an AmbiguousOverload exception. For instance, if you load the
> module from [1.b] above, you get:
>
>>>> import m
> Traceback (most recent call last):
> ...
> AmbiguousOverload: Boost.Python function m.f
> has ambiguous overloads. C++ signatures
> f(float) -> None
> and
> f(double) -> None
> are indistinguishable to python.
>
> Again this is because c++ float and c++ double both map to python
> 'float'. This one is "easy" as it happens only once (not at every
> function call) and doesn't take up any space.
>
> 2.c) Solution: ambiguous calls
>
> The hard case is [1.a]:
>
> void f_bool(bool);
> void f_int(int);
>
> BOOST_PYTHON_MODULE(m)
> {
> def(f, f_bool);
> def(f, f_int);
> }
>
> For module 'm' above, a call to f(True) or f(1) should succeed and
> call
> the corresponding function. Passing a float, however, is ambiguous:
>
>>>> f(True) # ok
>>>> f(1) # ok
>>>> f(1.0)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function
> m.f
> C++ signatures:
> f(int)
> f(bool)
>
> So the implementation has some how 'scored' each possible overload and
> in each case either used the best one or reported ambiguity if
> multiple
> overloads are tied for 'best'.
>
> The scoring works as follows.
>
> In the guts of boost.python we recieve a tuple of arguments of type
> PyObject*. We also have a list of overloads, each with an mpl vector
> representing the signature of the associated C++ function. This is
> unchanged from the released version.
>
> What has been done until now is to attempt to call each overload in
> order and return the result of the first that succeeds.
>
> In this new implmentation, for each c++ argument of type T_n, we ask a
> class overload_score<T_n> to score the conversion of the type
> (PyTypeObject*) of this PyObject* to T_n. Example:
>
> template <>
> struct overload_score<int>
> {
> boost::optional<unsigned> operator()(PyObject* type)
> {
> if(PyInt_CheckExact(type))
> return boost::optional<unsigned>(0); // int == perfect
> else if (PyBool_Check(type))
> return boost::optional<unsigned>(1); // bool == okay
> else if (PyFloat_CheckExact(type))
> return boost::optional<unsigned>(1); // float == okay
> else if(arg_from_python<int>(type).convertible())
> return boost::optional<unsigned>(1); // fallback
> else
> return boost::optional<unsigned>(); // unsuitable
> }
> };
>
> The "score" type is optional<unsigned>. A valueless,
> (default-constructed) optional<unsigned> means 'unsuitable'.
> optional<unsigned>(0) is a perfect match, and
> optional<unsigned>(value)
> for value > 0 is a workable but less than perfect match.
>
> These per-argument scores are added together for all arguments: this
> is
> the overload's total score. If any argument is 'unsuitable', the
> total
> score is 'unsuitable'.
>
> If there is a tie for the best (lowest) score, the call is ambiguous
> and
> a Boost.Python.AmbiguousCall exception is raised. If there is only
> one
> function in first place, call it.
>
> 3.) Implications
>
> This breaks a few corner cases that I've found in the tests.
>
> 3.a) implied init<> registration
>
> I have found a few instances like this one:
>
> struct X
> {
> X();
> X(int);
> };
>
> class_<X>("X") // #1
> .def(init<>()) // #2
> .def(init<int>())
> ;
>
> Here, #1 causes a default constructor to be registered, as does #2.
> This will cause a throw at load time as in [2.b]. It is simple to
> fix.
>
> 3.b) raw_constructor
>
> The test suite for raw_constructor depends on our first_match overload
> resolution. The fix is to move a few things around to get the same
> effect without relying on this behavior.
>
> The "before" code looks like this:
>
> class Foo
> {
> public:
> Foo(tuple args, dict kw)
> : args(args), kw(kw) {}
>
> tuple args;
> dict kw;
> };
>
> object init_foo(tuple args, dict kw)
> {
> tuple rest(args.slice(1,_));
> return args[0].attr("__init__")(rest, kw);
> }
>
> BOOST_PYTHON_MODULE(raw_ctor_ext)
> {
> // using no_init postpones defining __init__ function until after
> // raw_function for proper overload resolution order, since later
> // defs get higher priority.
> class_<Foo>("Foo", no_init)
> .def("__init__", raw_function(&init_foo))
> .def(init<tuple, dict>())
> .def_readwrite("args", &Foo::args)
> .def_readwrite("kw", &Foo::kw)
> ;
> }
>
> The "after" code does the registration piecemeal: Maybe there is a
> better way to do this. To my mind this is a little better because it
> makes explicit what is happening rather than relying on some subtle
> property of overload resolution:
>
> class Foo
> {
> public:
> Foo(tuple args, dict kw)
> : args(args), kw(kw) {}
>
> tuple args;
> dict kw;
> };
>
> // our custom factory function
> object init_foo(tuple args, dict kw)
> {
> tuple rest(args.slice(1,_));
> return args[0].attr("__real_init__")(rest, kw);
> }
>
> BOOST_PYTHON_MODULE(raw_ctor_ext)
> {
> // to get the desired effect we register
> // the actual constructor and our 'raw' constructor,
> // and then rename them
> class_<Foo> c("Foo", init<tuple, dict>());
> c
> .def("__tmp_init__", raw_function(&init_foo))
> .def_readwrite("args", &Foo::args)
> .def_readwrite("kw", &Foo::kw)
> ;
>
> //
> // __init__ => __real_init__
> // __tmp_init__ => __init__
> //
> object real_constructor = getattr(c, "__init__");
> object raw_constructor = getattr(c, "__tmp_init__");
>
> setattr(c, "__init__", raw_constructor);
> delattr(c, "__tmp_init__");
> setattr(c, "__real_init__", real_constructor);
>
> }
>
> And that basically covers it. Looking forward to comments/feedback.
>
> The code is here:
>
> http://gitorious.org/~straszheim/boost/straszheims-python/trees/
> master
>
> Note that the headers are in subdirectory 'include'... If there is
> enough response to this mail it I'll send out instructions on how to
> get
> a working build going.
>
> - t
> _______________________________________________
> Cplusplus-sig mailing list
> Cplusplus-sig at python.org
> http://mail.python.org/mailman/listinfo/cplusplus-sig
More information about the Cplusplus-sig
mailing list