[C++-sig] Implementation of proper overload resolution

Thu Dec 17 20:11:50 CET 2009

Troy,

Really __impressive__ write-up and work on this. I'd definitely be  
interested in build instructions and your work getting into future  
releases.

Cheers,

Dane

On Dec 17, 2009, at 8:08 AM, Troy D. Straszheim wrote:

>
> Here's what I've got on overloading.  This turned out to be a lot more
> work, and this mail much longer, than I'd hoped.  Would appreciate a
> readthrough and comments/questions, I've done about all I can do.
> First a review of the problems, then a walkthrough of an
> implementation I have that fixes them.
>
> 1.a)  Problem: Ambiguous calls from python to overloaded functions
>
> Boost.python's 'overloading' is and has always been broken w.r.t.
> ambiguity.  The library currently has no notion, given some tuple of
> arguments, of one function being a better match than another.  It
> knows only "this function matches" or not.  What the user observes is
> the following: when a call to a c++ function is made from python, e.g.
>
>>>> m.f(1)
>
> boost.python looks through the overloads for f, in the reverse order
> of registration, and calls the first one that is callable.  For
> instance, if the registrations are
>
> void f_int(int);
> void f_bool(bool);
>
> BOOST_PYTHON_MODULE(m)
> {
>  def("f", f_int);
>  def("f", f_bool);
> }
>
> then f_bool will execute this call of f(1), since the python 'int'
> converts to c++ bool.  If the overloads were registered in the reverse
> order, f_int would be executed.
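
A minimal sketch of the first-match behavior described above, in plain Python rather than Boost.Python. The names first_match, accepts_int and accepts_bool are made up for illustration, and the "convertible" tests are a simplified assumption about which Python values the C++ parameters accept.

```python
# Hypothetical model of first-match resolution; this is not Boost.Python code.

def accepts_int(arg):
    # Assumption: a C++ 'int' parameter accepts Python int (and bool, its subclass).
    return isinstance(arg, int)

def accepts_bool(arg):
    # Assumption: a C++ 'bool' parameter accepts any Python int as well.
    return isinstance(arg, int)

def first_match(overloads, arg):
    """Walk the overloads in reverse registration order; pick the first callable one."""
    for name, accepts in reversed(overloads):
        if accepts(arg):
            return name
    raise TypeError("no matching overload")

# Same registration order as in the module above: f_int first, then f_bool.
registered = [("f_int", accepts_int), ("f_bool", accepts_bool)]
swapped = list(reversed(registered))

print(first_match(registered, 1))  # the overload registered last wins
print(first_match(swapped, 1))     # swapping the order changes the result
```

The point of the sketch is that the argument itself never influences which of two matching overloads is chosen; only registration order does.
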
>
> In this case the c++ overloads in question are potentially
> distinguishable from one another, as python has distinct types bool
> and int.  So these aren't "necessarily" ambiguous.
>
> 1.b) Problem: "Necessarily" ambiguous overloads
>
> There are some overload sets that *are* necessarily ambiguous.  This
> set, for instance,
>
>  void f_double(double);
>  void f_float(float);
>
>  BOOST_PYTHON_MODULE(m)
>  {
>    def("f", f_double);
>    def("f", f_float);
>  }
>
> can never be unambiguous, since python has only one floating-point
> type, 'float', and no separate single-precision type.  I.e.
>
>>>> f(x)
>
> will call f_float for any value of x where the call succeeds at all.
>
> 1.c)  Problem: Multiple values for keyword argument
>
> At the same place in the boost.python code where this 'first-match'
> overload resolution is done, the user error 'multiple values for
> keyword argument' is not checked.  Neal Becker recently pointed this
> out.  With boost.python it looks like this:
>
>  int addem(int x, int y, int z) { return x*100 + y*10 + z; }
>
>  BOOST_PYTHON_MODULE(M)
>  {
>    def("addem", &addem, (arg("x"), arg("y"), arg("z")));
>  }
>
>>>> from M import addem
>>>> addem(1, 8, 2, x=4)
>  Traceback (most recent call last):
>  ...
>  ArgumentError: Python argument types in
>      M.addem(int, int, int)
>  did not match C++ signature:
>      addem(int x, int y, int z)
>
> That error message is very confusing...  f(int,int,int) doesn't match
> f(int,int,int)?  The pure python version reports something more
> sensible:
>
>>>> def g(x,y,z, **kwargs):
>  ...     print x,y,z,kwargs
>  ...
>>>> g(1,2,3,x=4)
>  Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>  TypeError: g() got multiple values for keyword argument 'x'
>
>
> 2.) An implemented solution
>
> I've got something implemented.  Here's what it does.
>
> 2.a) Solution: multiple values for keyword
>
> The easiest case to catch is the last [1.c].  It is also orthogonal to
> the others.  If a registered function has keyword arguments, check for
> multiple keyword values, and raise an error if so:
>
>>>> from M import addem
>>>> addem(1,2,3,x=1)
>  Boost.Python.TypeError: M.addem() got multiple values for keyword argument 'x'
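
A rough sketch of the check described in [2.a], written in plain Python. The function check_keywords and its parameters are hypothetical and only illustrate the idea: a keyword is rejected if it names a parameter that was already filled by a positional argument.

```python
def check_keywords(func_name, param_names, args, kwargs):
    """Raise TypeError if a keyword names a parameter already bound positionally."""
    bound_positionally = param_names[:len(args)]
    for name in bound_positionally:
        if name in kwargs:
            raise TypeError(
                "%s() got multiple values for keyword argument '%s'"
                % (func_name, name))

# Two positionals plus z by keyword: no conflict, nothing is raised.
check_keywords("M.addem", ["x", "y", "z"], (1, 2), {"z": 3})

# Three positionals plus x by keyword: x is supplied twice.
try:
    check_keywords("M.addem", ["x", "y", "z"], (1, 2, 3), {"x": 1})
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```
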
>
> 2.b) Solution: "necessarily" ambiguous overloads
>
> The next easiest thing to catch is case [1.b], "necessarily" ambiguous
> registrations.  It proceeds as follows: at import time, as each new
> overload V for function F is registered, compare V to the existing
> overloads EO for F.  If V's signature is indistinguishable, from
> python's point of view, from that of something in EO, raise an
> AmbiguousOverload exception.  For instance, if you load the module
> from [1.b] above, you get:
>
>>>> import m
>  Traceback (most recent call last):
>  ...
>  AmbiguousOverload: Boost.Python function m.f
>  has ambiguous overloads.  C++ signatures
>    f(float) -> None
>  and
>    f(double) -> None
>  are indistinguishable to python.
>
> Again this is because c++ float and c++ double both map to python
> 'float'.   This one is "easy" as it happens only once (not at every
> function call) and doesn't take up any space.
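
The registration-time check from [2.b] could be modeled roughly like this in plain Python. The table PYTHON_TYPE_FOR and the functions python_signature and register are assumptions made for this sketch; the only fact taken from the mail is that c++ float and double both map to python 'float'.

```python
# Assumed mapping from C++ parameter types to the Python type that feeds them.
PYTHON_TYPE_FOR = {"bool": "bool", "int": "int",
                   "float": "float", "double": "float"}

class AmbiguousOverload(Exception):
    """Stand-in for the exception of the same name mentioned in the mail."""

def python_signature(cpp_signature):
    # Translate a tuple of C++ type names into the Python types seen by callers.
    return tuple(PYTHON_TYPE_FOR[t] for t in cpp_signature)

def register(existing, new_signature):
    """Add new_signature unless it looks identical to an existing one from Python."""
    new_py = python_signature(new_signature)
    for old in existing:
        if python_signature(old) == new_py:
            raise AmbiguousOverload(
                "f(%s) and f(%s) are indistinguishable to python"
                % (", ".join(new_signature), ", ".join(old)))
    existing.append(new_signature)

ok = []
register(ok, ("int",))
register(ok, ("bool",))       # distinct Python types, so this is accepted

clash = []
register(clash, ("double",))
try:
    register(clash, ("float",))
    error = None
except AmbiguousOverload as exc:
    error = str(exc)

print(error)
```

Because the comparison runs once per registration, it costs nothing at call time, which matches the "happens only once" remark above.
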
>
> 2.c) Solution: ambiguous calls
>
> The hard case is [1.a]:
>
>  void f_bool(bool);
>  void f_int(int);
>
>  BOOST_PYTHON_MODULE(m)
>  {
>     def("f", f_bool);
>     def("f", f_int);
>  }
>
> For module 'm' above, a call to f(True) or f(1) should succeed and
> call the corresponding function.  Passing a float, however, is
> ambiguous:
>
>>>> f(True)  # ok
>>>> f(1)     # ok
>>>> f(1.0)
>  Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>  Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function m.f
>  C++ signatures:
>      f(int)
>      f(bool)
>
> So the implementation has somehow 'scored' each possible overload and
> in each case either used the best one or reported ambiguity if
> multiple overloads are tied for 'best'.
>
> The scoring works as follows.
>
> In the guts of boost.python we receive a tuple of arguments of type
> PyObject*.  We also have a list of overloads, each with an mpl vector
> representing the signature of the associated C++ function.  This is
> unchanged from the released version.
>
> What has been done until now is to attempt to call each overload in
> order and return the result of the first that succeeds.
>
> In this new implementation, for each c++ argument of type T_n, we ask a
> class overload_score<T_n> to score the conversion of the type
> (PyTypeObject*) of this PyObject* to T_n.  Example:
>
>   template <>
>   struct overload_score<int>
>   {
>     boost::optional<unsigned> operator()(PyObject* type)
>     {
>       if (PyInt_CheckExact(type))
>         return boost::optional<unsigned>(0);   // int == perfect
>       else if (PyBool_Check(type))
>         return boost::optional<unsigned>(1);   // bool == okay
>       else if (PyFloat_CheckExact(type))
>         return boost::optional<unsigned>(1);   // float == okay
>       else if (arg_from_python<int>(type).convertible())
>         return boost::optional<unsigned>(1);   // fallback
>       else
>         return boost::optional<unsigned>();    // unsuitable
>     }
>   };
>
> The "score" type is optional<unsigned>.  A valueless
> (default-constructed) optional<unsigned> means 'unsuitable'.
> optional<unsigned>(0) is a perfect match, and
> optional<unsigned>(value) for value > 0 is a workable but less than
> perfect match.
>
> These per-argument scores are added together for all arguments: this
> is the overload's total score.  If any argument is 'unsuitable', the
> total score is 'unsuitable'.
>
> If there is a tie for the best (lowest) score, the call is ambiguous
> and a Boost.Python.AmbiguousCall exception is raised.  If there is
> only one function in first place, call it.
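
The scoring rules above can be sketched in plain Python, with None standing in for a valueless optional<unsigned>. Everything here is illustrative: score_int loosely follows the overload_score<int> example, score_bool is an assumed counterpart that the mail does not show, and resolve is a hypothetical name for the tie-detecting lookup.

```python
class AmbiguousCall(Exception):
    """Stand-in for Boost.Python.AmbiguousCall."""

def score_int(arg):
    if type(arg) is int:
        return 0                      # exact int: perfect
    if isinstance(arg, (bool, float)):
        return 1                      # bool or float: workable
    return None                       # unsuitable

def score_bool(arg):
    # Assumed by analogy with score_int; not taken from the mail.
    if type(arg) is bool:
        return 0
    if isinstance(arg, (int, float)):
        return 1
    return None

def total_score(scorers, args):
    """Sum per-argument scores; any unsuitable argument makes the total unsuitable."""
    if len(scorers) != len(args):
        return None
    total = 0
    for scorer, arg in zip(scorers, args):
        score = scorer(arg)
        if score is None:
            return None
        total += score
    return total

def resolve(overloads, args):
    """Return the name of the unique lowest-scoring overload, or raise."""
    scored = []
    for name, scorers in overloads:
        score = total_score(scorers, args)
        if score is not None:
            scored.append((score, name))
    if not scored:
        raise TypeError("no matching overload")
    best = min(score for score, _ in scored)
    winners = [name for score, name in scored if score == best]
    if len(winners) > 1:
        raise AmbiguousCall("ambiguous call: " + ", ".join(winners))
    return winners[0]

overloads = [("f_bool", [score_bool]), ("f_int", [score_int])]
print(resolve(overloads, (True,)))
print(resolve(overloads, (1,)))
```

With these assumed scores, a float argument scores 1 against both overloads, so resolve(overloads, (1.0,)) raises AmbiguousCall, which mirrors the f(1.0) session in [2.c].
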
>
> 3.)  Implications
>
> This breaks a few corner cases that I've found in the tests.
>
> 3.a) implied init<> registration
>
> I have found a few instances like this one:
>
>  struct X
>  {
>    X();
>    X(int);
>  };
>
>  class_<X>("X")       // #1
>    .def(init<>())     // #2
>    .def(init<int>())
>    ;
>
> Here, #1 causes a default constructor to be registered, as does #2.
> This will cause a throw at load time as in [2.b].  It is simple to
> fix.
>
> 3.b)  raw_constructor
>
> The test suite for raw_constructor depends on our 'first-match' overload
> resolution.  The fix is to move a few things around to get the same
> effect without relying on this behavior.
>
> The "before" code looks like this:
>
>  class Foo
>  {
>   public:
>      Foo(tuple args, dict kw)
>        : args(args), kw(kw) {}
>
>      tuple args;
>      dict kw;
>  };
>
>  object init_foo(tuple args, dict kw)
>  {
>      tuple rest(args.slice(1,_));
>      return args[0].attr("__init__")(rest, kw);
>  }
>
>  BOOST_PYTHON_MODULE(raw_ctor_ext)
>  {
>      // using no_init postpones defining __init__ function until after
>      // raw_function for proper overload resolution order, since later
>      // defs get higher priority.
>      class_<Foo>("Foo", no_init)
>          .def("__init__", raw_function(&init_foo))
>          .def(init<tuple, dict>())
>          .def_readwrite("args", &Foo::args)
>          .def_readwrite("kw", &Foo::kw)
>          ;
>  }
>
> The "after" code does the registration piecemeal.  Maybe there is a
> better way to do this.  To my mind this is a little better because it
> makes explicit what is happening rather than relying on some subtle
> property of overload resolution:
>
>  class Foo
>  {
>   public:
>      Foo(tuple args, dict kw)
>        : args(args), kw(kw) {}
>
>      tuple args;
>      dict kw;
>  };
>
>  // our custom factory function
>  object init_foo(tuple args, dict kw)
>  {
>      tuple rest(args.slice(1,_));
>      return args[0].attr("__real_init__")(rest, kw);
>  }
>
>  BOOST_PYTHON_MODULE(raw_ctor_ext)
>  {
>    // to get the desired effect we register
>    // the actual constructor and our 'raw' constructor,
>    // and then rename them
>    class_<Foo> c("Foo", init<tuple, dict>());
>    c
>      .def("__tmp_init__", raw_function(&init_foo))
>      .def_readwrite("args", &Foo::args)
>      .def_readwrite("kw", &Foo::kw)
>      ;
>
>    //
>    //  __init__ => __real_init__
>    //  __tmp_init__ => __init__
>    //
>    object real_constructor = getattr(c, "__init__");
>    object raw_constructor = getattr(c, "__tmp_init__");
>
>    setattr(c, "__init__", raw_constructor);
>    delattr(c, "__tmp_init__");
>    setattr(c, "__real_init__", real_constructor);
>
>  }
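
For comparison, here is a pure-Python analogue of the rename step above. It is only a sketch of the idea (keep the real constructor under __real_init__ and install a raw one as __init__); the function raw_init is a made-up name, and none of this goes through Boost.Python.

```python
class Foo(object):
    # Plays the role of the C++ constructor Foo(tuple args, dict kw).
    def __init__(self, args, kw):
        self.args = args
        self.kw = kw

def raw_init(self, *args, **kw):
    # The 'raw' constructor: collect everything and forward it
    # to the real constructor under its new name.
    self.__real_init__(args, kw)

# __init__ => __real_init__, then raw_init => __init__
Foo.__real_init__ = Foo.__init__
Foo.__init__ = raw_init

foo = Foo(1, 2, bar="baz")
print(foo.args, foo.kw)
```
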
>
> And that basically covers it.  Looking forward to comments/feedback.
>
> The code is here:
>
>  http://gitorious.org/~straszheim/boost/straszheims-python/trees/master
>
> Note that the headers are in subdirectory 'include'...  If there is
> enough response to this mail I'll send out instructions on how to
> get a working build going.
>
> - t
> _______________________________________________
> Cplusplus-sig mailing list
> Cplusplus-sig at python.org
> http://mail.python.org/mailman/listinfo/cplusplus-sig