[C++-sig] Implementation of proper overload resolution

Thu Dec 17 20:31:02 CET 2009

Troy,

Incidentally, do you know the proper way  (or is there a proper way?)  
to support None type keyword arguments?

Cheers,

Dane

On Dec 17, 2009, at 11:11 AM, Dane Springmeyer wrote:

> Troy,
>
> Really __impressive__ write-up and work on this. I'd definitely be  
> interested in build instructions and your work getting into future  
> releases.
>
> Cheers,
>
> Dane
>
>
> On Dec 17, 2009, at 8:08 AM, Troy D. Straszheim wrote:
>
>>
>> Here's what I've got on overloading.  This turned out to be a lot  
>> more
>> work, and this mail much longer, than I'd hoped.  Would appreciate a
>> readthrough and comments/questions, I've done about all I can do.   
>> First
>> a review of the problems, then a walkthrough of an implementation I  
>> have
>> that fixes them.
>>
>> 1.a)  Problem: Ambiguous calls from python to overloaded functions
>>
>> Boost.python's 'overloading' is and has always been broken w.r.t. is
>> ambiguity.  The library currently has no notion, given some tuple of
>> arguments, of one function being a better match than another.  It  
>> knows
>> only "this function matches" or not.  What the user observes is the
>> following: when a call to a c++ function is made from python e.g.
>>
>>>>> m.f(1)
>>
>> boost.python looks through the overloads for f, in the reverse  
>> order of
>> registration, and calls the first one that is callable.  For  
>> instance if
>> the registrations are
>>
>> void f_int(int);
>> void f_bool(bool);
>>
>> BOOST_PYTHON_MODULE(m)
>> {
>> def("f", f_int);
>> def("f", f_bool);
>> }
>>
>> then f_bool will execute this call of f(1), since the python 'int'
>> converts to c++ bool.  If the overloads were registered in the  
>> reverse
>> order, f_int would be executed.
>>
>> In this case the c++ overloads in question are potentially
>> distinguishable from one another, as python has distinct types bool  
>> and
>> int.  So these aren't "necessarily" ambiguous.
>>
>> 1.b) Problem: "Necessarily" ambiguous overloads
>>
>> There are some overload sets that *are* necessarily ambiguous.  This
>> set, for instance,
>>
>> void f_double(double);
>> void f_float(float);
>>
>> BOOST_PYTHON_MODULE(m)
>> {
>>   def("f", f_double);
>>   def("f", f_float);
>> }
>>
>> can never be unambiguous, since python has no 'float' type.  I.e.
>>
>>>>> f(x)
>>
>> will call f_float for any value of x where the call succeeds at all.
>>
>> 1.c)  Problem: Multiple values for keyword argument
>>
>> At the same place in the boost.python code where this 'first-match'
>> overload resolution is done, the user error 'multiple values for  
>> keyword
>> argument' is not checked.  Neal Becker recently pointed this out.   
>> With
>> boost.python it looks like this:
>>
>> int addem(int x, int y, int z) { return x*100 + y*10 + z; }
>>
>> BOOST_PYTHON_MODULE(M)
>> {
>>   def("addem", &addem, (arg("x"), arg("y"), arg("z")));
>> }
>>
>>>>> from M import addem
>>>>> addem(1, 8, 2, x=4)
>> Traceback (most recent call last):
>> ...
>> ArgumentError: Python argument types in
>>     M.addem(int, int, int)
>> did not match C++ signature:
>>     addem(int x, int y, int z)
>>
>> That error message is very confusing...  f(int,int,int) doesn't match
>> f(int,int,int)?  The pure python version reports something more
>> sensible:
>>
>>>>> def g(x,y,z, **kwargs):
>> ...     print x,y,z,kwargs
>> ...
>>>>> g(1,2,3,x=4)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: g() got multiple values for keyword argument 'x'
>>
>>
>> 2.) An implemented solution
>>
>> I've got something implemented.  Here's what it does.
>>
>> 2.a) Solution: multiple values for keyword
>>
>> The easiest case to catch is the last [1.c].  It is also orthogonal  
>> to
>> the others.  If a registered function has keyword arguments, check  
>> for
>> multiple keyword values, and raise an error if so:
>>
>>>>> from M import addem
>>>>> addem(1,2,3,x=1)
>> Boost.Python.TypeError: M.addem() got multiple values for keyword  
>> argument 'x'
>>
>> 2.b) Solution: "necessarily" ambiguous overloads
>>
>> The next easiest thing to catch is case [1.b], "necessarily"  
>> ambiguous
>> registrations.  It proceeds as follows: at import time, as each new
>> overload V for function F is registered, compare V to the existing
>> overloads EO for F.  If V has the same signature as something in EO,
>> raise an AmbiguousOverload exception.  For instance, if you load the
>> module from [1.b] above, you get:
>>
>>>>> import m
>> Traceback (most recent call last):
>> ...
>> AmbiguousOverload: Boost.Python function m.f
>> has ambiguous overloads.  C++ signatures
>>   f(float) -> None
>> and
>>   f(double) -> None
>> are indistinguishable to python.
>>
>> Again this is because c++ float and c++ double both map to python
>> 'float'.   This one is "easy" as it happens only once (not at every
>> function call) and doesn't take up any space.
>>
>> 2.c) Solution: ambiguous calls
>>
>> The hard case is [1.a]:
>>
>> void f_bool(bool);
>> void f_int(int);
>>
>> BOOST_PYTHON_MODULE(m)
>> {
>>    def(f, f_bool);
>>    def(f, f_int);
>> }
>>
>> For module 'm' above, a call to f(True) or f(1) should succeed and  
>> call
>> the corresponding function.  Passing a float, however, is ambiguous:
>>
>>>>> f(True)  # ok
>>>>> f(1)     # ok
>>>>> f(1.0)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function  
>> m.f
>> C++ signatures:
>>     f(int)
>>     f(bool)
>>
>> So the implementation has some how 'scored' each possible overload  
>> and
>> in each case either used the best one or reported ambiguity if  
>> multiple
>> overloads are tied for 'best'.
>>
>> The scoring works as follows.
>>
>> In the guts of boost.python we recieve a tuple of arguments of type
>> PyObject*.  We also have a list of overloads, each with an mpl vector
>> representing the signature of the associated C++ function.  This is
>> unchanged from the released version.
>>
>> What has been done until now is to attempt to call each overload in
>> order and return the result of the first that succeeds.
>>
>> In this new implmentation, for each c++ argument of type T_n, we  
>> ask a
>> class overload_score<T_n> to score the conversion of the type
>> (PyTypeObject*) of this PyObject* to T_n.  Example:
>>
>>  template <>
>>  struct overload_score<int>
>>  {
>>    boost::optional<unsigned> operator()(PyObject* type)
>>    {
>>      if(PyInt_CheckExact(type))
>>          return boost::optional<unsigned>(0); // int == perfect
>>      else if (PyBool_Check(type))
>>          return boost::optional<unsigned>(1); // bool == okay
>>      else if (PyFloat_CheckExact(type))
>>        return boost::optional<unsigned>(1);   // float == okay
>>      else if(arg_from_python<int>(type).convertible())
>>        return boost::optional<unsigned>(1);   // fallback
>>      else
>>        return boost::optional<unsigned>();    // unsuitable
>>    }
>>  };
>>
>> The "score" type is optional<unsigned>.  A valueless,
>> (default-constructed) optional<unsigned> means 'unsuitable'.
>> optional<unsigned>(0) is a perfect match, and  
>> optional<unsigned>(value)
>> for value > 0 is a workable but less than perfect match.
>>
>> These per-argument scores are added together for all arguments:  
>> this is
>> the overload's total score.  If any argument is 'unsuitable', the  
>> total
>> score is 'unsuitable'.
>>
>> If there is a tie for the best (lowest) score, the call is  
>> ambiguous and
>> a Boost.Python.AmbiguousCall exception is raised.  If there is only  
>> one
>> function in first place, call it.
>>
>> 3.)  Implications
>>
>> This breaks a few corner cases that I've found in the tests.
>>
>> 3.a) implied init<> registration
>>
>> I have found a few instances like this one:
>>
>> struct X
>> {
>>   X();
>>   X(int);
>> };
>>
>> class_<X>("X")       // #1
>>   .def(init<>())     // #2
>>   .def(init<int>())
>>   ;
>>
>> Here, #1 causes a default constructor to be registered, as does #2.
>> This will cause a throw at load time as in [2.b].  It is simple to  
>> fix.
>>
>> 3.b)  raw_constructor
>>
>> The test suite for raw_constructor depends on our first_match  
>> overload
>> resolution.  The fix is to move a few things around to get the same
>> effect without relying on this behavior.
>>
>> The "before" code looks like this:
>>
>> class Foo
>> {
>>  public:
>>     Foo(tuple args, dict kw)
>>       : args(args), kw(kw) {}
>>
>>     tuple args;
>>     dict kw;
>> };
>>
>> object init_foo(tuple args, dict kw)
>> {
>>     tuple rest(args.slice(1,_));
>>     return args[0].attr("__init__")(rest, kw);
>> }
>>
>> BOOST_PYTHON_MODULE(raw_ctor_ext)
>> {
>>     // using no_init postpones defining __init__ function until after
>>     // raw_function for proper overload resolution order, since later
>>     // defs get higher priority.
>>     class_<Foo>("Foo", no_init)
>>         .def("__init__", raw_function(&init_foo))
>>         .def(init<tuple, dict>())
>>         .def_readwrite("args", &Foo::args)
>>         .def_readwrite("kw", &Foo::kw)
>>         ;
>> }
>>
>> The "after" code does the registration piecemeal: Maybe there is a
>> better way to do this.  To my mind this is a little better because it
>> makes explicit what is happening rather than relying on some subtle
>> property of overload resolution:
>>
>> class Foo
>> {
>>  public:
>>   Foo(tuple args, dict kw)
>>     : args(args), kw(kw) {}
>>
>>     tuple args;
>>     dict kw;
>> };
>>
>> // our custom factory function
>> object init_foo(tuple args, dict kw)
>> {
>>     tuple rest(args.slice(1,_));
>>     return args[0].attr("__real_init__")(rest, kw);
>> }
>>
>> BOOST_PYTHON_MODULE(raw_ctor_ext)
>> {
>>   // to get the desired effect we register
>>   // the actual constructor and our 'raw' constructor,
>>   // and then rename them
>>   class_<Foo> c("Foo", init<tuple, dict>());
>>   c
>>     .def("__tmp_init__", raw_function(&init_foo))
>>     .def_readwrite("args", &Foo::args)
>>     .def_readwrite("kw", &Foo::kw)
>>     ;
>>
>>   //
>>   //  __init__ => __real_init__
>>   //  __tmp_init__ => __init__
>>   //
>>   object real_constructor = getattr(c, "__init__");
>>   object raw_constructor = getattr(c, "__tmp_init__");
>>
>>   setattr(c, "__init__", raw_constructor);
>>   delattr(c, "__tmp_init__");
>>   setattr(c, "__real_init__", real_constructor);
>>
>> }
>>
>> And that basically covers it.  Looking forward to comments/feedback.
>>
>> The code is here:
>>
>> http://gitorious.org/~straszheim/boost/straszheims-python/trees/ 
>> master
>>
>> Note that the headers are in subdirectory 'include'...  If there is
>> enough response to this mail it I'll send out instructions on how  
>> to get
>> a working build going.
>>
>> - t
>> _______________________________________________
>> Cplusplus-sig mailing list
>> Cplusplus-sig at python.org
>> http://mail.python.org/mailman/listinfo/cplusplus-sig
>
> _______________________________________________
> Cplusplus-sig mailing list
> Cplusplus-sig at python.org
> http://mail.python.org/mailman/listinfo/cplusplus-sig