Implementation of proper overload resolution
Here's what I've got on overloading. This turned out to be a lot more work, and this mail much longer, than I'd hoped. Would appreciate a readthrough and comments/questions, I've done about all I can do. First a review of the problems, then a walkthrough of an implementation I have that fixes them. 1.a) Problem: Ambiguous calls from python to overloaded functions Boost.python's 'overloading' is and has always been broken w.r.t. is ambiguity. The library currently has no notion, given some tuple of arguments, of one function being a better match than another. It knows only "this function matches" or not. What the user observes is the following: when a call to a c++ function is made from python e.g.
m.f(1)
boost.python looks through the overloads for f, in the reverse order of registration, and calls the first one that is callable. For instance if the registrations are void f_int(int); void f_bool(bool); BOOST_PYTHON_MODULE(m) { def("f", f_int); def("f", f_bool); } then f_bool will execute this call of f(1), since the python 'int' converts to c++ bool. If the overloads were registered in the reverse order, f_int would be executed. In this case the c++ overloads in question are potentially distinguishable from one another, as python has distinct types bool and int. So these aren't "necessarily" ambiguous. 1.b) Problem: "Necessarily" ambiguous overloads There are some overload sets that *are* necessarily ambiguous. This set, for instance, void f_double(double); void f_float(float); BOOST_PYTHON_MODULE(m) { def("f", f_double); def("f", f_float); } can never be unambiguous, since python has no 'float' type. I.e.
f(x)
will call f_float for any value of x where the call succeeds at all. 1.c) Problem: Multiple values for keyword argument At the same place in the boost.python code where this 'first-match' overload resolution is done, the user error 'multiple values for keyword argument' is not checked. Neal Becker recently pointed this out. With boost.python it looks like this: int addem(int x, int y, int z) { return x*100 + y*10 + z; } BOOST_PYTHON_MODULE(M) { def("addem", &addem, (arg("x"), arg("y"), arg("z"))); }
from M import addem addem(1, 8, 2, x=4) Traceback (most recent call last): ... ArgumentError: Python argument types in M.addem(int, int, int) did not match C++ signature: addem(int x, int y, int z)
That error message is very confusing... f(int,int,int) doesn't match f(int,int,int)? The pure python version reports something more sensible:
def g(x,y,z, **kwargs): ... print x,y,z,kwargs ... g(1,2,3,x=4) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: g() got multiple values for keyword argument 'x'
2.) An implemented solution I've got something implemented. Here's what it does. 2.a) Solution: multiple values for keyword The easiest case to catch is the last [1.c]. It is also orthogonal to the others. If a registered function has keyword arguments, check for multiple keyword values, and raise an error if so:
from M import addem addem(1,2,3,x=1) Boost.Python.TypeError: M.addem() got multiple values for keyword argument 'x'
2.b) Solution: "necessarily" ambiguous overloads The next easiest thing to catch is case [1.b], "necessarily" ambiguous registrations. It proceeds as follows: at import time, as each new overload V for function F is registered, compare V to the existing overloads EO for F. If V has the same signature as something in EO, raise an AmbiguousOverload exception. For instance, if you load the module from [1.b] above, you get:
import m Traceback (most recent call last): ... AmbiguousOverload: Boost.Python function m.f has ambiguous overloads. C++ signatures f(float) -> None and f(double) -> None are indistinguishable to python.
Again this is because c++ float and c++ double both map to python 'float'. This one is "easy" as it happens only once (not at every function call) and doesn't take up any space. 2.c) Solution: ambiguous calls The hard case is [1.a]: void f_bool(bool); void f_int(int); BOOST_PYTHON_MODULE(m) { def(f, f_bool); def(f, f_int); } For module 'm' above, a call to f(True) or f(1) should succeed and call the corresponding function. Passing a float, however, is ambiguous:
f(True) # ok f(1) # ok f(1.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function m.f C++ signatures: f(int) f(bool)
So the implementation has some how 'scored' each possible overload and in each case either used the best one or reported ambiguity if multiple overloads are tied for 'best'. The scoring works as follows. In the guts of boost.python we recieve a tuple of arguments of type PyObject*. We also have a list of overloads, each with an mpl vector representing the signature of the associated C++ function. This is unchanged from the released version. What has been done until now is to attempt to call each overload in order and return the result of the first that succeeds. In this new implmentation, for each c++ argument of type T_n, we ask a class overload_score<T_n> to score the conversion of the type (PyTypeObject*) of this PyObject* to T_n. Example: template <> struct overload_score<int> { boost::optional<unsigned> operator()(PyObject* type) { if(PyInt_CheckExact(type)) return boost::optional<unsigned>(0); // int == perfect else if (PyBool_Check(type)) return boost::optional<unsigned>(1); // bool == okay else if (PyFloat_CheckExact(type)) return boost::optional<unsigned>(1); // float == okay else if(arg_from_python<int>(type).convertible()) return boost::optional<unsigned>(1); // fallback else return boost::optional<unsigned>(); // unsuitable } }; The "score" type is optional<unsigned>. A valueless, (default-constructed) optional<unsigned> means 'unsuitable'. optional<unsigned>(0) is a perfect match, and optional<unsigned>(value) for value > 0 is a workable but less than perfect match. These per-argument scores are added together for all arguments: this is the overload's total score. If any argument is 'unsuitable', the total score is 'unsuitable'. If there is a tie for the best (lowest) score, the call is ambiguous and a Boost.Python.AmbiguousCall exception is raised. If there is only one function in first place, call it. 3.) Implications This breaks a few corner cases that I've found in the tests. 3.a) implied init<> registration I have found a few instances like this one: struct X { X(); X(int); }; class_<X>("X") // #1 .def(init<>()) // #2 .def(init<int>()) ; Here, #1 causes a default constructor to be registered, as does #2. This will cause a throw at load time as in [2.b]. It is simple to fix. 3.b) raw_constructor The test suite for raw_constructor depends on our first_match overload resolution. The fix is to move a few things around to get the same effect without relying on this behavior. The "before" code looks like this: class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {} tuple args; dict kw; }; object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__init__")(rest, kw); } BOOST_PYTHON_MODULE(raw_ctor_ext) { // using no_init postpones defining __init__ function until after // raw_function for proper overload resolution order, since later // defs get higher priority. class_<Foo>("Foo", no_init) .def("__init__", raw_function(&init_foo)) .def(init<tuple, dict>()) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ; } The "after" code does the registration piecemeal: Maybe there is a better way to do this. To my mind this is a little better because it makes explicit what is happening rather than relying on some subtle property of overload resolution: class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {} tuple args; dict kw; }; // our custom factory function object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__real_init__")(rest, kw); } BOOST_PYTHON_MODULE(raw_ctor_ext) { // to get the desired effect we register // the actual constructor and our 'raw' constructor, // and then rename them class_<Foo> c("Foo", init<tuple, dict>()); c .def("__tmp_init__", raw_function(&init_foo)) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ; // // __init__ => __real_init__ // __tmp_init__ => __init__ // object real_constructor = getattr(c, "__init__"); object raw_constructor = getattr(c, "__tmp_init__"); setattr(c, "__init__", raw_constructor); delattr(c, "__tmp_init__"); setattr(c, "__real_init__", real_constructor); } And that basically covers it. Looking forward to comments/feedback. The code is here: http://gitorious.org/~straszheim/boost/straszheims-python/trees/master Note that the headers are in subdirectory 'include'... If there is enough response to this mail it I'll send out instructions on how to get a working build going. - t
Troy, Really __impressive__ write-up and work on this. I'd definitely be interested in build instructions and your work getting into future releases. Cheers, Dane On Dec 17, 2009, at 8:08 AM, Troy D. Straszheim wrote:
Here's what I've got on overloading. This turned out to be a lot more work, and this mail much longer, than I'd hoped. Would appreciate a readthrough and comments/questions, I've done about all I can do. First a review of the problems, then a walkthrough of an implementation I have that fixes them.
1.a) Problem: Ambiguous calls from python to overloaded functions
Boost.python's 'overloading' is and has always been broken w.r.t. is ambiguity. The library currently has no notion, given some tuple of arguments, of one function being a better match than another. It knows only "this function matches" or not. What the user observes is the following: when a call to a c++ function is made from python e.g.
m.f(1)
boost.python looks through the overloads for f, in the reverse order of registration, and calls the first one that is callable. For instance if the registrations are
void f_int(int); void f_bool(bool);
BOOST_PYTHON_MODULE(m) { def("f", f_int); def("f", f_bool); }
then f_bool will execute this call of f(1), since the python 'int' converts to c++ bool. If the overloads were registered in the reverse order, f_int would be executed.
In this case the c++ overloads in question are potentially distinguishable from one another, as python has distinct types bool and int. So these aren't "necessarily" ambiguous.
1.b) Problem: "Necessarily" ambiguous overloads
There are some overload sets that *are* necessarily ambiguous. This set, for instance,
void f_double(double); void f_float(float);
BOOST_PYTHON_MODULE(m) { def("f", f_double); def("f", f_float); }
can never be unambiguous, since python has no 'float' type. I.e.
f(x)
will call f_float for any value of x where the call succeeds at all.
1.c) Problem: Multiple values for keyword argument
At the same place in the boost.python code where this 'first-match' overload resolution is done, the user error 'multiple values for keyword argument' is not checked. Neal Becker recently pointed this out. With boost.python it looks like this:
int addem(int x, int y, int z) { return x*100 + y*10 + z; }
BOOST_PYTHON_MODULE(M) { def("addem", &addem, (arg("x"), arg("y"), arg("z"))); }
from M import addem addem(1, 8, 2, x=4) Traceback (most recent call last): ... ArgumentError: Python argument types in M.addem(int, int, int) did not match C++ signature: addem(int x, int y, int z)
That error message is very confusing... f(int,int,int) doesn't match f(int,int,int)? The pure python version reports something more sensible:
def g(x,y,z, **kwargs): ... print x,y,z,kwargs ... g(1,2,3,x=4) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: g() got multiple values for keyword argument 'x'
2.) An implemented solution
I've got something implemented. Here's what it does.
2.a) Solution: multiple values for keyword
The easiest case to catch is the last [1.c]. It is also orthogonal to the others. If a registered function has keyword arguments, check for multiple keyword values, and raise an error if so:
from M import addem addem(1,2,3,x=1) Boost.Python.TypeError: M.addem() got multiple values for keyword argument 'x'
2.b) Solution: "necessarily" ambiguous overloads
The next easiest thing to catch is case [1.b], "necessarily" ambiguous registrations. It proceeds as follows: at import time, as each new overload V for function F is registered, compare V to the existing overloads EO for F. If V has the same signature as something in EO, raise an AmbiguousOverload exception. For instance, if you load the module from [1.b] above, you get:
import m Traceback (most recent call last): ... AmbiguousOverload: Boost.Python function m.f has ambiguous overloads. C++ signatures f(float) -> None and f(double) -> None are indistinguishable to python.
Again this is because c++ float and c++ double both map to python 'float'. This one is "easy" as it happens only once (not at every function call) and doesn't take up any space.
2.c) Solution: ambiguous calls
The hard case is [1.a]:
void f_bool(bool); void f_int(int);
BOOST_PYTHON_MODULE(m) { def(f, f_bool); def(f, f_int); }
For module 'm' above, a call to f(True) or f(1) should succeed and call the corresponding function. Passing a float, however, is ambiguous:
f(True) # ok f(1) # ok f(1.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function m.f C++ signatures: f(int) f(bool)
So the implementation has some how 'scored' each possible overload and in each case either used the best one or reported ambiguity if multiple overloads are tied for 'best'.
The scoring works as follows.
In the guts of boost.python we recieve a tuple of arguments of type PyObject*. We also have a list of overloads, each with an mpl vector representing the signature of the associated C++ function. This is unchanged from the released version.
What has been done until now is to attempt to call each overload in order and return the result of the first that succeeds.
In this new implmentation, for each c++ argument of type T_n, we ask a class overload_score<T_n> to score the conversion of the type (PyTypeObject*) of this PyObject* to T_n. Example:
template <> struct overload_score<int> { boost::optional<unsigned> operator()(PyObject* type) { if(PyInt_CheckExact(type)) return boost::optional<unsigned>(0); // int == perfect else if (PyBool_Check(type)) return boost::optional<unsigned>(1); // bool == okay else if (PyFloat_CheckExact(type)) return boost::optional<unsigned>(1); // float == okay else if(arg_from_python<int>(type).convertible()) return boost::optional<unsigned>(1); // fallback else return boost::optional<unsigned>(); // unsuitable } };
The "score" type is optional<unsigned>. A valueless, (default-constructed) optional<unsigned> means 'unsuitable'. optional<unsigned>(0) is a perfect match, and optional<unsigned>(value) for value > 0 is a workable but less than perfect match.
These per-argument scores are added together for all arguments: this is the overload's total score. If any argument is 'unsuitable', the total score is 'unsuitable'.
If there is a tie for the best (lowest) score, the call is ambiguous and a Boost.Python.AmbiguousCall exception is raised. If there is only one function in first place, call it.
3.) Implications
This breaks a few corner cases that I've found in the tests.
3.a) implied init<> registration
I have found a few instances like this one:
struct X { X(); X(int); };
class_<X>("X") // #1 .def(init<>()) // #2 .def(init<int>()) ;
Here, #1 causes a default constructor to be registered, as does #2. This will cause a throw at load time as in [2.b]. It is simple to fix.
3.b) raw_constructor
The test suite for raw_constructor depends on our first_match overload resolution. The fix is to move a few things around to get the same effect without relying on this behavior.
The "before" code looks like this:
class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {}
tuple args; dict kw; };
object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__init__")(rest, kw); }
BOOST_PYTHON_MODULE(raw_ctor_ext) { // using no_init postpones defining __init__ function until after // raw_function for proper overload resolution order, since later // defs get higher priority. class_<Foo>("Foo", no_init) .def("__init__", raw_function(&init_foo)) .def(init<tuple, dict>()) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ; }
The "after" code does the registration piecemeal: Maybe there is a better way to do this. To my mind this is a little better because it makes explicit what is happening rather than relying on some subtle property of overload resolution:
class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {}
tuple args; dict kw; };
// our custom factory function object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__real_init__")(rest, kw); }
BOOST_PYTHON_MODULE(raw_ctor_ext) { // to get the desired effect we register // the actual constructor and our 'raw' constructor, // and then rename them class_<Foo> c("Foo", init<tuple, dict>()); c .def("__tmp_init__", raw_function(&init_foo)) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ;
// // __init__ => __real_init__ // __tmp_init__ => __init__ // object real_constructor = getattr(c, "__init__"); object raw_constructor = getattr(c, "__tmp_init__");
setattr(c, "__init__", raw_constructor); delattr(c, "__tmp_init__"); setattr(c, "__real_init__", real_constructor);
}
And that basically covers it. Looking forward to comments/feedback.
The code is here:
http://gitorious.org/~straszheim/boost/straszheims-python/trees/ master
Note that the headers are in subdirectory 'include'... If there is enough response to this mail it I'll send out instructions on how to get a working build going.
- t _______________________________________________ Cplusplus-sig mailing list Cplusplus-sig@python.org http://mail.python.org/mailman/listinfo/cplusplus-sig
Troy, Incidentally, do you know the proper way (or is there a proper way?) to support None type keyword arguments? Cheers, Dane On Dec 17, 2009, at 11:11 AM, Dane Springmeyer wrote:
Troy,
Really __impressive__ write-up and work on this. I'd definitely be interested in build instructions and your work getting into future releases.
Cheers,
Dane
On Dec 17, 2009, at 8:08 AM, Troy D. Straszheim wrote:
Here's what I've got on overloading. This turned out to be a lot more work, and this mail much longer, than I'd hoped. Would appreciate a readthrough and comments/questions, I've done about all I can do. First a review of the problems, then a walkthrough of an implementation I have that fixes them.
1.a) Problem: Ambiguous calls from python to overloaded functions
Boost.python's 'overloading' is and has always been broken w.r.t. is ambiguity. The library currently has no notion, given some tuple of arguments, of one function being a better match than another. It knows only "this function matches" or not. What the user observes is the following: when a call to a c++ function is made from python e.g.
m.f(1)
boost.python looks through the overloads for f, in the reverse order of registration, and calls the first one that is callable. For instance if the registrations are
void f_int(int); void f_bool(bool);
BOOST_PYTHON_MODULE(m) { def("f", f_int); def("f", f_bool); }
then f_bool will execute this call of f(1), since the python 'int' converts to c++ bool. If the overloads were registered in the reverse order, f_int would be executed.
In this case the c++ overloads in question are potentially distinguishable from one another, as python has distinct types bool and int. So these aren't "necessarily" ambiguous.
1.b) Problem: "Necessarily" ambiguous overloads
There are some overload sets that *are* necessarily ambiguous. This set, for instance,
void f_double(double); void f_float(float);
BOOST_PYTHON_MODULE(m) { def("f", f_double); def("f", f_float); }
can never be unambiguous, since python has no 'float' type. I.e.
f(x)
will call f_float for any value of x where the call succeeds at all.
1.c) Problem: Multiple values for keyword argument
At the same place in the boost.python code where this 'first-match' overload resolution is done, the user error 'multiple values for keyword argument' is not checked. Neal Becker recently pointed this out. With boost.python it looks like this:
int addem(int x, int y, int z) { return x*100 + y*10 + z; }
BOOST_PYTHON_MODULE(M) { def("addem", &addem, (arg("x"), arg("y"), arg("z"))); }
from M import addem addem(1, 8, 2, x=4) Traceback (most recent call last): ... ArgumentError: Python argument types in M.addem(int, int, int) did not match C++ signature: addem(int x, int y, int z)
That error message is very confusing... f(int,int,int) doesn't match f(int,int,int)? The pure python version reports something more sensible:
def g(x,y,z, **kwargs): ... print x,y,z,kwargs ... g(1,2,3,x=4) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: g() got multiple values for keyword argument 'x'
2.) An implemented solution
I've got something implemented. Here's what it does.
2.a) Solution: multiple values for keyword
The easiest case to catch is the last [1.c]. It is also orthogonal to the others. If a registered function has keyword arguments, check for multiple keyword values, and raise an error if so:
from M import addem addem(1,2,3,x=1) Boost.Python.TypeError: M.addem() got multiple values for keyword argument 'x'
2.b) Solution: "necessarily" ambiguous overloads
The next easiest thing to catch is case [1.b], "necessarily" ambiguous registrations. It proceeds as follows: at import time, as each new overload V for function F is registered, compare V to the existing overloads EO for F. If V has the same signature as something in EO, raise an AmbiguousOverload exception. For instance, if you load the module from [1.b] above, you get:
import m Traceback (most recent call last): ... AmbiguousOverload: Boost.Python function m.f has ambiguous overloads. C++ signatures f(float) -> None and f(double) -> None are indistinguishable to python.
Again this is because c++ float and c++ double both map to python 'float'. This one is "easy" as it happens only once (not at every function call) and doesn't take up any space.
2.c) Solution: ambiguous calls
The hard case is [1.a]:
void f_bool(bool); void f_int(int);
BOOST_PYTHON_MODULE(m) { def(f, f_bool); def(f, f_int); }
For module 'm' above, a call to f(True) or f(1) should succeed and call the corresponding function. Passing a float, however, is ambiguous:
f(True) # ok f(1) # ok f(1.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function m.f C++ signatures: f(int) f(bool)
So the implementation has some how 'scored' each possible overload and in each case either used the best one or reported ambiguity if multiple overloads are tied for 'best'.
The scoring works as follows.
In the guts of boost.python we recieve a tuple of arguments of type PyObject*. We also have a list of overloads, each with an mpl vector representing the signature of the associated C++ function. This is unchanged from the released version.
What has been done until now is to attempt to call each overload in order and return the result of the first that succeeds.
In this new implmentation, for each c++ argument of type T_n, we ask a class overload_score<T_n> to score the conversion of the type (PyTypeObject*) of this PyObject* to T_n. Example:
template <> struct overload_score<int> { boost::optional<unsigned> operator()(PyObject* type) { if(PyInt_CheckExact(type)) return boost::optional<unsigned>(0); // int == perfect else if (PyBool_Check(type)) return boost::optional<unsigned>(1); // bool == okay else if (PyFloat_CheckExact(type)) return boost::optional<unsigned>(1); // float == okay else if(arg_from_python<int>(type).convertible()) return boost::optional<unsigned>(1); // fallback else return boost::optional<unsigned>(); // unsuitable } };
The "score" type is optional<unsigned>. A valueless, (default-constructed) optional<unsigned> means 'unsuitable'. optional<unsigned>(0) is a perfect match, and optional<unsigned>(value) for value > 0 is a workable but less than perfect match.
These per-argument scores are added together for all arguments: this is the overload's total score. If any argument is 'unsuitable', the total score is 'unsuitable'.
If there is a tie for the best (lowest) score, the call is ambiguous and a Boost.Python.AmbiguousCall exception is raised. If there is only one function in first place, call it.
3.) Implications
This breaks a few corner cases that I've found in the tests.
3.a) implied init<> registration
I have found a few instances like this one:
struct X { X(); X(int); };
class_<X>("X") // #1 .def(init<>()) // #2 .def(init<int>()) ;
Here, #1 causes a default constructor to be registered, as does #2. This will cause a throw at load time as in [2.b]. It is simple to fix.
3.b) raw_constructor
The test suite for raw_constructor depends on our first_match overload resolution. The fix is to move a few things around to get the same effect without relying on this behavior.
The "before" code looks like this:
class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {}
tuple args; dict kw; };
object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__init__")(rest, kw); }
BOOST_PYTHON_MODULE(raw_ctor_ext) { // using no_init postpones defining __init__ function until after // raw_function for proper overload resolution order, since later // defs get higher priority. class_<Foo>("Foo", no_init) .def("__init__", raw_function(&init_foo)) .def(init<tuple, dict>()) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ; }
The "after" code does the registration piecemeal: Maybe there is a better way to do this. To my mind this is a little better because it makes explicit what is happening rather than relying on some subtle property of overload resolution:
class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {}
tuple args; dict kw; };
// our custom factory function object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__real_init__")(rest, kw); }
BOOST_PYTHON_MODULE(raw_ctor_ext) { // to get the desired effect we register // the actual constructor and our 'raw' constructor, // and then rename them class_<Foo> c("Foo", init<tuple, dict>()); c .def("__tmp_init__", raw_function(&init_foo)) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ;
// // __init__ => __real_init__ // __tmp_init__ => __init__ // object real_constructor = getattr(c, "__init__"); object raw_constructor = getattr(c, "__tmp_init__");
setattr(c, "__init__", raw_constructor); delattr(c, "__tmp_init__"); setattr(c, "__real_init__", real_constructor);
}
And that basically covers it. Looking forward to comments/feedback.
The code is here:
http://gitorious.org/~straszheim/boost/straszheims-python/trees/ master
Note that the headers are in subdirectory 'include'... If there is enough response to this mail it I'll send out instructions on how to get a working build going.
- t _______________________________________________ Cplusplus-sig mailing list Cplusplus-sig@python.org http://mail.python.org/mailman/listinfo/cplusplus-sig
_______________________________________________ Cplusplus-sig mailing list Cplusplus-sig@python.org http://mail.python.org/mailman/listinfo/cplusplus-sig
This is great, I will try this, but I'm afraid about the size, because my main problem with boost.python is the library size compared to others c++ python bindings, my current library has 16 MB compared to 5 MB from SIP, and in my bindings I have a lot of overload functions and this solution can produce some new bytes. :( Renato Araujo Oliveira Filho On Tue, Dec 22, 2009 at 12:42 AM, Troy D. Straszheim <troy@resophonic.com> wrote:
Dane Springmeyer <blake@hailmail.net> writes:
Troy,
Incidentally, do you know the proper way (or is there a proper way?) to support None type keyword arguments?
Cheers,
Dane
Could you elaborate? What're you trying to do?
-t
_______________________________________________ Cplusplus-sig mailing list Cplusplus-sig@python.org http://mail.python.org/mailman/listinfo/cplusplus-sig
I assume overload resolution extends to scoring multiple arguments as well?
Neal Becker <ndbecker2@gmail.com> writes:
I assume overload resolution extends to scoring multiple arguments as well?
Sure. This is the reason that scores are optional<unsigned>. If any single argument scorer returns optional<unsigned>() (meaning 'unsuitable'), this stops evaluation and kills the score for the entire signature. Here are some excerpts from the test suites: //------------------------------------------------------ std::string f1(float, bool) { return "float,bool"; } std::string f2(float, int) { return "float,int"; } std::string f3(float, std::string) { return "float,std::string"; } BOOST_PYTHON_MODULE(ambig6_ext) { def("f", &f1); def("f", &f2); def("f", &f3); } //------------------------------------------------------
from ambig6_ext import f help(f) Help on built-in function f:
f(...) f( (float)arg1, (bool)arg2) -> str : C++ signature : std::string f(float,bool) f( (float)arg1, (int)arg2) -> str : C++ signature : std::string f(float,int) f( (float)arg1, (str)arg2) -> str : C++ signature : std::string f(float,std::string)
f(1.0, True) # perfect match, score 0 'float,bool' f(1.0, 1) # perfect match, score 0 'float,int' f(True, True) # best match arg1 score is 1, arg2 score is 0 'float,bool'
# # Here, note that f3 is not listed in the set of ambiguous candidates, # as the arg2 score (conversion of float to string) is 'unsuitable' and # removes it from consideration #
f(1.0,1.0) # ambiguous, f1 & f2 both score 1 Traceback (most recent call last): File "<stdin>", line 1, in <module> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function ambig6_ext.f C++ signatures: f(float, int) f(float, bool)
# # If the second arg is a string, then the 3rd overload # always matches... #
f(True, 'helloverloading') # best match, score 1 'float,std::string' f(1, 'helloverloading') # best match, score 1 'float,std::string' f(1.0, 'helloverloading') # perfect match, score 0 'float,std::string'
# # ... Unless the arg1 is also a string. Note this one is ArgumentError, # listing all overloads, not AmbiguousCall listing the ambig. ones. #
f('uh', 'oh') # all overloads score 'unsuitable' Boost.Python.ArgumentError: Python argument types in ambig6_ext.f(str, str) did not match C++ signature: f(float, std::string) f(float, int) f(float, bool)
-t
I am concerned that this doesn't introduce too much overhead for the common case where there is no ambiguity. I suppose this has been optimized?
Neal Becker <ndbecker2@gmail.com> writes:
I am concerned that this doesn't introduce too much overhead for the common case where there is no ambiguity. I suppose this has been optimized?
For the following module: int f(int x, int y, int z) { return x*100 + y*10 + z; } BOOST_PYTHON_MODULE(m) { def("f", &f); } and the following script: from m import f for i in range(10000000): f(i, i, i) With all optimizations turned on, Boost.Python from the 1.41.0 release runs this in (average over several runs) 4.25 seconds, and this new implementation in 4.38 seconds, so an increase of about 3% wall-clock time. This test case is designed to make the performance hit look as bad as possible: if int f(...) did anything substantial, of course the relative slowdown would be less. In an attempt to get an idea of exactly how many additional CPU ticks are involved I ran a script that just does from m import f under 'valgrind --tool=lackey' and compared the number of 'guest instructions', some measure of how much work the virtual CPU is doing: 1.41.0: guest instrs: 26,559,194 New: guest instrs: 26,864,330 So there's an additional total 305k instructions required to create and destroy the module. If you then add a single call to f() to the script: from m import f f(1,1,1) you get: 1.41.0: guest instrs: 26,593,334 New: guest instrs: 26,899,095 So 1.41.0 requires 34140 of these cpu 'ticks' to call f(), and the new version requires 34765 of them, or 625 instructions ~= 2% more. Note that this implementation is also 'fusionized', i.e. where function calling is concerned, much of the boost.preprocessor guts have been removed and replaced with boost fusion, so it is a little hard to track where this 2% is actually going. OTOH polymorphic function objects and phoenix expressions are passable to def(). The fusionization also involves some increase in library size. Presumably the difference would be less if inlining were turned off. Test Module 1.41.0 New ------------------------------------ ------ ---- andreas_beyer_ext.so 120K 120K args_ext.so 208K 264K auto_ptr_ext.so 132K 156K back_reference_ext.so 148K 176K ben_scott1_ext.so 112K 132K bienstman1_ext.so 72K 72K bienstman2_ext.so 80K 96K bienstman3_ext.so 44K 60K builtin_converters_ext.so 428K 544K callbacks_ext.so 200K 240K const_argument_ext.so 28K 28K crossmod_exception_a.so 24K 28K crossmod_exception_b.so 24K 28K crossmod_opaque_a.so 28K 36K crossmod_opaque_b.so 28K 36K data_members_ext.so 320K 372K defaults_ext.so 304K 392K dict_ext.so 80K 96K docstring_ext.so 104K 116K enum_ext.so 84K 92K exception_translator_ext.so 28K 36K extract_ext.so 192K 220K implicit_ext.so 100K 116K injected_ext.so 100K 124K input_iterator.so 124K 140K iterator_ext.so 324K 364K list_ext.so 176K 204K long_ext.so 100K 116K m1.so 292K 352K m2.so 128K 156K map_indexing_suite_ext.so 736K 876K minimal_ext.so 20K 20K multi_arg_constructor_ext.so 48K 60K nested_ext.so 112K 124K object_ext.so 332K 392K opaque_ext.so 80K 96K operators_ext.so 228K 268K pickle1_ext.so 92K 104K pickle2_ext.so 104K 124K pickle3_ext.so 116K 136K pickle4_ext.so 60K 68K pointer_vector_ext.so 220K 252K polymorphism2_auto_ptr_ext.so 200K 268K polymorphism_ext.so 216K 264K properties_ext.so 164K 188K raw_ctor_ext.so 104K 120K return_arg_ext.so 112K 132K shared_ptr_ext.so 432K 672K slice_ext.so 96K 112K staticmethod_ext.so 76K 96K stl_iterator_ext.so 116K 128K str_ext.so 68K 76K tuple_ext.so 68K 80K vector_indexing_suite_ext.so 612K 720K virtual_functions_ext.so 176K 216K voidptr_ext.so 48K 60K wrapper_held_type_ext.so 92K 112K -t
Don't you think that when these overloading problems become an issue it is a sign of a poorly designed API? I mean, if overloaded functions parameters are not completely different in type or number, then maybe they are already too confusing to use and should not be overloaded in the first place?... Just a thought... 2009/12/17 Troy D. Straszheim <troy@resophonic.com>
Here's what I've got on overloading. This turned out to be a lot more work, and this mail much longer, than I'd hoped. Would appreciate a readthrough and comments/questions, I've done about all I can do. First a review of the problems, then a walkthrough of an implementation I have that fixes them.
1.a) Problem: Ambiguous calls from python to overloaded functions
Boost.python's 'overloading' is and has always been broken w.r.t. is ambiguity. The library currently has no notion, given some tuple of arguments, of one function being a better match than another. It knows only "this function matches" or not. What the user observes is the following: when a call to a c++ function is made from python e.g.
m.f(1)
boost.python looks through the overloads for f, in the reverse order of registration, and calls the first one that is callable. For instance if the registrations are
void f_int(int); void f_bool(bool);
BOOST_PYTHON_MODULE(m) { def("f", f_int); def("f", f_bool); }
then f_bool will execute this call of f(1), since the python 'int' converts to c++ bool. If the overloads were registered in the reverse order, f_int would be executed.
In this case the c++ overloads in question are potentially distinguishable from one another, as python has distinct types bool and int. So these aren't "necessarily" ambiguous.
1.b) Problem: "Necessarily" ambiguous overloads
There are some overload sets that *are* necessarily ambiguous. This set, for instance,
void f_double(double); void f_float(float);
BOOST_PYTHON_MODULE(m) { def("f", f_double); def("f", f_float); }
can never be unambiguous, since python has no 'float' type. I.e.
f(x)
will call f_float for any value of x where the call succeeds at all.
1.c) Problem: Multiple values for keyword argument
At the same place in the boost.python code where this 'first-match' overload resolution is done, the user error 'multiple values for keyword argument' is not checked. Neal Becker recently pointed this out. With boost.python it looks like this:
int addem(int x, int y, int z) { return x*100 + y*10 + z; }
BOOST_PYTHON_MODULE(M) { def("addem", &addem, (arg("x"), arg("y"), arg("z"))); }
from M import addem addem(1, 8, 2, x=4) Traceback (most recent call last): ... ArgumentError: Python argument types in M.addem(int, int, int) did not match C++ signature: addem(int x, int y, int z)
That error message is very confusing... f(int,int,int) doesn't match f(int,int,int)? The pure python version reports something more sensible:
def g(x,y,z, **kwargs): ... print x,y,z,kwargs ... g(1,2,3,x=4) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: g() got multiple values for keyword argument 'x'
2.) An implemented solution
I've got something implemented. Here's what it does.
2.a) Solution: multiple values for keyword
The easiest case to catch is the last [1.c]. It is also orthogonal to the others. If a registered function has keyword arguments, check for multiple keyword values, and raise an error if so:
from M import addem addem(1,2,3,x=1) Boost.Python.TypeError: M.addem() got multiple values for keyword argument 'x'
2.b) Solution: "necessarily" ambiguous overloads
The next easiest thing to catch is case [1.b], "necessarily" ambiguous registrations. It proceeds as follows: at import time, as each new overload V for function F is registered, compare V to the existing overloads EO for F. If V has the same signature as something in EO, raise an AmbiguousOverload exception. For instance, if you load the module from [1.b] above, you get:
import m Traceback (most recent call last): ... AmbiguousOverload: Boost.Python function m.f has ambiguous overloads. C++ signatures f(float) -> None and f(double) -> None are indistinguishable to python.
Again this is because c++ float and c++ double both map to python 'float'. This one is "easy" as it happens only once (not at every function call) and doesn't take up any space.
2.c) Solution: ambiguous calls
The hard case is [1.a]:
void f_bool(bool); void f_int(int);
BOOST_PYTHON_MODULE(m) { def(f, f_bool); def(f, f_int); }
For module 'm' above, a call to f(True) or f(1) should succeed and call the corresponding function. Passing a float, however, is ambiguous:
f(True) # ok f(1) # ok f(1.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function m.f C++ signatures: f(int) f(bool)
So the implementation has some how 'scored' each possible overload and in each case either used the best one or reported ambiguity if multiple overloads are tied for 'best'.
The scoring works as follows.
In the guts of boost.python we recieve a tuple of arguments of type PyObject*. We also have a list of overloads, each with an mpl vector representing the signature of the associated C++ function. This is unchanged from the released version.
What has been done until now is to attempt to call each overload in order and return the result of the first that succeeds.
In this new implmentation, for each c++ argument of type T_n, we ask a class overload_score<T_n> to score the conversion of the type (PyTypeObject*) of this PyObject* to T_n. Example:
template <> struct overload_score<int> { boost::optional<unsigned> operator()(PyObject* type) { if(PyInt_CheckExact(type)) return boost::optional<unsigned>(0); // int == perfect else if (PyBool_Check(type)) return boost::optional<unsigned>(1); // bool == okay else if (PyFloat_CheckExact(type)) return boost::optional<unsigned>(1); // float == okay else if(arg_from_python<int>(type).convertible()) return boost::optional<unsigned>(1); // fallback else return boost::optional<unsigned>(); // unsuitable } };
The "score" type is optional<unsigned>. A valueless, (default-constructed) optional<unsigned> means 'unsuitable'. optional<unsigned>(0) is a perfect match, and optional<unsigned>(value) for value > 0 is a workable but less than perfect match.
These per-argument scores are added together for all arguments: this is the overload's total score. If any argument is 'unsuitable', the total score is 'unsuitable'.
If there is a tie for the best (lowest) score, the call is ambiguous and a Boost.Python.AmbiguousCall exception is raised. If there is only one function in first place, call it.
3.) Implications
This breaks a few corner cases that I've found in the tests.
3.a) implied init<> registration
I have found a few instances like this one:
struct X { X(); X(int); };
class_<X>("X") // #1 .def(init<>()) // #2 .def(init<int>()) ;
Here, #1 causes a default constructor to be registered, as does #2. This will cause a throw at load time as in [2.b]. It is simple to fix.
3.b) raw_constructor
The test suite for raw_constructor depends on our first_match overload resolution. The fix is to move a few things around to get the same effect without relying on this behavior.
The "before" code looks like this:
class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {}
tuple args; dict kw; };
object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__init__")(rest, kw); }
BOOST_PYTHON_MODULE(raw_ctor_ext) { // using no_init postpones defining __init__ function until after // raw_function for proper overload resolution order, since later // defs get higher priority. class_<Foo>("Foo", no_init) .def("__init__", raw_function(&init_foo)) .def(init<tuple, dict>()) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ; }
The "after" code does the registration piecemeal: Maybe there is a better way to do this. To my mind this is a little better because it makes explicit what is happening rather than relying on some subtle property of overload resolution:
class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {}
tuple args; dict kw; };
// our custom factory function object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__real_init__")(rest, kw); }
BOOST_PYTHON_MODULE(raw_ctor_ext) { // to get the desired effect we register // the actual constructor and our 'raw' constructor, // and then rename them class_<Foo> c("Foo", init<tuple, dict>()); c .def("__tmp_init__", raw_function(&init_foo)) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ;
// // __init__ => __real_init__ // __tmp_init__ => __init__ // object real_constructor = getattr(c, "__init__"); object raw_constructor = getattr(c, "__tmp_init__");
setattr(c, "__init__", raw_constructor); delattr(c, "__tmp_init__"); setattr(c, "__real_init__", real_constructor);
}
And that basically covers it. Looking forward to comments/feedback.
The code is here:
http://gitorious.org/~straszheim/boost/straszheims-python/trees/master
Note that the headers are in subdirectory 'include'... If there is enough response to this mail it I'll send out instructions on how to get a working build going.
- t _______________________________________________ Cplusplus-sig mailing list Cplusplus-sig@python.org http://mail.python.org/mailman/listinfo/cplusplus-sig
-- Gustavo J. A. M. Carneiro INESC Porto, Telecommunications and Multimedia Unit "The universe is always one step beyond logic." -- Frank Herbert
Gustavo Carneiro <gjcarneiro@gmail.com> writes:
Don't you think that when these overloading problems become an issue it is a sign of a poorly designed API?
In this case the intention is to fix bugs in boost.python. The broken example we've been working with is where a parameter is bool in one overload and int in another. Does this indicate a broken API? That's above my pay grade... I'm willing to say "probably not in all cases".
I mean, if overloaded functions parameters are not completely different in type or number, then maybe they are already too confusing to use and should not be overloaded in the first place?...
Maybe yes, though to my mind the real issue is an aesthetic one: should one at all attempt to emulate overloading in a language that doesn't have it natively? A similar discussion ensues w.r.t. "const". I wasn't around when the feature was added, which was ~7 years ago; to remove overloading now would break lots of code, and that decides it. So I'd phrase it like this: let's fix what's there so long as the cost in runtime and complexity is sufficiently small. -t
When it's ready, I'd like to try it on my code collection (see that at least nothing breaks).
participants (6)
-
Dane Springmeyer -
Gustavo Carneiro -
Neal Becker -
Renato Araujo -
Troy D. Straszheim -
troy@resophonic.com