
Hello everyone,

First of all, I'll introduce myself. I've been working on a Python compiler written in OCaml for about a year now. Some of you might know me from my talk at Google. One of the list members suggested this mailing list as a good forum for discussion and bouncing ideas.

I think that one issue to consider is the complex semantics of function and method calls. Whenever a function call is made, there is a lot of checking involved as to 'where' the function is called from, i.e. whether it's a user-defined function/method, a built-in function/method, a class function/method or an instance function/method. Just reading the reference manual section on 'function objects' is enough to see how many different cases there are.

Of course, this provides tons of flexibility in reusing the same code in different ways. However, it increases the amount of book-keeping and pre-processing required for a call. For example, consider an object that implements the methods 'str' and 'append' (the list object, for instance). During retrieval of methods, the following default retrieval function is called (in OCaml syntax, a simplified version):

    let list_getattr o attr =
      let v = get o.dict attr in  (* look up the attribute in the dictionary *)
      match attr with
      | String "append" (* | ... *) ->
          v.instance <- o; v  (* set the method's instance to o before returning it *)
      | _ -> v  (* else return the method object as is *)

So basically, I have to check for two cases based on the actual names of the methods being called. This seems like extra work for the runtime.

As a suggestion, I think that adopting a single calling convention (either 'x.foo()' or 'foo(x)') would have two advantages:

1) It would make things easier for users and make for more readable code. I still find myself having to refer to the docs to find out how a method needs to be called.

2) It would improve performance by reducing the checks shown above to one case.

What do y'all think? Would this sacrifice too much flexibility?

If this topic has been discussed on this list before, please feel free to point me there.

Thank you,
Raj
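One way to see how CPython avoids a name-based check like the one in the sketch above: in Python, the decision to bind is made by the type of the object found in the class dictionary, not by the attribute's name. Functions and built-in method descriptors implement `__get__`, and attribute lookup calls it. The following is a simplified, hypothetical pure-Python model of that lookup (`simple_getattr` and `Greeter` are illustrative names); it deliberately ignores data-descriptor priority, `__slots__`, `__getattr__` and metaclasses, so it is not a faithful copy of `object.__getattribute__`.

```python
def simple_getattr(obj, name):
    """Hypothetical, simplified model of instance attribute lookup.

    Ignores data descriptors, __slots__, __getattr__ and metaclasses.
    """
    # 1. Instance dictionary first (list instances have none).
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]
    # 2. Then the type and its bases, in method resolution order.
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            attr = klass.__dict__[name]
            # 3. Binding depends on the attribute's type, not its name:
            #    anything whose type defines __get__ (functions, built-in
            #    method descriptors, slot wrappers) binds itself to obj.
            if hasattr(type(attr), "__get__"):
                return type(attr).__get__(attr, obj, type(obj))
            return attr
    raise AttributeError(name)


class Greeter:
    """Hypothetical user-defined class."""
    def hello(self):
        return "hello from " + type(self).__name__


l = [1, 2]
simple_getattr(l, "append")(3)                # same effect as l.append(3)
print(l)                                      # [1, 2, 3]
print(simple_getattr(l, "__len__")())         # 3
print(simple_getattr(Greeter(), "hello")())   # hello from Greeter
```

Under this model, no list of method names is needed: an attribute found on the class is bound exactly when its type defines `__get__`.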

"Raj B" <rajb@rice.edu> wrote in message
news:AAD8D5AB-96A1-4503-A90B-B037CAFA13EB@rice.edu...
| Hello everyone,
|
| First of all, I'll introduce myself. I've been working on a Python
| compiler written in OCaml for about a year now. Some of you might
| know me from my talk at Google. One of the list members suggested
| this mailing list as a good forum for discussion and bouncing ideas.
|
| I think that one issue to consider are the complex semantics of
| functions and method calls. Whenever a function call is made, there
| is a lot of checking involved as to 'where' the function is called
| from: i.e. whether it's a user-defined function/method, a built-in
| function/method, a class function/method or an instance function/
| method. Just reading the reference manual section on 'function
| objects' is enough to see how many different cases are there.
|
| Of course, this provides tons of flexibility in reusing the same code
| for different ways. However, it increases the amount of book-keeping
| and pre-processing required for a call. For example, Consider an
| object that implements methods 'str' and 'append' (the list object,
| for instance). During retrieval of methods, the following default
| retrieval method is called (in OCaml syntax, a simplified version).
|
| let list_getattr o attr =
|   let v = get o.dict attr in (* lookup the attribute in the dictionary *)
|
|   match attr with
|   | String "append" | ... | ->
|       v.instance = o; v (* set the method's instance to o
|                            before returning it *)
|
|   | _ -> v (* else return the method object as is *)

I cannot understand above, even 'simplified'.

| So basically, I have to check for two cases based on the actual names
| of the methods being called. This seems like extra work for the runtime.
|
| As a suggestion, I think that adopting a single calling convention
| (either 'x.foo()' or 'foo(x)') would have 2 advantages.

If foo is an instance method, the latter has to be x.__class__.foo(x).
There is no way to disallow the latter; but the former is much easier.

| 1) It would make it easier for users and make for more readable code.
| I still find myself having to refer to the docs to find out how a
| method needs to be called.

I do not understand 'needs'. Any instance method can be called either
way, with the first much easier. Knowing about the second is mainly
needed to understand why instance methods all have a self parameter.

| 2) It would improve performance by reducing these checks shown above
| to one case.

tjr
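Terry's equivalence can be checked directly. In this small sketch (the `Counter` class is made up for illustration), the same function runs whether `self` is supplied implicitly by `c.bump()` or explicitly through the class; built-in methods such as `list.sort` behave the same way.

```python
class Counter:
    """Hypothetical example class."""
    def __init__(self):
        self.n = 0

    def bump(self, step=1):
        # 'self' is the instance, whichever calling form is used.
        self.n += step
        return self.n


c = Counter()
print(c.bump())                  # 1: self passed implicitly
print(c.__class__.bump(c))       # 2: self passed explicitly via the class
print(type(c).bump(c, step=10))  # 12: type(c) is c.__class__ here

# Built-in methods work the same way.
l = [3, 1, 2]
list.sort(l)                     # same as l.sort()
print(l)                         # [1, 2, 3]
```

The explicit form is rarely written in practice, but it is why every instance method declares a `self` parameter.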

Raj B <rajb@rice.edu> wrote:
foo(x) means one thing; x.foo() means another. Trying to merge them for the sake of "consistency" will only confuse people. Either it confuses everyone who has been using method calls (x.foo()) to call methods on an object, by turning every function call into something users cannot identify as a plain generic function or a method caller; or it requires every object on which you want to call foo() to have a method x.foo(), making every method call ambiguous as to whether you are calling the function foo with argument x or the method foo on object x.

While it may make it easier for you to implement in OCaml, it makes no sense in terms of Python semantics that have been understood by basically everyone for at least 10 years.

- Josiah

Well, it definitely makes no difference to me whether I implement this in OCaml or any other language. I'm implementing exactly how Python behaves, so whether it's easy or difficult is not really the question.

Consider the list object (listobject.c). In CPython, the 'repr' method for integers is implemented by the 'list_repr' function and the 'insert' operation is implemented by 'listinsert', which in turn is part of the 'list_as_sequence' set of methods. So both 'repr' and 'insert' are implemented as 'methods'. However, the Python convention is that 'repr' is called as repr(l), whereas insert would be l.insert(i, v), with l passed implicitly. In both cases, almost the same amount of work is done looking up a function in a PyObject structure (two lookups for 'repr' and three for 'insert').

Obviously there is a design/implementation/philosophical decision that one is a generic function and the other is a method call. However, the implementation has to do two different things based on the name of the function to be called, even though both are part of the same object structure. I was wondering why it was necessary to do it that way.

Thanks,
Raj

On May 14, 2007, at 1:53 PM, Josiah Carlson wrote:
-- When you are not cute, you've got to be clever -- David Sedaris
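The two cases Raj describes can be compared at the Python level. The sketch below only demonstrates observable equivalences, not how CPython's C-level slots are laid out: `repr(l)` reaches the `__repr__` defined by `list`, while `l.insert(i, v)` is an attribute lookup followed by a call, and both can also be written with `l` passed explicitly. The `Point` class is hypothetical; it shows that `repr()` is generic across types, using whatever `__repr__` the object's class provides.

```python
l = ["a", "c"]

# Generic builtin: repr() dispatches to the type's __repr__.
print(repr(l) == list.__repr__(l) == l.__repr__())  # True

# Ordinary method: looked up by name, bound to l, then called.
l.insert(1, "b")
list.insert(l, 0, "z")   # explicit form, l passed as the first argument
print(l)                 # ['z', 'a', 'b', 'c']


class Point:
    """Hypothetical user class: repr() reaches its __repr__ too."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


print(repr(Point(1, 2)))  # Point(1, 2)
```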

Please don't top-post; it removes context that helps in understanding replies (most people read top to bottom, not bottom to top).

Raj B <rajb@rice.edu> wrote:
(repr(5) is implemented as int_repr() in intobject.c)
You are not going to change the oft-discussed, long-established __magic__ methods and magic() builtins. That is to say, certain builtin functions (and some operations) result in a method invocation of obj.__magic__() (sometimes with additional arguments). This is not going to change; please see the various Python 3.0 PEPs. Among those builtins are len(), repr(), etc., as well as the functions listed in the operator module.

Why? By convention, the length of an object is obtained with len(obj). While that is internally translated into obj.__len__(), the indirection allows user methods to take on names that could otherwise clash with standard methods.
Convenience, flexibility, established semantics, etc. Changing it would be a major language overhaul for little (if any, possibly even negative) gain for Python users.

- Josiah
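A minimal sketch of the protocol Josiah describes, using a hypothetical `Playlist` class. `len()` consults the `__len__` defined on the object's type, so the class stays free to use an ordinary attribute named `len` for something else. The sketch also shows that, in Python 3, this implicit lookup goes through the type, so a `__len__` assigned on the instance is ignored by `len()`. `operator.add` stands in for the operator-module functions mentioned above.

```python
import operator


class Playlist:
    """Hypothetical class used only for illustration."""
    def __init__(self, songs):
        self.songs = list(songs)
        # An ordinary attribute named 'len' does not interfere with len().
        self.len = 225  # total duration in seconds

    def __len__(self):
        return len(self.songs)


p = Playlist(["intro", "verse", "outro"])
print(len(p))        # 3, via type(p).__len__(p)
print(p.len)         # 225

# Implicit special-method lookup uses the type, not the instance dict.
p.__len__ = lambda: 99
print(len(p))        # still 3
print(p.__len__())   # 99: explicit attribute access sees the instance value

# The operator module exposes the same protocols as plain functions.
print(operator.add(2, 3) == (2).__add__(3))  # True
```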

participants (3)
- Josiah Carlson
- Raj B
- Terry Reedy