
Hi, I thought a bit more about how to generate the low-level translation of RPython code. It might indeed be possible to do it by rewriting the flow graph until it contains mostly lower-level operations, and then translating it to C code straightforwardly, if we insert an intermediate optimization phase between these two phases. Here are some more details. I'm not sure in which order to present the ideas. I apologize for the lengthy e-mail, it's still all a bit too early to write down as a formal ReST documentation... ===Motivation=== See for example how W_ListObject is implemented in the stdobjspace (which is RPython, too). It has got a size in 'ob_size' and list of items in 'ob_item'. The naive implementation of a list in RPython is with a PyListObject*, i.e. as a pointer to a structure containing (1) the length and (2) another pointer to the actual items. If we do it this way, then W_ListObject will end up being implemented badly: struct W_BadListObject { ...header and refcount... int ob_size; // for the ob_size field PySomeKindOfListObject* ob_item; // for the ob_item field }; struct PySomeKindOfListObject { int len; PyObject** items; }; So it takes two indirections from a W_ListObject to its array of items, although it only takes one in CPython's own lists. Why? Because the 'ob_item' list inside W_ListObjects could, maybe, escape the W_ListObject instances and outlive them, or be modified from somewhere else. If you think about it this way then the extra indirection is needed. But by looking closely at the source of listobject.py (or its flow graph version), an automatic analysis can figure out that this particular 'ob_item' field never escapes the control of W_ListObject. Thus we don't need the first indirection in this case. The PySomeKindOfListObject can be *inlined* into the structure implementing W_ListObject: struct W_ListObject { ...type and refcount headers... int ob_size; PySomeKindOfListObject ob_item; // not a pointer any more! }; Now, it looks a lot like CPython's PyListObject structure, with 'ob_item.len' playing the role of the 'allocated' field of CPython. I believe that this kind of "structure-inlining" optimization is important. It is something that cannot easily be done in the current genc_* files. That's why I started thinking along the lines I will described below in more details. I like this idea because it also works with simpler types like integers, not just lists: we don't have to do all the type-juggling in genc_* any more; instead, we consider integers as PyIntObject*, and by the same inlining mecanisms replace them with PyIntObject only -- even better, PyIntObject without the type and refcount headers. What remains? The ob_ival field only. In other words by inlining a PyIntObject* declaration we get a structure with only one C "long". (From there it is easy to actually get rid of the struct and replace it with its single field.) ===In more details=== Let's say we start with a flow graph. For this example, let's consider the flow graph generated for the function 'def f(x): return x+1'. block1(x): add(x, 1) -> y goto return_block(y) The annotation code as it is today will infer that y is a SomeInteger() if we tell it that x is a SomeInteger(). Moreover the '1' is also a SomeInteger() with the attribute 'const' set to 1. What I propose is that we add a set of rewrite rules, which could be freely applied. The rules say essentially that whenever we see a given operation with the given annotations, it can be rewritten into one (or possibly several) other operations. For example, a long rule could be: method_extend(lst1: SomeList, lst2: SomeList) -> result: SomeList ----------------------------------------------------------------- len(lst2) -> c growlist(lst1, c) goto loop[i=0] loop: lt(i, c) -> cond switch cond: case False: goto sequel case True: goto body body: getitem(lst2, i) -> x fastappend(lst1, x) add(i, 1) -> i1 goto loop[i=i1] sequel: Maybe rules should be put in a file not in Python syntax, and parsed by the rule-applying code. I'm not sure if the above pseudo-syntax would do, but maybe something better along these lines would work. Anyway, there would also be simpler rules like: add(int1: SomeInteger, int2: SomeInteger) -> result: SomeInteger ---------------------------------------------------------------- add_i(int1.ob_ival, int2.ob_ival) -> result.ob_ival Note the reference to the field ob_ival of PyIntObjects. The above line says that the operation 'add' (which is PyNumber_Add()), although fine for any kind of objects, could be optimized if we knew that all three involved objects follow the "structure" of SomeInteger. In this case we can perform it using the new operation 'add_i' on the ob_ival fields of the structures. Once more the intended meaning of the rule is: the operation above the line is perfectly valid on its own -- it would produce a call to PyNumber_Add() in C -- but for efficiency it can be replaced with the operation below the line. The idea is thus to start from the basic C code generator that just writes PyNumber_Add() and similar calls for any operation, and gradually move to "inlined" operation like add_i which correspond just to "+" in C. If we only do that, the C code would still manipulate heap-built objects only, i.e. PyIntObjects; we would just have replaced a call to PyNumber_Add() with inlined code like: result = ...create the PyIntObject...; ((PyIntObject*) result)->ob_ival = ((PyIntObject*) int1)->ob_ival + ((PyIntObject*) int2)->ob_ival; The idea is then to use structure inlining to detect and get rid of the PyIntObject (and other heap structures). This requires some kind of whole-program analysis. This analysis be done as an intermediate phase, between the rule-rewriting and the C generation phase. It would look at how each field of each object is used, and based on this deduce how each structure is best implemented. What I'm thinking about is something like this: * look if an object's structure is mutable or not, i.e. if we ever write to its fields. (This should distinguish initialization from later rewrites.) * look if an object is shared or not: if references to an object don't escape too far into various parts of the program then we don't need to refcount it. More precisely, we might be able to assign to the object a "parent" object, a container which is guaranteed to exist at least as long as its child. Thus the "parent" has got the only reference (no refcount needed) and others can borrow it. The parent can be the frame of a function, too, for objects that never outlive the function in which they were created. The point is that both non-shared and shared but immutable objects can be inlined into their parent, so that we can get rid of the heap allocation and the access indirection. For example, as PyIntObjects are all immutable, they can all be inlined: any "PyObject*" field or variable known to point to a PyIntObject can be replaced with a headerless PyIntObject structure in-place, or even directly with its single "long" field. For the above example we'd get code like this: result.ob_ival = int1.ob_ival + int2.ob_ival; Or after inlining the single field: result__ob_ival = int1__ob_ival + int2__ob_ival; Apart from the long variable names (ugh! __ob_ival after each RPython variable containing an integer :-), this is good ! Sorry for the long messages, I hope I could get some of my motivations through. Armin

PS. I realize my previous e-mail carried a high risk of confusion. In there, W_ListObject was used as an example of RPython code; it was just an example, unrelated to the fact that the example discusses how to implement lists like the W_ListObject.ob_item field. I could have chosen W_TupleObject or W_DictObject as the example, and it would still have been about how to implement their fields that are RPython lists (e.g. W_TupleObject.wrappeditems or W_DictObject.data). In the particular example of W_ListObject, we would like (it is our goal) that the W_ListObject class be translated into a C structure with three fields that look like "size", "allocated" and "items_ptr". We want this because it is the best C-ish version of the W_ListObject class, among less efficient variants incurring more indirections; CPython uses this most efficient variant too in its C structure called PyListObject. This is not to be confused with lists as understood by RPython. The field W_ListObject.ob_item is such a list. They are marked with a SomeList() annotation. The RPython-to-C translator needs to know precisely what such lists are, and how to turn them into C code. We decided earlier that such lists would be translated as some sort of simple C array with no over-allocation, so that they look like a pointer to a structure with a "length" and an "items_ptr" field. This, the translator knows. It is so with any usage of lists in RPython code, not just in W_ListObject. So the goal in the example was how to have the translator automatically turn a class like W_ListObject into a C structure that would have fields that look like "size", "allocated" and "items_ptr", and doing so using the knowledge that W_ListObject has two fields, ob_size and ob_item, with annotations SomeInteger() and SomeList() respectively. A bientôt, Armin.

PS. I realize my previous e-mail carried a high risk of confusion. In there, W_ListObject was used as an example of RPython code; it was just an example, unrelated to the fact that the example discusses how to implement lists like the W_ListObject.ob_item field. I could have chosen W_TupleObject or W_DictObject as the example, and it would still have been about how to implement their fields that are RPython lists (e.g. W_TupleObject.wrappeditems or W_DictObject.data). In the particular example of W_ListObject, we would like (it is our goal) that the W_ListObject class be translated into a C structure with three fields that look like "size", "allocated" and "items_ptr". We want this because it is the best C-ish version of the W_ListObject class, among less efficient variants incurring more indirections; CPython uses this most efficient variant too in its C structure called PyListObject. This is not to be confused with lists as understood by RPython. The field W_ListObject.ob_item is such a list. They are marked with a SomeList() annotation. The RPython-to-C translator needs to know precisely what such lists are, and how to turn them into C code. We decided earlier that such lists would be translated as some sort of simple C array with no over-allocation, so that they look like a pointer to a structure with a "length" and an "items_ptr" field. This, the translator knows. It is so with any usage of lists in RPython code, not just in W_ListObject. So the goal in the example was how to have the translator automatically turn a class like W_ListObject into a C structure that would have fields that look like "size", "allocated" and "items_ptr", and doing so using the knowledge that W_ListObject has two fields, ob_size and ob_item, with annotations SomeInteger() and SomeList() respectively. A bientôt, Armin.
participants (1)
-
Armin Rigo