Yet another trace tool!

Hi PyPy-ers,
After seeing Armin's flow viewer I got the motivation to look at translation. What was supposed to be gencpp.py turned into another trace tool. ;-)
To run it, go to translator/tool and run flowtrace.py! There is a debug flag to FlowTracer which will give you some idea of what is actually being traced!
Well - back to gencpp.py then. :-)
Cheers, Richard
PS I am wondering if we should create a top-level hacks folder to avoid polluting our code base with lots of little noddy scripts? Or do I have free rein to check in what and where I like? :-)

Hello Richard, On Tue, Aug 24, 2004 at 12:12:51AM +0100, Richard Emslie wrote:
To run it, go to translator/tool and run flowtrace.py! There is a debug flag to FlowTracer which will give you some idea of what is actually being traced!
It uses the earthenware package, about which I can't seem to find more info with Google...
Well - back to gencpp.py then. :-)
I am still struggling to find a reasonably clean way to have genc.py emit typed C code. I was wondering about using C++'s overloaded functions: gencpp.py could write an operation as a call like z=op_add(x,y), which the C++ compiler would resolve into a call to one of the predefined overloaded implementations of op_add(). The compiler would also insert automatic conversions with some C++ class trickery.
I'm a bit hesitant to go in that direction for two reasons: we could get C++ compile errors in the presence of ambiguities (i.e. when two overloaded functions could equally well be used for a call); and gcc generates suboptimal code when we use structs or classes with just one or two fields, like this one:
    class PyObjectRef {
        PyObject* p;
        // friend, so that op_add() below may read the private pointer
        friend PyObjectRef op_add(const PyObjectRef &a, const PyObjectRef &b);
    public:
        explicit PyObjectRef(PyObject* np) { p = np; }  // constructor, consumes reference
        ~PyObjectRef() { Py_DECREF(p); }                // destructor
        PyObjectRef(int n) { p = PyInt_FromLong(n); }   // conversion constructor
    };
    int op_add(int a, int b) { return a+b; }
    // const references, so that temporaries converted from int can bind
    PyObjectRef op_add(const PyObjectRef &a, const PyObjectRef &b) {
        return PyObjectRef(PyNumber_Add(a.p, b.p));
    }
The conversion constructor from int to PyObjectRef is applied automatically by the compiler, so it is quite nice, but you get a program that handles pointers to PyObjectRefs (which are pointers to PyObject*, i.e. PyObject**) instead of just PyObject*. There might be a way to fix that, but that starts to be hackish. Maybe Boost.Python already solves all these problems?
Well, is it a good idea to -- again -- target a high-level language and then have to work around some of its concepts when they are not fully suited to our low-level needs...?
Armin

Hi Armin, Sorry I was meaning to get back to your last mail about c++ pointers. Was going to flesh out gencpp.py a bit this weekend. On Fri, 27 Aug 2004, Armin Rigo wrote:
Oooops. They can be commented out. It is an open-source package from my company (I'm still there). It hasn't got full clearance to be released and I didn't mean to include it in the code :-(. I'll fix it tonight. Also, importing pygame is not happy since there is a local path in the working directory called pygame.
Anyway, it is a throwaway hack bit of code. I wasn't sure I should check it in, but in case the tracing is useful, and since it is easy to remove from svn, I did. I am just playing, trying to get an understanding of the flow/annotation code. I was hoping to get it to a stage where I could point it to an imported module and it could convert it (and then maybe try to integrate it with traceinteractive - although I've no idea how to convert app_xxx() space functions - I started hacking the FlowObjSpace <bad idea>, but I was hoping you'd share your thoughts on that one???? :-))
Yes - that was the plan - although I wasn't too sure how it would pan out (and I was going to mention it when I replied to your last mail - but I assumed you'd already thought of it :-)). The first step was to move the macros out of gencpp.h into Python and generate a load of overloaded functions. I don't understand the annotation code, so I haven't much idea what this involves yet. I was also thinking we could create a number of overloaded functors and create a table of them. We could then write the equivalent of flowtrace.py as a C++ extension and attempt to use C++'s dynamic dispatch (I'm not confident that C++ would be up to the job, and it is likely to be pretty slow - but fun to try nevertheless.)
The compiler would also insert automatic
Previous experience has shown that if the types are c++ classes/structs then it does the job. But I do agree with your point.
I've attached what I have so far with the reference-counted pointer (I am happy with the phrase hackish here :-)). It was hacked in the early hours on Sunday night - and I haven't used it in anger yet - so it might be riddled with bugs / flaws. In terms of suboptimal code - well, it is already suboptimal because it is C/C++ on top of CPython. ;-)
Another thought is that we could do a full translation of the standard object space and then start removing some of the overloaded functions which reference CPython. Then the C++ compiler can tell us if they have been annotated out or not. Incrementally we could remove our reliance on CPython. (There is a fair chance that doesn't make any sense)
Maybe Boost.Python already solves all these problems?
I've no idea!
The really nice thing about the pypy translation is that each genxxx.py is a complete end point, in that nothing else depends on it. Even nicer is that we can do several independent translations with various levels of not so smart ideas (like everything I do!) - and then combine the better ones at the end.
Well, chances are I'll be hacking some more this weekend - and be around on irc. We could try to combine forces if you are interested?
Cheers, Richard

Hello Richard, On Fri, Aug 27, 2004 at 12:27:34PM +0100, Richard Emslie wrote:
... directory called pygame.
Argh. Renaming time...
Er, I think something had been done about these app-level helpers in FlowObjSpace -- or was it just a plan never implemented?
Actually not :-)
Previous experience has shown that if the types are c++ classes/structs then it does the job. But I do agree with your point.
I see. If the things include primitive types then it gets more messy. Attached is an example where the call to op_add() is deemed ambiguous by the compiler, and I don't see how to solve that problem -- unless we don't use ints at all as the type of the variables, but only a custom class Integer containing an int field, as shown in the 2nd attachment. Sadly, this is elegant and fine with good C++ optimizers (e.g. Microsoft's) but g++ really produces bad code in this kind of situation. As you said, we should still try to do it this way, because it's not so involved and quite independent from other efforts like generating C code with conversions in the source.
In your gencpp.h, what is the purpose of CPythonProxyObject and its own reference counter? It seems that class Handle could just use a PyObject* directly and the Py_INCREF/DECREF() macros to manipulate its refcount.
I'll not be too much on IRC for the next 10 days or so, but I'll try to show up on Sunday.
A bientot, Armin.
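The attachments mentioned above are not part of this archive, so the following is only a minimal sketch of the kind of ambiguity being discussed and of the wrapper-class workaround. It is not Armin's attached code: raw_add, Integer, Float and check_overloads are names made up for illustration, and a Float wrapper stands in for whatever second type the real example used.

```cpp
#include <cassert>

// With primitive argument types, a mixed call has no single best overload.
inline int    raw_add(int a, int b)       { return a + b; }
inline double raw_add(double a, double b) { return a + b; }
// raw_add(1, 2.0);   // would not compile: the call is ambiguous

// Hypothetical wrapper classes: each holds one primitive field.
struct Integer {
    int value;
    explicit Integer(int v) : value(v) {}
};

struct Float {
    double value;
    explicit Float(double v) : value(v) {}
    Float(Integer i) : value(i.value) {}   // the only implicit conversion
};

inline Integer op_add(Integer a, Integer b) { return Integer(a.value + b.value); }
inline Float   op_add(Float a, Float b)     { return Float(a.value + b.value); }

inline void check_overloads() {
    // (Integer, Integer) is an exact match for the first overload.
    assert(op_add(Integer(2), Integer(3)).value == 5);
    // (Integer, Float): there is no Float -> Integer conversion, so only the
    // (Float, Float) overload is viable and the call is unambiguous.
    assert(op_add(Integer(1), Float(2.5)).value == 3.5);
}
```

Because conversions only go one way (Integer to Float here), each mixed call has at most one viable overload, which is what removes the ambiguity.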

Hi Armin, On Fri, 27 Aug 2004, Armin Rigo wrote: ...
Ahhh, I misunderstood. I was thinking along far more tedious lines of creating every possible combination of arguments and not doing any automatic conversion at all.
    inline Ref op_add(const BorrowedRef &x, const BorrowedRef &y) ...
    inline Ref op_add(int x, int y) ...
    inline Ref op_add(int x, const BorrowedRef &y) ...
    inline Ref op_add(const BorrowedRef &x, int y) ...
    // Same again but returning ints
but the problem is that, IIRC, C++ doesn't support overloading on the return type. Maybe we create a hierarchical tree of refs (including custom classes for int, float, etc. like you suggested) and return the base type for each op (I'm confused by my lack of C++ know-how, need to hack some!). I imagine this shifts work from compile time to run time (which kind of defeats the purpose of what we are doing - but not too much).
Probably because I hacked some old templated code - but I think you are right, since PyObject* is already ref counted. :-) Assignment to self is always interesting (since you decrement, which collects, and then increment, which is then on an invalid object) - but we still don't need the layer of indirection of CPythonProxyObject. Also I thought it was a good idea not to call Py_INCREF and Py_DECREF too often - but that is plain silly. It does make nulls easier though (but not much).
I'll not be too much on IRC for the next 10 days or so, but I'll try to show up on Sunday.
Ok. Can you drop us a copy of genc.py which was doing something with the annotations, if that is ok (it doesn't need to be in any nice state)? Thanks.
Cheers, Richard
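The self-assignment hazard mentioned in this message (decrementing first may free the object before it is incremented again) can be avoided by taking the new reference before dropping the old one. The sketch below is an assumption-laden illustration, not the Handle class from gencpp.h: Obj, new_obj, incref and decref are hypothetical stand-ins for PyObject and the Py_INCREF/Py_DECREF macros, so that the example compiles without CPython.

```cpp
#include <cassert>

// Hypothetical stand-in for PyObject: a reference count plus a payload.
struct Obj {
    int refcount;
    int payload;
};

int live_objects = 0;   // counts objects that have not been freed yet

inline Obj* new_obj(int payload) { ++live_objects; return new Obj{1, payload}; }
inline void incref(Obj* o) { ++o->refcount; }
inline void decref(Obj* o) {
    if (--o->refcount == 0) { --live_objects; delete o; }
}

class Handle {
    Obj* p;
public:
    explicit Handle(Obj* owned) : p(owned) {}            // consumes one reference
    Handle(const Handle& other) : p(other.p) { incref(p); }
    ~Handle() { decref(p); }
    Handle& operator=(const Handle& other) {
        incref(other.p);   // take the new reference first...
        decref(p);         // ...so dropping the old one cannot free it
        p = other.p;
        return *this;
    }
    int payload() const { return p->payload; }
    int refcount() const { return p->refcount; }
};

inline void check_self_assignment() {
    {
        Handle h(new_obj(42));
        Handle& alias = h;
        h = alias;                      // self-assignment: object survives
        assert(h.refcount() == 1 && h.payload() == 42);
        Handle other(new_obj(7));
        h = other;                      // frees the 42 object, shares the 7 one
        assert(h.payload() == 7 && h.refcount() == 2);
        assert(live_objects == 1);
    }
    assert(live_objects == 0);          // both handles released their object
}
```

Ordering the increment before the decrement makes an explicit `this == &other` check unnecessary, and no extra proxy object with its own counter is involved.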

Hello Richard, On Fri, Aug 27, 2004 at 04:12:14PM +0100, Richard Emslie wrote:
Right, hence the idea to use the compile-time overloading and automatic conversions. op_add() would be defined only for two Ints returning an Int and for two Refs returning a Ref. Then if something else like "two Ints returning a Ref" is needed, the C++ compiler will use the (Int, Int) version and convert the result from Int to Ref automatically. If something unsatisfiable is needed, like (Ref, Ref -> Int), then it means that there is a problem with the annotation phase.
Ok. Can you drop us a copy of genc.py which was doing something with the annotations, if that is ok (it doesn't need to be in any nice state)? Thanks.
I should have saved some intermediate results, but I wasn't on the Internet, so I couldn't just check in my progress, and now it's really in the middle of a rewrite... Give me some more time...
Armin
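The scheme described in this message, op_add() defined only for (Int, Int) and (Ref, Ref) with an automatic Int-to-Ref conversion, can be sketched as follows. This is a hedged illustration only: Ref here is a hypothetical stand-in holding a plain long instead of a PyObject*, and the member boxed and the function check_conversions are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

struct Int {
    long value;
    explicit Int(long v) : value(v) {}
};

// Hypothetical stand-in for a PyObject* holder; the real thing would box
// the integer into a Python object (think PyInt_FromLong).
struct Ref {
    long boxed;
    Ref(Int i) : boxed(i.value) {}     // implicit Int -> Ref conversion
};

inline Int op_add(Int a, Int b) { return Int(a.value + b.value); }
inline Ref op_add(Ref a, Ref b) { a.boxed += b.boxed; return a; }

// Two Ints select the (Int, Int) version, even when a Ref is wanted later.
static_assert(std::is_same<decltype(op_add(Int(1), Int(2))), Int>::value,
              "Int + Int must stay unboxed");
// The only conversion goes from Int to Ref, never back.
static_assert(std::is_convertible<Int, Ref>::value, "Int converts to Ref");
static_assert(!std::is_convertible<Ref, Int>::value, "Ref must not convert to Int");

inline void check_conversions() {
    Ref r = op_add(Int(1), Int(2));    // (Int, Int) version, result then boxed
    assert(r.boxed == 3);
    Ref mixed = op_add(Int(1), r);     // Int argument boxed, (Ref, Ref) version
    assert(mixed.boxed == 4);
    // Int bad = op_add(r, r);         // would not compile: no Ref -> Int
}
```

A request for (Ref, Ref -> Int) is rejected by the compiler, which is how a mistake in the annotation phase would surface under this approach.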

Hi Richard, On Fri, Aug 27, 2004 at 04:12:14PM +0100, Richard Emslie wrote:
Ok. Can you drop us a copy of genc.py which was doing something with the annotations, if that is ok (it doesn't need to be in any nice state)? Thanks.
Done. Attached to this e-mail is the small test script I use...
Oops, I forgot to CC all my previous answers to the mailing list. Sorry, I am going to resend them now...
Armin

Hello PyPy-ers, Sorry, I forgot to CC to the list when I first replied to Richard, and so we didn't notice that the whole subsequent thread wasn't mirrored to pypy-dev... I'm bouncing the messages. Armin

Participants (2): Armin Rigo, Richard Emslie