Paul Rubin wrote:
Christian Tismer <tismer@tismer.com> writes:
This leads to high-level descriptions of low-level fields of all structures. [...]
Do you have some other examples that really get used often? I'd think f_lineno's don't get used that often.
This was just an example for a common field (in fact it is used quite often at runtime, but almost only in the SET_LINENO opcode), that is embedded into some other structure. Everything else would to. The idea is to describe fixed data structures in a way that allows Python to easily deduce what the data type is, without any trial. See for example Thomas Heller's ctypes module, which has such descriptions for everything. A drawback of this is that we always need to access target variables in dotted notation. Otherwise we loose the static type info. If Python had type declarations, this would be no problem. Here an example from intobject.c: static int int_compare(PyIntObject *v, PyIntObject *w) { register long i = v->ob_ival; register long j = w->ob_ival; return (i < j) ? -1 : (i > j) ? 1 : 0; } Assuming that we try to model this in Python, the resulting code might look like this def int_compare(v, w): i, j = v.ob_ival, w.ob_ival if i < j: return -1 elif i > j: return 1 return 0 The above code only occours in contexts where integer objects are passed in. There is no type info in advance, but at the first execution of this code, v and w are passed in with their descriptions of all fields, and it is now absolutely clear that i and j are values of fixed sized integers. Code for directly accessing the ob_ival fields and doing the comparison can immediately be emitted when running the code first time. A remaining problem with the lack of declarations are local variables which are not members of a structure and it is not clear from the beginning what the primitive type should be. One ugly way would be to construct a structure for the local variables and to use dotted notation again. I hope this can be avoided by propagation of type info via __coerce__. Another snipped from intobject: for (i = 0, p = &list->objects[0]; i < N_INTOBJECTS; i++, p++) { if (PyInt_CheckExact(p) && p->ob_refcnt != 0) irem++; } Given a type object named c_int, this might translate to i = c_integer(0) p = list.objects while i < N_INTOBJECTS: # body not implemented here i += 1 p += 1 Here I use a known class as initialization of i. The data type is therefore known. p as a pointer field in the list structure is also known. The __coerce__ method of these classes can be written in a way, that they always propagate their own class to other operands, and especially in this case, the right operand is a constant. Given a definition like this: class c_integer(c_datatypes): def __coerce__(self, other): if type(other) == int: return self, c_integer(other) elif .... What I tried to express is that with little or no help of the programmer, primitive data types can be deduced quite easily, and unique code can be emitted on first execution of the code.
Will you give some thought to going to a tagged representation of object references, and maybe a more traditional GC? That way you avoid a lot of memory traffic, and also don't have to adjust reference counts every time you touch anything.
I think these would give a big performance boost, and also would also simplify the C API. It might be possible to supply a compatibility layer to make the old C API still useable.
Can you give me a hint how this would be done? I have no experience with tagged representations, but if this can save so much, I should learn more about it. cheers - chris -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
From: "Christian Tismer" <tismer@tismer.com>
Will you give some thought to going to a tagged representation of object references, and maybe a more traditional GC? That way you avoid a lot of memory traffic, and also don't have to adjust reference counts every time you touch anything.
I think these would give a big performance boost, and also would also simplify the C API. It might be possible to supply a compatibility layer to make the old C API still useable.
Can you give me a hint how this would be done? I have no experience with tagged representations, but if this can save so much, I should learn more about it.
this is a relevant survey paper: http://citeseer.nj.nec.com/gudeman93representing.html Representing Type Information in Dynamically Typed Languages e.g. integer in the range [-2**30,2**30-1] would be represented by their bit patterns shifted left by 1 bit all other objects would be boxed, and the address left shifted by 1 and ORed with 1. The fact that Python is slowly removing the long/integer separation can make less disruptive such an approach. OTOH I think such an approach would make the life a bit more complicated for psyco when detecting that a datum has machine-size integer type, because such a datum would be either in the first form _or_ a boxed (long) integer in the larger range -2**31 2**31-1. Another approach is to carry data around as a struct where both the tag and the datum are machine words, this approach is AFAIK currently used in the Gwydion Dylan compiler (www.gwydiondylan.org) originally developed at CMU. regards.
"Samuele Pedroni" <pedronis@bluewin.ch> writes:
integer in the range [-2**30,2**30-1] would be represented by their bit patterns shifted left by 1 bit
all other objects would be boxed, and the address left shifted by 1 and ORed with 1.
I seem to recall someone tried that with python quite recently and found it was a pessimization (search python-dev, I guess). I'm not sure that's what Paul meant, though. Cheers, M. -- ZAPHOD: Listen three eyes, don't try to outwierd me, I get stranger things than you free with my breakfast cereal. -- The Hitch-Hikers Guide to the Galaxy, Episode 7
Samuele Pedroni wrote:
From: "Christian Tismer" <tismer@tismer.com> [tagged object references]
Can you give me a hint how this would be done? I have no experience with tagged representations, but if this can save so much, I should learn more about it.
this is a relevant survey paper:
http://citeseer.nj.nec.com/gudeman93representing.html
Representing Type Information in Dynamically Typed Languages
Oh, well, sorry, now I remember. Sure, I also remember that I was thinking loud about boxing variables and tagging addresses a few years ago. Can't find it any longer, but as I remember, somebody was not convinced that tagging would be great for Python. Maybe TimBot? I know a couple fo interpreters which use tagging, and Guile for instance makes extensive use of it. This technique is kind of data compression by putting type info into an unused pattern of addresses. It can avoid lots of allocations, but also turns every object access into some macro processing. for a language that I have to code by hand, I don't like that so much. But when we have control over everything, like in this project, it makes at least sense to try an alternative object model, by the newly gathered flexibility. For the bootstrap phase, I'd say hands off from any changes of the model. First we redefine the world identically "in place". When that works, we can try changing worlds.
integer in the range [-2**30,2**30-1] would be represented by their bit patterns shifted left by 1 bit
all other objects would be boxed, and the address left shifted by 1 and ORed with 1.
Humm. I would use the left shifted integer, or'ed with 1. After testing that bit, Arithmetic right shift gives the integer. That allows to use the address variant without any operation and has the nice debugging facility that addresses are already addresses, and object inspection works directly. ...
Another approach is to carry data around as a struct where both the tag and the datum are machine words, this approach is AFAIK currently used in the Gwydion Dylan compiler (www.gwydiondylan.org) originally developed at CMU.
This makes it impossible to get an object as a result value of a function. You need references all the time. Also, container objects will double their size. I think, both can be tried, later, by changing only a little Python code that is responsible for the representation of this. Getting completely rid of reference counting in favor of a classical GC is also a challenging path which cannot be tried with CPython at all. The best layout will win the game. I don't know in advance, so let's create the thing flexible enough to keep all paths open. cheers - chris p.s.: Paul, would you mind to participate in pypy-dev, then we can avoid crossposting. -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
participants (3)
-
Christian Tismer -
Michael Hudson -
Samuele Pedroni