[Python-Dev] Python Specializing Compiler

Armin Rigo arigo@ulb.ac.be
Mon, 25 Jun 2001 15:08:52 +0200


Hello everybody,

A note about what I have in mind for Psyco... Type-sets are independent of
memory representation. In other words, just because two variables can take
the same set of values does not mean that their data is necessarily encoded
the same way in memory.

In particular, I believe we won't need to change the way the current Python
interpreter encodes data. For example, instances currently have a
dictionary of attributes and no "fixed slots", but this is not a problem
for Psyco, which can encode instances in better ways (e.g. as a C struct)
as long as they are only accessed by Psyco-compiled Python code and not by
"legacy" code.
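To make the distinction between a type-set and its memory encoding concrete, here is a small sketch. It assumes a hypothetical `Point` class whose instances only ever hold two floats `x` and `y`: the interpreter keeps them in a per-instance attribute dictionary, while a specializer could store the same values in a fixed, C-struct-like layout (here emulated with the standard `struct` module). The names `Point`, `POINT_LAYOUT`, `pack_point` and `unpack_point` are illustrative only, not part of Psyco.

```python
import struct


class Point:
    """Interpreter-style representation: attributes live in a dict."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Hypothetical compiled representation: two C doubles laid out like a
# C struct, usable only while no "legacy" code looks at the object.
POINT_LAYOUT = struct.Struct("dd")


def pack_point(p):
    # Encode the same value in the fixed-slot layout.
    return POINT_LAYOUT.pack(p.x, p.y)


def unpack_point(raw):
    # Rebuild the dict-based instance, e.g. when the object escapes
    # to code that expects the interpreter's representation.
    x, y = POINT_LAYOUT.unpack(raw)
    return Point(x, y)


p = Point(1.5, -2.0)
raw = pack_point(p)
q = unpack_point(raw)
print(vars(p))             # {'x': 1.5, 'y': -2.0}
print(len(raw))            # 16
print(vars(q) == vars(p))  # True
```

Both encodings describe the same set of possible values; only the memory format differs, which is why the choice can be left to the compiler.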

This approach also allows Psyco to completely remove the overhead of
creating bound method objects and frame objects; both are generally
temporary, so during their whole lifetime they can be represented much
more efficiently in memory. For frame objects this should be clear (we
probably need no frame at all as long as no exception exits the current
procedure, and even in that case it could be optimized). For method objects
we use "memory sharing", a technique already applied in the current Psyco.
More precisely, if some (immutable) data is found at some memory location
(or machine register) and Python code says it should be duplicated, we need
not duplicate it at all; we can simply consider that the copy is at the same
location as the original. For method objects this means the following:
suppose you have an instance "xyz" and query its "foo()" method. Suppose
that you can (at some point) be sure that, because of the class of "xyz",
"xyz.foo" will always be the Python function "f". Then the representation
of the method object can be simplified: all it needs to store in memory is
a pointer to "xyz", because "f" is a constant part. Now a single pointer to
the "xyz" instance has exactly the same memory format as the original "xyz"
variable, so this particular representation of a bound method object
can share the original "xyz" pointer. No actual machine code is produced;
Psyco simply notes that both "xyz" and "xyz.foo" are represented at the
same location, although "xyz" represents an instance with the given
pointer, and "xyz.foo" represents the "f" function with its first argument
bound to the given pointer.
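The memory-sharing idea can be modelled in a few lines. This is a toy sketch, not Psyco's actual data structures: the names `Location`, `InstanceRepr`, `BoundMethodRepr`, `bind_method` and `call` are hypothetical, and a Python dict stands in for registers and memory. The point it shows is that "binding" `xyz.foo` creates only a compile-time description that reuses the location of `xyz`, with `f` kept as a constant; nothing is copied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A machine register or memory slot holding one pointer (hypothetical)."""
    name: str


@dataclass(frozen=True)
class InstanceRepr:
    """Compile-time note: the pointer at `loc` is an instance such as "xyz"."""
    loc: Location


@dataclass(frozen=True)
class BoundMethodRepr:
    """Compile-time note: a bound method whose function is a known constant.

    Only the `self` pointer needs storage, and it lives at `loc`,
    the very same location as the instance it is bound to.
    """
    func: object
    loc: Location


def bind_method(instance, func):
    # No code is emitted and nothing is copied: the new representation
    # simply reuses the instance's location.
    return BoundMethodRepr(func=func, loc=instance.loc)


def call(method, memory, *args):
    # At run time, read the shared pointer and pass it as the first
    # argument to the constant function.
    return method.func(memory[method.loc], *args)


class C:
    def foo(self):
        return "foo called on %r" % (self,)


f = C.foo                 # the constant function implied by the class of xyz
reg0 = Location("reg0")
xyz = InstanceRepr(reg0)
xyz_foo = bind_method(xyz, f)

memory = {reg0: C()}      # run-time contents of the shared location
print(xyz_foo.loc is xyz.loc)                      # True
print(call(xyz_foo, memory) == memory[reg0].foo())  # True
```

The design choice mirrors the description above: because the function part is constant, the bound method needs no storage of its own beyond the pointer it already shares with the instance.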

According to est@hyperreal.org, method and frame objects each account for 20%
of the execution time... (Est, on what kind of machine did you get Psyco to
run the sample code 5 times faster!? It's only 2 times faster on a modern
Pentium...)


A bientôt,

Armin.