Armin Rigo <arigo@tunes.org> wrote:
Now I believe in following the idea that CPython's main loop does not assume anything at all about the PyObjects it manipulates, but does every operation via calls to library functions like PyNumber_Add(). All it does is move PyObjects around, mainly between the stack and the locals. Similarly, our own main loop should not assume anything about the objects it stores in the stack and the locals, not even that they have some well-known methods. Instead, we would provide it with the equivalent of PyNumber_Add() & co.
I was originally going to say I thought the general-dispatch-function and the standard-object-method-interface options were practically equivalent, but I changed my mind. The reason lies in the concept of multi-methods. (I think that's the right term: functions are polymorphic not just on their first argument (the parent object), but on all arguments.) This may not make much difference for most operations, but it facilitates arithmetic conversion. If we're adding two numbers, we can handle the conversion code in the general PyNumber_Add routine, instead of spreading it out over the functions implementing floats and integers and rationals etc. ...
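The centralize-the-coercion point above can be sketched in a few lines of Python. This is a minimal illustration with invented names (coerce_pair, generic_add), not anything from the PyPy codebase: a single generic add routine dispatches on the types of *both* operands, so numeric conversion lives in one place instead of being repeated in every number implementation.

```python
# Hypothetical sketch: one central add() that handles coercion for all
# numeric types, instead of each type implementing its own conversions.
from fractions import Fraction

def coerce_pair(x, y):
    """Promote both operands to a common numeric type (int < Fraction < float)."""
    rank = {int: 0, Fraction: 1, float: 2}
    target = max(type(x), type(y), key=lambda t: rank[t])
    return target(x), target(y)

def generic_add(x, y):
    # Conversion logic is concentrated here, multi-method style:
    # the decision depends on the types of both arguments.
    if type(x) is not type(y):
        x, y = coerce_pair(x, y)
    return x + y

print(generic_add(1, Fraction(1, 2)))  # -> 3/2
```

Adding a new numeric type then means extending one rank table rather than touching every existing implementation.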
* wrap(a), taking a *normal* object and putting it into a black box. It's what the interpreter does for a LOAD_CONST, for example: it has a real Python object and wants to send it to the object space. For example, if the object space has a custom implementation "class MyList" for lists, then wrap([1,2,3]) should create a MyList instance.
I wonder whether wrap() and unwrap() are really necessary - they seem to blur the distinction between interpreter-level and application-object-implementation-level. It seems to me that the LOAD_CONST issue would be better dealt with by having the compiler/function loader create the black box objects themselves, instead of there being a run-time translation between interpreter-level objects and application-level objects.
* unwrap(x) is the inverse operation. Used by the interpreter in the rare cases where it really needs to observe the object. For example, for a conditional jump, after it obtained the truth-value with (say) a call to the truth() method, it must pull the result out of the object space to know whether it is really False or True.
But how would we handle it if we wanted to redefine what was considered "true"? A generic unwrap couldn't tell what the unwrapped object is being used for. A better way of implementing it would be to look at the specific cases where the interpreter machinery actually cares about the value of the black box object. In the case of conditional branching, it might be best to define a standard function in the ObjectSpace which takes a black-box application-level object and returns an interpreter-level object representing the truth value. (Not a generally "unwrapped" object, but specifically 0 or 1 - or rather interpreter-level True and False, depending on how compatible we want to make the PyPython codebase.)
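The suggestion above can be sketched concretely. All names here are invented for illustration (TrivialObjectSpace, is_true): instead of a generic unwrap(), the object space exposes a dedicated truth-query method returning an interpreter-level bool, so a different space can redefine truth without ever exposing raw values to the interpreter.

```python
# Sketch: a dedicated is_true() method instead of a generic unwrap().
class TrivialObjectSpace:
    def wrap(self, obj):
        return ("box", obj)          # black-box application-level object

    def is_true(self, w_obj):
        # Returns an interpreter-level bool, not an unwrapped object.
        tag, value = w_obj
        return bool(value)

class EverythingIsTrueSpace(TrivialObjectSpace):
    # A space that redefines truth: every object is considered true.
    def is_true(self, w_obj):
        return True

space = TrivialObjectSpace()
print(space.is_true(space.wrap(0)))  # False
```

The conditional-jump code in the main loop would then call space.is_true() and branch on the result, never looking inside the black box itself.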
Looks like a cool point of view, doesn't it ?
Very much so, and it interfaces well with the concept of being able to hot-swap bytecodes. (Not only can you change the bytecode semantics of the processor, you can also change the object semantics.) But-first-we-need-a-working-implementation... -Rocco
It seems like an appropriate time to ask this: I know that we want to preserve language semantics. However, are bytecode-level semantics similarly to be preserved, especially since bytecode isn't always portable between CPython versions? More specifically, is a stack-based VM sacrosanct?

The traditional argument against register-based systems has been that they usually increase compiled code size. However, storage and bandwidth are becoming cheaper, and I think we could take advantage of "hardware" tricks like conditional execution and multiple dispatch to significantly increase speed. In fact, we could take advantage of the fact that "memory" accesses are just as cheap as "register" accesses for the PyPy VM and even use a memory-memory architecture. That gives the highest code density -- even higher than stack-based bytecode -- and still allows most "hardware" tricks.

Finally, it allows for a nice abstraction of various parts of the execution engine - the "Integer Functional Unit" could be highly optimized, because it would always know that it was dealing with integers. We wouldn't be stuck with machine types, though; we could have a "String Functional Unit" just as easily. VanL
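A memory-memory architecture of the kind proposed above can be sketched in a few lines. This is an illustrative toy (the opcodes and slot names are invented, not any real PyPy design): every instruction names its operand slots directly, so one instruction does the work that a stack machine would spread over PUSH/ADD/STORE sequences.

```python
# Toy memory-to-memory VM: three-address instructions operating
# directly on named memory slots, with no operand stack.
def run(program, memory):
    for op, dst, a, b in program:
        if op == "add":
            memory[dst] = memory[a] + memory[b]
        elif op == "mul":
            memory[dst] = memory[a] * memory[b]
        elif op == "mov":
            memory[dst] = memory[a]
        else:
            raise ValueError("unknown opcode: %s" % op)
    return memory

mem = {"x": 2, "y": 3, "t0": 0, "t1": 0}
# t0 = x + y; t1 = t0 * x  -- one instruction per operation, no stack shuffling
run([("add", "t0", "x", "y"), ("mul", "t1", "t0", "x")], mem)
print(mem["t1"])  # 10
```

The equivalent stack bytecode would need roughly twice as many instructions (loads and stores around each arithmetic opcode), which is the code-density trade-off discussed above.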
In fact, we could take advantage of the fact that "memory" accesses are just as cheap as "register" accesses for the PyPy VM and even use a memory-memory architecture. That gives the highest code density -- even higher than stack-based bytecode -- and still allows most "hardware" tricks.
I think this is a major point for achieving a better VM. I don't think we are stuck with a stack-based VM; we can experiment with different roads... I think that different VM implementations are a possibility, and we can at least let the user (or the optimizer!) choose which one is better for a certain application, or even for a part of an application, when the problem there is speed rather than memory footprint. Paolo Invernizzi
VanL wrote:
It seems like an appropriate time to ask this:
I know that we want to preserve language semantics. However, are bytecode-level semantics similarly to be preserved, especially because bytecode isn't always portable between CPython versions?
Having our own bytecode interpreter serves different goals. One is to have CPython available as a compiler, immediately. Another is to understand the Python engine better. By re-implementing it in a different language, but with similar semantics, we learn everything about it. Furthermore, we try to shrink this huge amount of open issues. The few things we don't change now are the language and the bytecodes. After having made our way through this, we will have a very good understanding of the implementation and all the issues it handles. We are by no means saying that bytecode VMs are the best way to go. There are other ways, which have their advantages. At first, we want to go this way, which looks short and achievable. When that works, we can change the target. ciao - chris -- Christian Tismer <mailto:tismer@tismer.com> http://www.stackless.com/
Hello Rocco, On Thu, Feb 13, 2003 at 09:50:39PM -0500, Rocco Moretti wrote:
I was originally going to say I thought the general-dispatch-function and the standard-object-method-interface options were practically equivalent, but I changed my mind.
The reason lies along the concept of multi-methods.
That was the point. A more minor point is that the methods add() etc. now automatically have a hidden "self" argument, which can be used to determine the context in which we are working (more specifically, the object space, but then from the object space we might come back to the currently executing frame).
I wonder whether wrap() and unwrap() are really necessary - they seem to blur the distinction between interpreter-level and application-object-implementation-level.
I think that the generality of these methods might be very useful in the future. If instead we tried to identify all the spots in the interpreter that require a specific treatment of what could be done in generality with wrap() or unwrap(), then we would be more dependent on the current Python VM and bytecodes. I admit there are also drawbacks, but I would say that they are mainly optimization drawbacks. We can always later add a "hint" optional parameter to wrap() and unwrap() to let the ObjectSpace know for which particular reason the method is called. The wrap()/unwrap() symmetry is also nice because it allows us to define a "default" implementation for all ObjectSpace methods:

    def add(self, x, y):
        x0, y0 = self.unwrap(x), self.unwrap(y)
        z0 = x0 + y0
        return self.wrap(z0)

If you think about unwrap()/wrap() as respectively downloading/uploading the object through a network, it's not the fastest implementation, but it works!
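The default-implementation idea above can be made runnable with a minimal object space. The class and its box representation are invented for illustration (a one-element list as the "black box"), not actual PyPy code:

```python
# A runnable version of the default add() built on wrap()/unwrap().
class TrivialObjectSpace:
    def wrap(self, obj):
        return [obj]                 # "upload" a normal object into the space

    def unwrap(self, w_obj):
        return w_obj[0]              # "download" it back out

    def add(self, w_x, w_y):
        # Default implementation: round-trip through plain objects.
        x0, y0 = self.unwrap(w_x), self.unwrap(w_y)
        return self.wrap(x0 + y0)

space = TrivialObjectSpace()
w_result = space.add(space.wrap(40), space.wrap(2))
print(space.unwrap(w_result))  # 42
```

A smarter space would override add() to work on its boxes directly; the point is only that every operation has a correct, if slow, fallback for free.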
It seems to me that the LOAD_CONST issue would be better dealt with by having the compiler/function loader create the black box objects themselves, instead of there being a run-time translation between interpreter-level objects and application-level objects.
I don't think it's so clear. LOAD_CONST is not the only place where we would need wrap(). The bytecode interpreter may catch a RuntimeError or MemoryError, for example, and then it would want to send it to the application. As for LOAD_CONST, it's not clear where (interpreter- or object-space) the constants are best stored. Think about implementing Pyrex, the Python-to-C translator: it could be done with a PyrexObjectSpace whose add() method would only emit the following line of C code:

    v3 = PyNumber_Add(v1, v2);

and the "black box" objects would only store the name of the C variable, not an actual value. If you have an explicit wrap() operation, then whenever a LOAD_CONST is seen, PyrexObjectSpace emits C code like:

    v5 = PyInt_FromLong(123);

If on the other hand the LOAD_CONST is invisible to the object space, then PyrexObjectSpace must pre-build all the constants and put them into global C variables, whose names are transmitted to the pypy main loop. It might be a good optimization to do so, but it might not always be (e.g. if we are targeting machine code instead of C and don't have so many registers available). With an explicit wrap() you can choose to "cache" all or some constants in global variables, or not. Without the wrap() you are forced to "cache" them all.
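The PyrexObjectSpace idea described above can be sketched as follows. This is a hedged toy, not the real thing (the class, fresh() helper, and box representation are all invented): the "black boxes" hold C variable *names*, and each operation emits a line of C code instead of computing a value.

```python
# Toy code-emitting object space: wrap() and add() generate C source lines,
# and the boxes are just the names of C variables.
class PyrexObjectSpace:
    def __init__(self):
        self.lines = []
        self.counter = 0

    def fresh(self):
        # Invent a new C variable name.
        self.counter += 1
        return "v%d" % self.counter

    def wrap(self, obj):
        # Triggered e.g. by LOAD_CONST: emit code building the constant.
        v = self.fresh()
        self.lines.append("%s = PyInt_FromLong(%d);" % (v, obj))
        return v

    def add(self, w_x, w_y):
        v = self.fresh()
        self.lines.append("%s = PyNumber_Add(%s, %s);" % (v, w_x, w_y))
        return v

space = PyrexObjectSpace()
space.add(space.wrap(123), space.wrap(1))
print("\n".join(space.lines))
# v1 = PyInt_FromLong(123);
# v2 = PyInt_FromLong(1);
# v3 = PyNumber_Add(v1, v2);
```

Running the bytecode through such a space performs no arithmetic at all; it compiles the program, which is exactly why a generic unwrap() is problematic for it.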
* unwrap(x) is the inverse operation.
But how would we handle it if we wanted to redefine what was considered "true"? A generic unwrap couldn't tell what the unwrapped object is being used for.
Yes, that's right. It's actually the most difficult operation; for example, in PyrexObjectSpace, you cannot actually unwrap any object, because it will later be in some variable of your emitted C code, but not now.

I still think that we should try to provide a generic unwrap(), because it is essential for "network" object spaces, where it represents downloading, or for Psyco, where it represents suspending compilation, running the code, waiting for the execution point to reach that point, and "pulling" the actual value out of the registers.

ObjectSpaces may only partially implement wrap()/unwrap(), i.e. it would be acceptable for them to fail to wrap or unwrap a value. For example, it can be expected that PyrexObjectSpace can only wrap "literal constants", like integers, strings, and floats (by emitting a call to PyXxx_FromYyy()). Conversely, it can only unwrap objects returned by its own truth() method, where it knows that the result must be either False or True. (It still doesn't know whether it is False or True, but it can *duplicate* the caller frame and return False once and True once, so that both paths will be tried.)

The point of this lengthy explanation is to show that the difficulty does not lie directly in "unwrap() vs. specialized-versions-of-it", because both have the same problem in some ObjectSpaces. I expect the same problem to show up in any specialized version of unwrap(). The rest is optimization hints. A bientôt, Armin.
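The partial wrap()/unwrap() idea from the message above can be sketched as follows. Everything here is invented for illustration (the PartialSpace class, the UnwrapError exception, the tagged-tuple boxes): a space legitimately refuses to wrap non-literals and to unwrap anything except the results of its own truth() method.

```python
# Sketch of a space that only partially implements wrap()/unwrap().
class UnwrapError(Exception):
    pass

class PartialSpace:
    def wrap(self, obj):
        # Only literal constants can be wrapped.
        if not isinstance(obj, (int, float, str)):
            raise UnwrapError("can only wrap literal constants")
        return ("const", obj)

    def truth(self, w_obj):
        tag, value = w_obj
        return ("truth", bool(value))

    def unwrap(self, w_obj):
        # Only boxes produced by truth() can be pulled back out.
        tag, value = w_obj
        if tag != "truth":
            raise UnwrapError("can only unwrap truth() results")
        return value

space = PartialSpace()
print(space.unwrap(space.truth(space.wrap(0))))  # False
```

The interpreter would catch the failure and fall back to some other strategy (caching, frame duplication, ...), which is the "optimization hints" part of the argument.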
Hello Rocco, Mmh, my previous e-mail was quite lengthy. Sorry for that, I guess I was thinking aloud... I was seduced in the first place by the symmetry and generality of wrap()/unwrap() (whose names came from the already-cited paper "Representing Type Information in Dynamically Typed Languages", http://citeseer.nj.nec.com/gudeman93representing.html).

One of the problems with wrap()/unwrap() is when they are applied to containers. As they are, they have to "translate" the whole structure recursively, which is often not a good idea. On the other hand, if we define more specialized, purpose-oriented methods, we lose the opportunities given by wrap()/unwrap() to let particular routines send or examine an arbitrary piece of data. I'm thinking about type-based dispatch, for example, where you have to query the type of an object with the appropriate ObjectSpace method, and then unwrap() this type.

I don't know which one is better. Maybe both are useful. I guess we will have to try and see. Hopefully, trying both should be easy in the early phases. A bientôt, Armin.
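The container problem mentioned above is easy to see in miniature. This sketch uses invented tagged-tuple boxes: a naive wrap() of a nested list must recurse into the whole structure, translating every element eagerly.

```python
# Naive recursive wrap(): the whole container is translated up front,
# which is the cost being pointed out above.
def wrap(obj):
    if isinstance(obj, list):
        return ("list", [wrap(item) for item in obj])
    return ("const", obj)

print(wrap([1, [2, 3]]))
```

A lazier design would wrap only the outer container and translate elements on access, but then wrap() is no longer a single generic operation.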
participants (5)
- Armin Rigo
- Christian Tismer
- Paolo Invernizzi
- roccomoretti@netscape.net
- VanL