
Hi Paolo,

Thanks for the introduction e-mail. I'd like to answer it more directly, but maybe I should first give a quick overview of what I think are possible misunderstandings. The main issue is that unlike Google V8, in PyPy there is no machine code generation at all (ignoring the JIT compiler generator for the first part of this e-mail). That means that the notion of an inline cache simply doesn't apply. Let me explain this in more detail.

What we implemented long ago in PyPy is very similar to the "maps" of the Self language. It is a dictionary implementation that stores the keys in a separate, shared "map" object, and stores the values in a flat list (I sketch a toy version of this below). In PyPy this thing still presents the interface of a full-featured dictionary, so it supports adding and removing arbitrary elements, which is done by changing the dictionary's current map. This saves memory, as you know. But I don't see how it can provide much speed-up in a pure interpreter (as opposed to in generated machine code with inline caches): whatever you do, you need to perform a lookup *somewhere* in order to implement the "getattr" operation. The lookup can be done directly in the object's dictionary, or it can be done in the shadow "map" object, but in both cases it needs to be done at runtime for every attribute access.

There are probably more advanced ways to cache recent lookups, which we haven't tried so far, like attaching to each GETATTR opcode in the Python bytecode a tiny structure that caches the result of the most recent lookup performed at that source code position (also sketched below). I don't know how much speed-up this would give; in fact it is rather unclear to me why people think that dictionary lookups are seriously bad. Of course they are more expensive than just reading an item out of a list, but I don't think it's worse than roughly 3 times slower; that factor looks big in the context of machine code, but insignificant in the context of a complicated bytecode interpreter. I'm ready to be proven wrong, of course.

However, this is only half of the story. Another very important point is that the mid-term goal for PyPy is to again have a JIT compiler generator (we had one in the 1.0 release, but it was far from providing consistent speed-ups, which is why we are re-experimenting at the moment). In the context of our JIT, the inline cache technique you describe -- like the polymorphic inline caches of Java VMs -- is a special case of a generic construct called "promotion" in [1] or "unlifting" in Psyco [2]. Using this construct, the JIT compiler generator takes as input our "dictionary-with-maps"-enabled interpreter and produces as output a JIT compiler that itself emits inline caches similar to Google V8's. Another way to put it: in PyPy the real purpose of our "maps", besides saving memory, is to be JIT-generator-friendly, enabling good speed-ups in the machine code generated by the JIT (and not in the pure interpreter). Thus what we implemented in the interpreter is already enough; the main missing piece is a (completely generic) JIT generator producing a consistent JIT. Note that this is not just an abstract wish: such inline caches are what Psyco already emits, and the automatically generated JIT of PyPy 1.0 definitely emitted them too.
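For concreteness, here is a toy version of the maps idea in plain Python. This is only a sketch with invented names; the real PyPy code is more involved and also supports removing attributes, which I left out here:

    class Map(object):
        # Shared between all objects created "the same way": maps
        # attribute names to positions in the per-object storage list.
        def __init__(self):
            self.indexes = {}       # attribute name -> index in storage
            self.transitions = {}   # attribute name -> the next Map

        def getindex(self, name):
            return self.indexes.get(name, -1)

        def with_new_attribute(self, name):
            # Adding an attribute switches the object to another Map;
            # the transition is cached, so objects that add attributes
            # in the same order all end up sharing the same Map.
            if name not in self.transitions:
                newmap = Map()
                newmap.indexes = self.indexes.copy()
                newmap.indexes[name] = len(self.indexes)
                self.transitions[name] = newmap
            return self.transitions[name]

    EMPTY_MAP = Map()

    class Instance(object):
        def __init__(self):
            self.map = EMPTY_MAP
            self.storage = []       # values only; the keys live in the map

        def getfield(self, name):
            index = self.map.getindex(name)
            if index < 0:
                raise AttributeError(name)
            return self.storage[index]

        def setfield(self, name, value):
            index = self.map.getindex(name)
            if index < 0:
                self.map = self.map.with_new_attribute(name)
                self.storage.append(value)
            else:
                self.storage[index] = value

The point is that many instances share a single Map, so the per-instance cost is only the flat storage list -- hence the memory savings, but no obvious interpreter speed-up, since getfield() still does a dictionary lookup.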
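And here is the kind of per-opcode cache I have in mind, assuming the map sketch above (again with invented names; one such cell would hang off every GETATTR opcode in the bytecode):

    class LookupCache(object):
        # One cell per GETATTR opcode, remembering the last lookup done
        # at that source code position.
        def __init__(self):
            self.cached_map = None
            self.cached_index = -1

    def cached_getfield(cell, obj, name):
        # Fast path: same map as the previous execution of this opcode,
        # so the attribute sits at the same index in the storage list.
        if obj.map is cell.cached_map:
            return obj.storage[cell.cached_index]
        # Slow path: do the real lookup and remember where it was found.
        index = obj.map.getindex(name)
        if index < 0:
            raise AttributeError(name)
        cell.cached_map = obj.map
        cell.cached_index = index
        return obj.storage[index]

Whether the pointer comparison wins anything over the dictionary lookup inside a large bytecode interpreter loop is exactly the open question above.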
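Finally, to illustrate what promotion buys us: at a given getattr site, the generated JIT observes ("promotes") the map actually seen at runtime and compiles a guard plus a direct indexed load -- which is precisely an inline cache. Here is a sketch of the effect in Python, using closures where the real JIT emits machine code, under the same assumptions as above:

    def compile_getattr_site(name):
        # One such "site" corresponds to one getattr in the user program.
        state = {}

        def slow_path(obj):
            # Promotion: freeze the map seen at runtime into the code.
            seen_map = obj.map
            index = seen_map.getindex(name)
            if index < 0:
                raise AttributeError(name)

            def fast_path(obj):
                # The inline cache: one pointer comparison (the guard on
                # the promoted map) and one indexed load.
                if obj.map is seen_map:
                    return obj.storage[index]
                # Guard failure: a real JIT would extend this into a
                # polymorphic inline cache; here we just respecialize.
                return slow_path(obj)

            state['entry'] = fast_path
            return obj.storage[index]

        state['entry'] = slow_path
        return lambda obj: state['entry'](obj)

Used like this:

    getattr_x = compile_getattr_site('x')
    obj = Instance()
    obj.setfield('x', 42)
    print(getattr_x(obj))    # slow path: promotes obj.map, specializes
    print(getattr_x(obj))    # fast path: guard + indexed load

The JIT compiler generator derives this kind of code automatically from the interpreter; nobody writes it by hand.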
To repeat a point that we have tried to make many times, apparently without much success: in the context of the PyPy JIT there is no point in thinking deeply about the various semantics of Python specifically, like reflection, operator overloading, the handler for messages sent to objects that don't support them (that's __getattr__ in Python), and so on. These semantics are already encoded in the interpreter, where the JIT compiler generator can see them and use them to produce a JIT that emits reasonably efficient machine code for a given user program. The relevant question in this context is not how to encode some complicated Python semantics in the machine code; it is to identify the cases where the generated JIT emits clearly inefficient code, and to figure out how to tweak the interpreter to be more JIT-friendly (e.g. by adding maps to the interpreter, as above).

To make it clearer that thinking about advanced Python semantics is not useful in this context, note that "Python" in the previous paragraph can be replaced by any language, preferably a dynamic one, and we still get a JIT for it (e.g. we did it for Prolog, and we have partial or in-progress interpreters for JavaScript, Smalltalk and other languages).

Does the above make any sense? There are three relatively distinct areas to work on: tweaking the interpreter to maximize its speed, tweaking it to maximize its JIT-friendliness (better done after the JIT generator is ready), or helping develop the JIT generator itself. I hope the above helps clarify this aspect of PyPy a bit.

In any case, we would definitely be interested in working with you in these areas, or in the other areas that you mentioned in your e-mail (GC...). Feel free to discuss matters further, here or on irc (#pypy on irc.freenode.net).

[1] http://codespeak.net/pypy/dist/pypy/doc/jit/
[2] http://psyco.sourceforge.net/doc.html [PEPM'04 paper]

A bientot,

Armin.