Re: [pypy-dev] Flow graphs, backends and JIT
3. Which component actually does the JIT? Is it just a tweak on the code generator or are the flow graphs generated differently?
The flow graphs are taken from the translator and modified by the JIT generator.
My question is:
Does the JIT involve another "transformation" of the flow graphs? In normal (non-JIT) code generation, some flow graphs are fed to the backend generator. Which step is different in the JIT case? Does the backend generator get different flow graphs, or are the same flow graphs compiled differently by a tweaked code generator?
They get the same flow graphs.
So, if I understand correctly, there is no common JIT code among different backends? The JIT we have is specific to the C backend? Different backends would need a new JIT approach?
4. Is there some documentation on how to write a backend (code generator)? The source code is poorly documented, and the topic is not mentioned on the web page. What exactly do I need to implement to have a backend?
You mean a JIT backend or an RPython backend?
An RPython backend first. Is there any documentation, tutorial, simple toy backend, or anything else I could start with?
No. In fact, the only RPython backend that is well maintained is the C one.
OK, so where could I start? Is there, for example, some list of flow-graph opcodes?
You might find this useful: http://www.aosabook.org/en/pypy.html
OK, that was useful. It seems that the JIT generator is some assembler embedded into the final binary. Does the JIT generator share any code with the backend generator?
No.
Would it be possible to get rid of the normal code generator (leaving only some glue code) and rely only on the JIT generator, which would produce all the code?
No. The JIT generator is specialized for dynamic languages, not for ones like RPython, which can be translated to C.
This would reduce the size of the binary and would not hurt performance much, since loops would be generated as usual; only the non-looping execution would be different.
Why would it reduce the size of the binary?
That is my poor understanding; I might be wrong. In the current approach, a binary contains the compiled machine code, the flow graph representation, and the JIT compiler. I think we could get rid of (most of) the compiled machine code, leaving only some startup code to spawn the JIT compiler. Then each code path would be compiled by the JIT and executed. Loops would run as fast as usual. Non-loop code would run slower, but I think this would be a minor slowdown. Most importantly, as I understand it, the binary contains many versions of the same code paths, specialized for different types. If we threw those out, the binary would be smaller.

This is not a proposal. It is just an attempt at understanding things.

haael
On Tue, Sep 18, 2012 at 9:35 AM, haael <haael@interia.pl> wrote:
So, if I understand correctly, there is no common JIT code among different backends? The JIT we have is specific to the C backend? Different backends would need a new JIT approach?
Most of the JIT code is not C-backend specific. Backends are along the lines of x86, ARM, PPC. If you want to create, say, an LLVM backend, you would reuse most of the JIT code.

Regarding your other questions: what sort of backend do you have in mind? Depending on that, it might be easier or harder to write one, and the answers to all your other questions might be different.

Cheers,
fijal
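[Illustration: a toy sketch of what "reusing most of the JIT code" means. All names here are invented for the sketch and are not PyPy's real API; a real backend in pypy/jit/backend emits actual machine-code bytes into memory rather than a Python closure.]

    # Toy sketch, invented names -- not PyPy's real API.  The shared JIT
    # front-end hands every backend the same linear trace; only this last
    # translation step is architecture-specific (x86, ARM, PPC, ...).
    class GuardFailed(Exception):
        pass

    def compile_trace(operations):
        # A real backend would emit machine code into executable memory
        # here; this stand-in returns a Python closure instead.
        def run(env):
            for op in operations:
                if op[0] == 'int_add':
                    _, a, b, dst = op
                    env[dst] = env[a] + env[b]
                elif op[0] == 'guard_true':
                    if not env[op[1]]:
                        raise GuardFailed(op[1])  # jump to a side exit
            return env
        return run

    trace = [('int_add', 'i0', 'i1', 'i2')]
    print(compile_trace(trace)({'i0': 40, 'i1': 2}))  # 'i2' ends up as 42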
Most of the JIT code is not C-backend specific. Backends are along the lines of x86, ARM, PPC. If you want to create, say, an LLVM backend, you would reuse most of the JIT code.
So now I don't understand again. Where exactly is the JIT coded? What is the difference between the build process of a JIT binary and a non-JIT binary? It's not in the flow graphs. It is in the backend. How can the C backend and, say, the CLI backend share code?
Regarding your other questions: what sort of backend do you have in mind? Depending on that, it might be easier or harder to write one, and the answers to all your other questions might be different.
Nothing in particular. I just want to gain some knowledge and start hacking on PyPy. I used to write compilers and do some embedded programming, so I thought that writing a new backend might be the easiest way in for me. To say it again: I just want to start.
haael
2012/9/18 haael <haael@interia.pl>:
So now I don't understand again. Where exactly is the JIT coded? What is the difference between the build process of a JIT binary and a non-JIT binary? It's not in the flow graphs. It is in the backend. How can the C backend and, say, the CLI backend share code?
Maciej is referring to JIT backends, not the translator backend.

--
Regards,
Benjamin
Hi Haael,

Here is again a high-level overview. Although we use the term "backend" for both, there are two completely unrelated components: the JIT backends and the translation backends.

The translation backends are part of the static translation of a PyPy (with or without the JIT) to C code. The translation backends turn control flow graphs into, say, C source code representing them. These control flow graphs are roughly at the same level as Java VM opcodes, except that, depending on the backend, they may either still contain GC operations (e.g. when translating to Java or CLI) or not any more (e.g. when translating to C). We have control flow graphs for each RPython function in the source code of PyPy, describing an interpreter for Python.

Now the JIT is an optional part of that, written as more RPython code --- and it gets statically translated into more control flow graphs, but ones describing only the JIT itself, not any JITted code. JITted code (in the form of machine code) is produced at runtime, obviously, but using different techniques. It is the job of the JIT backends to produce this machine code in memory. This is unrelated to the translation backends: a JIT backend inputs something that is not a control flow graph (but a linear "trace" of operations), works at runtime (so is itself written in RPython), and outputs machine code in memory (rather than writing C sources into a file).

The input for the JIT backend comes from a front-end component: the tracing JIT "metacompiler". It works by following what the interpreter would do for some specific input (i.e. the precise Python code we see at runtime). This means that the JIT front-end starts from the control flow graphs of the interpreter and produces a linear trace out of them, which is fed to the JIT backend. The control flow graphs in question must be available at runtime, so we need to serialize them. The precise format in which the flow graphs are serialized is called "JitCodes". The format is very similar to the flow graphs, but everything that is unnecessary for the JIT has been removed, most importantly the details of the type information --- e.g. integer variables of all sizes and signedness are represented as one "int" type, because the JIT wouldn't have use for more; and similarly, any GC pointer to any object is represented as just one "GC pointer" type.

I hope this helps :-)

A bientôt,

Armin.
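[Illustration: a toy sketch of the two very different shapes of input the two kinds of backends consume. The classes are invented for this example and do not match PyPy's real data structures.]

    # Toy sketch -- invented classes, not PyPy's real data structures.
    class Block:
        """A basic block of a control flow graph: a list of operations
        plus links to successor blocks, so branches and loops are allowed."""
        def __init__(self, operations, exits):
            self.operations = operations  # e.g. [('int_add', 'v0', 'v1', 'v2')]
            self.exits = exits            # [(exit_condition, target_block), ...]

    class FlowGraph:
        """What a *translation* backend consumes, ahead of time, to emit
        e.g. C source code for one whole RPython function."""
        def __init__(self, startblock):
            self.startblock = startblock

    class Trace:
        """What a *JIT* backend consumes, at runtime: one linear list of
        operations, with guards where the original code branched."""
        def __init__(self, operations):
            # e.g. [('guard_class', 'p0', 'W_IntObject'),
            #       ('int_add', 'i0', 'i1', 'i2')]
            self.operations = operations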
2012/9/18 haael <haael@interia.pl>:
OK, so where could I start? Is there, for example, some list of flow-graph opcodes?
You can use the graphviewer described in the documentation.
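[Illustration: a minimal graphviewer session might look roughly like the following. The module path reflects the 2012-era source tree (it later moved to rpython.translator.interactive), and the viewer needs pygame installed.]

    # Rough sketch of inspecting a function's flow graph interactively.
    from pypy.translator.interactive import Translation

    def f(x):
        return x + 1

    t = Translation(f, [int])
    t.annotate()
    t.view()   # opens the pygame-based flow graph viewer on f's graph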
In the current approach, a binary contains the compiled machine code, the flow graph representation, and the JIT compiler. I think we could get rid of (most of) the compiled machine code, leaving only some startup code to spawn the JIT compiler. Then each code path would be compiled by the JIT and executed. Loops would run as fast as usual. Non-loop code would run slower, but I think this would be a minor slowdown. Most importantly, as I understand it, the binary contains many versions of the same code paths, specialized for different types. If we threw those out, the binary would be smaller.
This is not a proposal. It is just an attempt at understanding things.
It might be technically possible, but it's definitely not within the design goals of the JIT.
haael
--
Regards,
Benjamin
Hi Haael,

Cool that you want to work on PyPy!

haael <haael@interia.pl> wrote:
Why would it reduce the size of the binary?
That is my poor understanding; I might be wrong.
In the current approach, a binary contains the compiled machine code, the flow graph representation, and the JIT compiler. I think we could get rid of (most of) the compiled machine code, leaving only some startup code to spawn the JIT compiler. Then each code path would be compiled by the JIT and executed. Loops would run as fast as usual. Non-loop code would run slower, but I think this would be a minor slowdown. Most importantly, as I understand it, the binary contains many versions of the same code paths, specialized for different types. If we threw those out, the binary would be smaller.
This is not a proposal. It is just an attempt at understanding things.
I will just reply to this part; typing on the phone is annoying.

What you write above is actually a good proposal. We have discussed the viability of related schemes in the past. There are two problems that I see with it.

1. While the speed of your proposed system would eventually be the same, it would suffer from much slower warmup, because after startup you would have to generate a lot of machine code before executing the user's code.

2. More fundamentally (and this is where I think you have missed a detail about the JIT so far), the JIT is trace-based. The JIT backends cannot deal with arbitrary control flow, only with linear traces. Therefore it would not be straightforward to use the same JIT backends to bootstrap parts of the interpreter at runtime.

As for tasks you could work on: would you maybe be interested in helping with the ARM JIT backend?

Cheers,
Carl Friedrich
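[Illustration of point 2, with invented operation names: a trace records only the one path the interpreter actually took, with a guard where the source branched, so arbitrary control flow has no direct trace equivalent.]

    # Toy illustration, invented names.  For source code like
    #
    #     if x > 0:
    #         res = x + 1
    #     else:
    #         res = x - 1
    #
    # a tracing JIT that observed a run with x = 5 records only the arm
    # that was taken, replacing the branch with a guard:
    trace_for_positive_x = [
        ('guard_gt', 'x', 0),         # failure -> side exit back to the
                                      # interpreter (maybe a new trace)
        ('int_add', 'x', 1, 'res'),
    ]
    # The untaken arm is simply absent from the trace; a backend that only
    # understands this linear form cannot encode the branch itself.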
Hi Carl Friedrich,

On Tue, Sep 18, 2012 at 10:00 PM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:
2. More fundamentally (and this is where I think you have missed a detail about the JIT so far), the JIT is trace-based. The JIT backends cannot deal with arbitrary control flow, only with linear traces.
You missed an intermediate solution: have the JIT's blackhole interpreter run the jitcodes before warm-up. We don't have to actually JIT-compile everything before being able to run it, which would indeed completely kill warm-up times. This would give a (slow but not unreasonably slow) solution: a very general "RPython interpreter and JIT-compiler" that would input and run some set of serialized jitcodes --- similar to a Java VM, actually. (There are tons of minor issues ahead, like all the stranger operations that don't have a jitcode equivalent so far, e.g. working on "long double" or "long long long" or weakrefs...)

Note that in order to make the "RPython interpreter and JIT-compiler" itself, we would need to translate regular RPython code --- which means it doesn't help at all if the goal is to port RPython to non-C translation targets. It's merely a cool hack, and maybe a debugging help to trade fast translation time for a slower result.

A bientôt,

Armin.
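[Illustration: a minimal sketch of the kind of "interpreter over serialized jitcodes" described here. The encoding and names are invented; the real blackhole interpreter is far more involved.]

    # Minimal sketch, invented encoding -- interpreting serialized
    # jitcode-like operations directly, Java-VM style, instead of
    # running statically compiled machine code.
    def run_jitcode(code, args):
        regs = dict(enumerate(args))   # integer "registers"; all just 'int'
        pc = 0
        while True:
            op = code[pc]
            if op[0] == 'int_add':
                _, a, b, dst = op
                regs[dst] = regs[a] + regs[b]
            elif op[0] == 'int_return':
                return regs[op[1]]
            pc += 1

    # jitcode for "return r0 + r1":
    print(run_jitcode([('int_add', 0, 1, 2), ('int_return', 2)], [40, 2]))  # 42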
On Wed, Sep 19, 2012 at 11:38 AM, Armin Rigo <arigo@tunes.org> wrote:
I guess this is what pypyjit.py does, more or less. You still need the blackhole interpreter to run in something.
Hi Fijal,

On Wed, Sep 19, 2012 at 12:30 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I guess this is what pypyjit.py does, more or less. You still need the blackhole interpreter to run in something.
Right, indeed: pypyjit.py already fulfills the "debugging helper" role. That leaves only the "cool hack" role that I can think of right now... :-)

A bientôt,

Armin.
Participants (5):
- Armin Rigo
- Benjamin Peterson
- Carl Friedrich Bolz
- haael
- Maciej Fijalkowski