A new JIT compiler for a faster CPython?
Hi,

I would like to write yet another JIT compiler for CPython. Before writing anything, I would like your opinion, because I don't know the other Python compilers well. I also want to prepare a possible integration into CPython from the beginning of the project, or at least stay very close to the CPython project (and CPython developers!). I did not understand exactly why the Unladen Swallow and psyco projects failed, so please tell me if you think that my project is going to fail too!

== Why? ==

CPython is still the reference implementation; new features are added to this implementation first (ex: PyPy does not support Python 3 yet, but there is a project to support Python 3). Some projects still rely on low-level properties of CPython, especially its C API (ex: numpy; PyPy has a cpyext module to emulate the CPython C API).

A JIT is the most promising solution to speed up the main evaluation loop: using a JIT, it is possible to compile a function for a specific type on the fly and so enable deeper optimizations.

psyco is no longer maintained. It had its own JIT, which is complex to maintain. For example, it is hard to port it to new hardware.

LLVM is fast and the next version will be faster. LLVM has a community, documentation, a lot of tools, and is active.

There are many Python compilers which are very fast, but most of them only support a subset of Python or require modifying the code (ex: specifying the type of all parameters and variables). For example, you cannot run Django with Shedskin.

IMO PyPy is complex and hard to maintain. PyPy has a design completely different from CPython's, and is much faster and has a better memory footprint. I don't expect to be as fast as PyPy, just faster than CPython.

== General idea ==

I don't want to replace CPython. This is an important point. All other Python compilers try to write something completely new, which is a huge task and a problem for staying compatible with CPython.

I would like to reuse as much CPython code as possible, and not try to fight against the GIL or reference counting, but cooperate instead. I would like to use a JIT to generate specialized functions for a combination of argument types. Specialization enables more optimizations. I would like to use LLVM because LLVM is an active project, has many developers and users, is fast, and the next version will be faster! LLVM already supports common optimizations like inlining.

My idea is to emit the same code as ceval.c from the bytecode, to be fully compatible with CPython, and then write a JIT to optimize functions for a specific type.

== Roadmap ==

-- Milestone 1: Proof of concept --

* Use the bytecode produced by the CPython parser and compiler
* Only compile a single function
* Emit the same code as ceval.c using LLVM, but without tracing, exceptions or signal handling (they will be added later)
* Support compiling and calling the following function: "def func(a, b): return a+b"

The pymothoa project can be used as a base to implement such a proof of concept quickly.

-- Milestone 2: Specialized function for the int type --

* Use type annotations to generate specialized functions for the int type
* Use C int with a guard detecting integer overflow to fall back on Python int

-- Milestone 3: JIT --

* Depending on the types seen at runtime, recompile the function to generate specialized functions
* Use a guard to fall back to a generic implementation if the type is not the expected type
* Maybe drop the code using function annotations

At this step, we can start to benchmark to check whether the (JIT) compiler is faster than CPython.

-- Later (unsorted ideas) --

* Support exceptions
* Full support of Python:
  - classes
  - list comprehensions
  - etc.
* Optimizations:
  - avoid reference counting when possible
  - avoid temporary objects when possible
  - release the GIL when possible
  - inlining: should be very interesting with list comprehensions
  - unroll loops?
  - lazy creation of the frame?
* Use registers instead of a stack in the "evaluation loop"?
* Add code to allow tracing and profiling
* Add code to handle signals (pending calls)
* Write a compiler using the AST, with a fallback to the bytecode? (would it be faster? easier or more complex to maintain?)
* Test LLVM optimizers
* Compile a whole module or even a whole program
* Reduce the memory footprint
* Type annotations to help the optimizer? (with guards?)
* "const" annotations to help the optimizer? (with guards?)
* Support any build option of Python:
  - support Python 2 (2.5, 2.6, 2.7) and 3 (3.1, 3.2, 3.3, 3.4)
  - support narrow and wide modes: flag at runtime?
  - support debug and release modes: flag at runtime?
  - support 32-bit and 64-bit modes on Windows?

== Other Python VMs and compilers ==

-- Fully Python compliant --

* `PyPy <http://pypy.org/>`_
* `Jython <http://www.jython.org/>`_: based on the JVM
* `IronPython <http://ironpython.net/>`_: based on the .NET VM
* `Unladen Swallow <http://code.google.com/p/unladen-swallow/>`_: fork of CPython 2.6 using LLVM
  - `Unladen Swallow Retrospective <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_
  - `PEP 3146 <http://python.org/dev/peps/pep-3146/>`_
* `psyco <http://psyco.sourceforge.net/>`_ (fully Python compliant?), no longer maintained

-- Subset of Python to C++ --

* `Nuitka <http://www.nuitka.net/pages/overview.html>`_
* `Python2C <http://strout.net/info/coding/python/ai/python2c.py>`_
* `Shedskin <http://code.google.com/p/shedskin/>`_
* `pythran <https://github.com/serge-sans-paille/pythran>`_ (no class, set, dict, exception, file handling, ...)

-- Subset of Python --

* `pymothoa <http://code.google.com/p/pymothoa/>`_: uses LLVM; doesn't support classes or exceptions
* `unpython <http://code.google.com/p/unpython/>`_: Python to C
* `Perthon <http://perthon.sourceforge.net/>`_: Python to Perl
* `Copperhead <http://copperhead.github.com/>`_: Python to GPU (Nvidia)

-- Language very close to Python --

* `Cython <http://www.cython.org/>`_: "Cython is a programming language based on Python, with extra syntax allowing for optional static type declarations." Based on `Pyrex <http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/>`_

== See also ==

* `Volunteer developed free-threaded cross platform virtual machines? <http://www.boredomandlaziness.org/2012/07/volunteer-supported-free-threaded-...>`_

Victor Stinner
I'll admit I didn't read through your email, but you should absolutely check out Numba, which is ramping up just now to do this: https://github.com/numba (I'm CC-ing their mailing list; perhaps some of them will read this and respond.) It is probably much less ambitious, but that hopefully shouldn't stop you cooperating. It was started by Travis Oliphant (who started NumPy); here are his thoughts on PyPy and NumPy, which provide some of the background for this project. http://technicaldiscovery.blogspot.no/2011/10/thoughts-on-porting-numpy-to-p...

Dag

On 07/17/2012 08:38 PM, Victor Stinner wrote:
Victor Stinner, 17.07.2012 20:38:
-- Subset of Python --
* `pymothoa http://code.google.com/p/pymothoa/`_: use LLVM; don't support classes nor exceptions. * `unpython http://code.google.com/p/unpython/`_: Python to C * `Perthon http://perthon.sourceforge.net/`_: Python to Perl * `Copperhead http://copperhead.github.com/`_: Python to GPU (Nvidia)
You might also want to add numexpr and numba to that list. Numba might actually be quite close to pymothoa (I hadn't heard of it before).

Personally, I like the idea of having a JIT compiler more or less as an extension module at hand. Sort of like a co-processor, just in software. It lets you run your code either interpreted or JIT-compiled, just as you need.

Note that the Cython project is working on a protocol to efficiently call external C-implemented Python functions by effectively unboxing them. That explicitly includes JIT-compiled code, and a JIT compiler could obviously make good use of it from the other side as well.

Stefan
Hi,
2012/7/17 Victor Stinner
-- Milestone 3: JIT --
* Depending on the type seen at runtime, recompile the function to generate specialized functions * Use guard to fallback to a generic implementation if the type is not the expected type
From my understanding, psyco did exactly this.
-- Amaury Forgeot d'Arc
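The guard-plus-fallback scheme described above (psyco's approach, and Victor's Milestone 3) can be sketched in Python. Every name below is invented for illustration; a real JIT would emit machine code for the fast path rather than call a Python function:

```python
# Sketch of type-specialized dispatch with a guard and a generic
# fallback. The "fast" implementation is only used when the runtime
# types match exactly; otherwise we fall back to the generic code.

def specialized(arg_types, fast_impl):
    def decorator(generic):
        def wrapper(*args):
            # the guard: exact type match for every argument
            if (len(args) == len(arg_types)
                    and all(type(a) is t for a, t in zip(args, arg_types))):
                return fast_impl(*args)
            return generic(*args)  # guard failed: generic fallback
        return wrapper
    return decorator

def fast_int_add(a, b):       # stands in for a compiled int-only version
    return a + b

@specialized((int, int), fast_int_add)
def add(a, b):                # generic version, works for any types
    return a + b

assert add(2, 3) == 5         # takes the fast path
assert add("a", "b") == "ab"  # guard fails, takes the generic path
```

A real implementation would of course collect the observed types at runtime and compile the specialized versions lazily, rather than declare them up front.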
Hi Victor. I'm willing to explain to you in detail why having LLVM solves almost none of the issues, and why PyPy is complex, or why you think it's complex. Find me on IRC if you want (fijal; I can be found on #pypy on freenode, for example).

In our opinion, something like psyco brought to the levels of speed of PyPy would be massively more complex than PyPy; most importantly, it would be incredibly fragile. It's possible, but it's lots and lots of work. I don't think it can possibly be done by one person.

Speaking about being compatible with CPython and yet fast - I would strongly recommend talking to Mark Shannon (the author of HotPy). He's by far the best person to answer some of these questions, and he has a rough plan for how to go forward. It would be much better to concentrate efforts rather than write yet another half-finished JIT (because reading code is hard).

Cheers, fijal
On Jul 17, 2012, at 11:38 AM, Victor Stinner
IMO PyPy is complex and hard to maintain. PyPy has a design completly different than CPython and is much faster and has a better memory footprint. I don't expect to be as fast as PyPy, just faster than CPython.
I think this criticism is misguided. Let's grant for the moment that you're right, and PyPy is complex and hard to maintain. If a high-level Python parser and JIT compiler written in Python came out as complex and unmaintainable, why do you believe that they'll be easy to write in C?

You are correct that it has a different architecture than CPython: it has a different architecture because CPython's architecture is limiting because of its simplicity, and makes it difficult to do things like write JIT compilers. The output of the Unladen Swallow project was illuminating in that regard. (Please note I said "output" and not "failure"; the Unladen Swallow folks did the community a great service and produced many useful artifacts, even if they didn't meet their original goal.)

Polluting the straightforward, portable architecture of CPython with significant machine-specific optimizations to bolt on extra features that are already being worked on elsewhere seems like a waste of effort to me. You could, instead, go work on documenting PyPy's architecture so it seems less arcane to newcomers. Some of the things in there which look like hideous black magic are actually fairly straightforward when explained, as I have learned by being lucky enough to receive explanations in person from Maciej, Benjamin and Alex at various conferences.

I mean, don't get me wrong: if this worked out, I'd love a faster CPython. I do still use many tools which don't support PyPy yet, so I can see the appeal of greater runtime compatibility with CPython than cpyext offers. I just think that it will end up being a big expenditure of effort for relatively little return. If you disagree, you should feel no need to convince me; just go do it and prove me wrong, which I will be quite happy to be. I would just like you to think about whether this is the best use of your energy first.
But definitely listen to Maciej's suggestion about concentrating efforts with other people engaged in similar efforts, regardless :). As your original message shows, there has already been enough duplication of effort in this area. -glyph
I would like to write yet another JIT compiler for CPython.
FWIW, so do I.
I did not understand exactly why Unladen Swallow and psyco projects failed, so please tell me if you think that my project is going to fail too!
It may well happen that your project fails, or doesn't even start. Mine didn't start for the last two years (but now may eventually start). I'm not sure psyco really "failed"; if it did, it was because of PyPy: PyPy was created to do the same stuff as psyco, just better. psyco was abandoned in favor of PyPy - whether that's a failure of psyco, I don't know. IMO, the psyco implementation itself failed because it was unmaintainable, containing very complicated code that nobody but its authors could understand.

Also, I know for a fact that Unladen Swallow (the project) didn't fail; some interesting parts were contributed to Python and are now part of its code base. It's the JIT compiler of Unladen Swallow that "failed"; in my understanding because LLVM is crap (i.e. it is slow, memory-consuming, and buggy) as a low-level virtual machine; it may be ok as a compiler backend (but I still think it is buggy there as well).
psyco is no more maintained.
I think this is factually incorrect: Christian Tismer maintains it (IIUC).
I would like to use a JIT to generate specialized functions for a combinaison of arguments types.
I think history has moved past specializing JITs. Tracing JITs are the status quo; they provide specialization as a side effect. Regards, Martin
If you disagree, you should feel no need to convince me; just go do it and prove me wrong, which I will be quite happy to be. I would just like you to think about whether this is the best use of your energy first.
While I follow most of your reasoning, I think this is a flaw in your logic. This is free software: the only person to decide where energy is best used is the person providing the energy. It may well be that Victor gives up after the first three steps, or it may be that he comes back with a working prototype in August. He may well find that his energy is *best* spent on this project, since it may get him a highly-paid job, a university diploma, or a reputation. If not that, he'll learn a lot.
But definitely listen to Maciej's suggestion about concentrating efforts with other people engaged in similar efforts, regardless :).
Again, this thinking is flawed, IMO. It might be in the community's interest if people coordinate, but not in the interest of the individual contributor.
As your original message shows, there has already been enough duplication of effort in this area.
And that's not really a problem, IMO. Regards, Martin
2012/7/18
I would like to write yet another JIT compiler for CPython.
FWIW, so do I.
I don't know whether it's good news (that Martin wants to put his expertise in this area) or a bad sign (that he did not start after so many years of Python development - the problem becomes more and more difficult each time one thinks about it) -- Amaury Forgeot d'Arc
Personally, I like the idea of having a JIT compiler more or less as an extension module at hand. Sort-of like a co-processor, just in software. Lets you run your code either interpreter or JITed, just as you need.
Me too, so something like psyco. LLVM is written in C++ and may have license issues, so I don't really want to add a dependency on LLVM to CPython. For an experimental project, a third-party module is also more convenient. Victor
It's the JIT compiler of Unladen Swallow that "failed"; in my understanding because LLVM is crap (i.e. it is slow, memory-consuming, and buggy) - as a low-level virtual machine; it may be ok as a compiler backend (but I still think it is buggy there as well).
What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT? I don't want to write my own library to generate machine code.
psyco is no more maintained.
I think this is factually incorrect: Christian Tismer maintains it (IIUC).
http://psyco.sourceforge.net/ says: "News, 12 March 2012 Psyco is unmaintained and dead. Please look at PyPy for the state-of-the-art in JIT compilers for Python." Victor
On the cpyext front, it would be rather helpful if developers interested in a high-speed Python interpreter with good C extension compatibility worked with Dave Malcolm on his static analyser for Python C extensions. One of the reasons cpyext has trouble is that many refcounting bugs in extensions aren't fatal on CPython due to additional internal references - a refcount of 1 when it should be 2 is survivable in a way that 0 vs 1 is not. Get rid of that drudgery from hacking on cpyext, and it becomes significantly easier to expand the number of extensions that will work across multiple implementations of the API. Cheers, Nick. -- Sent from my phone, thus the relative brevity :)
On Tue, Jul 17, 2012 at 6:20 PM, Victor Stinner
What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT?
I don't want to write my own library to generate machine code.
You don't have to, even if you don't want to use LLVM. There are plenty of "lighter-weight" approaches to that. For example, GNU Lightning [1] or sljit [2]. [1] http://www.gnu.org/software/lightning/ [2] http://sljit.sourceforge.net/ -- Devin
2012/7/18 Nick Coghlan
On the cpyext front, it would be rather helpful if developers interested in a high-speed Python interpreter with good C extension compatibility worked with Dave Malcolm on his static analyser for Python C extensions. One of the reasons cpyext has trouble is that many refcounting bugs in extensions aren't fatal on CPython due to additional internal references - a refcount of 1 when it should be 2 is survivable in a way that 0 vs 1 is not.
It's not only about bugs. Even when reference counts are correctly managed, cpyext is slow:
- each time an object crosses the C|pypy boundary, there is a dict lookup (!)
- each time a new object is passed or returned to C, a PyObject structure must be allocated (and sometimes much more, especially for strings and types). Py_DECREF will of course free the PyObject, so the next time it will be allocated again.
- borrowed references are a nightmare.
Get rid of that drudgery from hacking on cpyext and it becomes significantly easier to expand the number of extensions that will work across multiple implementations of the API.
There are also some extension modules that play tricky games with the API; PyQt for example uses metaclasses with a custom tp_alloc slot, to have access to the PyTypeObject structure during the construction of the type... The Python C API is quite complete, but some use cases are still poorly supported. -- Amaury Forgeot d'Arc
On 2012-07-17, at 6:38 PM, Devin Jeanpierre wrote:
On Tue, Jul 17, 2012 at 6:20 PM, Victor Stinner
wrote: What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT?
I don't want to write my own library to generate machine code.
You don't have to, even if you don't want to use LLVM. There are plenty of "lighter-weight" approaches to that. For example, GNU Lightning [1] or sljit [2].
[1] http://www.gnu.org/software/lightning/ [2] http://sljit.sourceforge.net/
And, there is also DynASM [1], [2]. This one was built for LuaJIT and is under MIT licence. [1] http://luajit.org/dynasm.html [2] https://github.com/LuaDist/luajit/tree/master/dynasm - Yury
As your original message shows, there has already been enough duplication of effort in this area.
I haven't yet found a project that reuses ceval.c: most projects implement their own eval loop and don't use CPython at all. My idea is not to write something new, but just to try to optimize the existing ceval.c code. Pseudo-code:

* read the bytecode of a function
* replace each bytecode by its "C code"
* optimize
* compile the "C code" to machine code

(I don't know if "C code" is the right expression here, it's just for the example.)

Dummy example:
----
def mysum(a, b):
    return a+b
----

Python compiles it to bytecode as:
----
>>> dis.dis(mysum)
  0 LOAD_FAST      0 (a)
  3 LOAD_FAST      1 (b)
  6 BINARY_ADD
  7 RETURN_VALUE
----

The bytecode can be compiled to something like:
----
x = GETLOCAL(0); /* "a" */
if (x == NULL) /* error */
Py_INCREF(x);
PUSH(x);

x = GETLOCAL(1); /* "b" */
if (x == NULL) /* error */
Py_INCREF(x);
PUSH(x);

w = POP();
v = TOP();
x = PyNumber_Add(v, w);
Py_DECREF(v);
Py_DECREF(w);
if (x == NULL) /* error */
SET_TOP(x);

retval = POP();
return retval;
----

The calls to Py_INCREF() and Py_DECREF() can be removed. The code is no longer based on a loop: CPUs prefer sequential code. The stack can be replaced with variables: the compiler (LLVM?) knows how to replace many variables with a few variables, or even use CPU registers instead. Example:
----
a = GETLOCAL(0); /* "a" */
if (a == NULL) /* error */

b = GETLOCAL(1); /* "b" */
if (b == NULL) /* error */

return PyNumber_Add(a, b);
----

I don't expect to run a program 10x faster, but I would be happy if I could run arbitrary Python code 25% faster.

--

Specialization / a tracing JIT can be seen as another project, or at least added later.

Victor
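The "replace each bytecode by its C code" step can be sketched with the dis module. The GETLOCAL/PUSH/POP/SET_TOP names echo ceval.c's macros, but the emitted strings are illustrative text rather than compilable C, and opcode names vary across CPython versions (e.g. BINARY_ADD later became BINARY_OP), so unknown opcodes fall through to a placeholder:

```python
# Naively map each bytecode instruction of a function to a C-like line,
# mimicking what ceval.c does for that opcode (illustrative only).
import dis

def translate(func):
    lines = []
    for ins in dis.get_instructions(func):
        if ins.opname == "LOAD_FAST":
            lines.append("x = GETLOCAL(%d); /* %s */ Py_INCREF(x); PUSH(x);"
                         % (ins.arg, ins.argval))
        elif ins.opname in ("BINARY_ADD", "BINARY_OP"):
            lines.append("w = POP(); v = TOP(); SET_TOP(PyNumber_Add(v, w));")
        elif ins.opname == "RETURN_VALUE":
            lines.append("return POP();")
        else:
            lines.append("/* %s: not handled in this sketch */" % ins.opname)
    return lines

def mysum(a, b):
    return a + b

print("\n".join(translate(mysum)))
```

A real implementation would feed the translated operations to LLVM's IR builder instead of emitting text, but the per-opcode dispatch structure would look much the same.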
On 17/07/2012 23:20, Victor Stinner wrote:
http://psyco.sourceforge.net/ says:
"News, 12 March 2012
Psyco is unmaintained and dead. Please look at PyPy for the state-of-the-art in JIT compilers for Python."
Victor
A search on pypi for JIT compilers gives no matches. -- Cheers. Mark Lawrence.
Victor Stinner wrote:
== Other Python VM and compilers ==
As far as I know, these are all still active, although possibly experimental:

* Pynie (Python for the Parrot VM)
* WPython (16-bit word-codes instead of byte-codes)
* HotPy (high-performance optimizing VM for Python)
* Skulpt (Javascript implementation)
* HoPe (Python in Haskell)
* Berp (another Python in Haskell)

WPython in particular seems to be very promising, and quite fast. I don't understand why it doesn't get more attention (although I admit I can't criticise, since I haven't installed or used it myself).

http://www.pycon.it/media/stuff/slides/beyond-bytecode-a-wordcode-based-pyth...

In the Java world, there are byte-code optimizers such as Soot, BLOAT and ProGuard which apparently can speed up Java significantly. As far as I can tell, in the Python world byte-code optimization is a severely neglected area. For good reason? No idea.

-- Steven
On 07/17/2012 04:34 PM, Steven D'Aprano wrote:
As far as I know, these are all still active, although possibly experimental:
Pynie (Python for the Parrot VM) WPython (16-bit word-codes instead of byte-codes) [...] WPython in particular seems to be very promising, and quite fast. I don't understand why it doesn't get more attention (although I admit I can't criticise, since I haven't installed or used it myself).
Cesar (sp?) was at Mark's talk on HotPy at EuroPython. We asked him if WPython was still active, and he said, nope, no community interest. IIRC Pynie is basically dead too. I don't know about the others, //arry/
Victor Stinner
Example:
----
a = GETLOCAL(0); /* "a" */
if (a == NULL) /* error */
b = GETLOCAL(1); /* "b" */
if (b == NULL) /* error */
return PyNumber_Add(a, b);
----
I don't expect to run a program 10x faster, but I would be happy if I can run arbitrary Python code 25% faster.
--
Specialization / tracing JIT can be seen as another project, or at least added later.
Victor
This is almost exactly what Unladen Swallow originally did. First, LLVM will not do all of the optimizations you are expecting it to do out of the box. It will still have all the stack accesses, and it will have all of the ref counting operations. You can get a small speed boost from removing the interpretation dispatch overhead, but you also explode your memory usage, and the speedups are tiny. Please, learn from Unladen Swallow and other's experiences, otherwise they're for naught, and frankly we (python-dev) waste a lot of time. Alex
Victor Stinner, 18.07.2012 00:15:
Personally, I like the idea of having a JIT compiler more or less as an extension module at hand. Sort-of like a co-processor, just in software. Lets you run your code either interpreter or JITed, just as you need.
Me too, so something like psyco.
In the sense that it's a third party module, yes. Not in the sense of how it hooks into the runtime. The intention would be that users explicitly run their code in a JIT compiled environment, e.g. their template processing or math code. The runtime wouldn't switch to a JIT compiler automatically for "normal" code. I mean, that could still become a feature at some point, but I find a decorator or an exec-like interface quite acceptable, as long as it fails loudly with "can't do that" if the JIT compiler doesn't support a specific language feature. Stefan
Alex Gaynor, 18.07.2012 03:24:
Victor Stinner writes:
Example:
----
a = GETLOCAL(0); /* "a" */
if (a == NULL) /* error */
b = GETLOCAL(1); /* "b" */
if (b == NULL) /* error */
return PyNumber_Add(a, b);
----
I don't expect to run a program 10x faster, but I would be happy if I can run arbitrary Python code 25% faster.
--
Specialization / tracing JIT can be seen as another project, or at least added later.
This is almost exactly what Unladen Swallow originally did. First, LLVM will not do all of the optimizations you are expecting it to do out of the box. It will still have all the stack accesses, and it will have all of the ref counting operations. You can get a small speed boost from removing the interpretation dispatch overhead, but you also explode your memory usage, and the speedups are tiny.
My experience with Cython tells me that even if you move the entire interpretation overhead out of the way, you'd only get some 5-20% speedup for real code, rarely more if you have some really tight loops. Adding a full-blown JIT compiler to the dependencies just for that is usually not worth it, and Unladen Swallow succeeded in showing that pretty clearly.

It's when you start specialising and optimising code patterns that it becomes really interesting, but you can do that statically at build time or compile time in most cases (at least in the more interesting ones), and Cython is one way to do it. Again, no need to add a JIT compiler.

The nice thing about JIT compilers is that you can give them your code and they'll try to optimise it for you without further interaction. That doesn't mean you get the fastest code ever; it just means that they do all the profiling for you and try to figure it all out by themselves. That may or may not work out, but it usually works quite ok (and you'll love JIT compilers for it) and only rarely gets seriously in the way (and that's when you'll hate JIT compilers). However, it requires that the JIT compiler knows about a lot of optimisations. PyPy's JIT is full of those. It's not the fact that it has a JIT compiler at all that makes it fast, and not the fact that it compiles Python to machine code; it's the fact that they came up with a huge bunch of specialisations that make lots of code patterns fast once they are detected. LLVM (or any other low-level JIT compiler) won't help at all with that.

Stefan
2012/7/18 Victor Stinner
I don't expect to run a program 10x faster, but I would be happy if I can run arbitrary Python code 25% faster.
If that's your target, you don't need to resort to a bytecode-to-binary-equivalent compiler. WPython already gave similar results with Python 2.6. The idea behind it is that with a hybrid stack-register VM, you'll spend less time on the ceval loop's "constant stuff" (checking for events, GIL release, etc.). That's because superinstructions aggregate several bytecodes into a single "wordcode", which requires only one decoding phase, avoids many pushes/pops, and saves some unnecessary reference count increments/decrements. A better peephole optimizer is provided, and some other optimizations as well. There's also room for more optimizations.

I have many ideas to improve both WPython and the plain ceval loop. For example, at the last EuroPython sprint I was working on a ceval optimization that gave about a 10% speed improvement on the CPython 3.3 beta trunk (on my old MacBook Air, running the 32-bit Windows 8 preview), but it still needs to be checked for correctness (I'm spending much more time running and checking the standard tests than on the implementation itself ;-)

In the end, I think that a lot can be done to improve the good old CPython VM without resorting to a JIT compiler. Lack of time is the enemy... Regards, Cesare
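Cesare's "wordcode" point is easy to see with the stdlib `dis` module: even a trivial function pays per-opcode dispatch several times in ceval.c. (A quick illustration; the exact opcode names vary across CPython versions.)

```python
import dis

def f(a, b):
    return a + b

# dis shows the stack traffic that ceval.c dispatches one opcode at a
# time: typically a load for each argument, the add itself, and the
# return. A hybrid stack-register VM like WPython fuses such sequences
# into a single "wordcode" that is decoded only once.
dis.dis(f)

ops = [i.opname for i in dis.get_instructions(f)]
```

Each entry in `ops` corresponds to one trip through the interpreter's dispatch loop, which is exactly the overhead a wordcode design amortises.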
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/cesare.di.mauro%40gmail.co...
What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT?
I don't think it is. It is still slow and memory hungry. The fact that the version that Apple ships with Xcode still miscompiles Python 3.3 tells me that it is still buggy.
I don't want to write my own library to generate machine code.
I plan to use nanojit. Regards, Martin
Zitat von Mark Lawrence
On 17/07/2012 23:20, Victor Stinner wrote:
http://psyco.sourceforge.net/ says:
"News, 12 March 2012
Psyco is unmaintained and dead. Please look at PyPy for the state-of-the-art in JIT compilers for Python."
Victor
A search on pypi for JIT compilers gives no matches.
I think you misread: PyPy, not pypi. Regards, Martin
2012/7/18 Steven D'Aprano
WPython in particular seems to be very promising, and quite fast. I don't understand why it doesn't get more attention (although I admit I can't criticise, since I haven't installed or used it myself).
http://www.pycon.it/media/stuff/slides/beyond-bytecode-a-wordcode-based-pyth...
Yes, that was the reason I stopped the project: lack of interest from the Python community. But at the last EuroPython I had the opportunity to talk to Guido, so I think I can try to port WPython (and revisit some of its ideas). The fault with WPython was mine, though: it wasn't a simple patch, so it was very difficult to review. My bad. In the Java world, there are byte-code optimizers such as Soot, BLOAT and
ProGuard which apparently can speed up Java significantly. As far as I can tell, in the Python world byte-code optimization is a severely neglected area. For good reason? No idea.
-- Steven
I think the Python case is different. You can't spend so much time optimizing the generated code, because the code is usually compiled at execution time. Startup time is an issue for Python, and it is heavily influenced by the source-to-bytecode compilation. Java is statically compiled and then executed, so you can think about using better optimizers before the code finally runs. Regards, Cesare
martin@v.loewis.de, 18.07.2012 07:53:
[Victor Stinner]
I don't want to write my own library to generate machine code.
I plan to use nanojit.
As I said, generating machine code is the uninteresting part of it and won't give you much of a win. The changes you make to the way the code works on the path from seeing the Python code to generating machine code are what make the difference. You could even skip the machine code generation altogether and just trigger pre-implemented high-level patterns from the interpreter. If you code up the right patterns, your win would be bigger than with a bare 1:1 mapping from Python code to machine code. Both Cython and WPython are clear examples of that. Stefan
On 18/07/2012 06:55, martin@v.loewis.de wrote:
Zitat von Mark Lawrence
: On 17/07/2012 23:20, Victor Stinner wrote:
http://psyco.sourceforge.net/ says:
"News, 12 March 2012
Psyco is unmaintained and dead. Please look at PyPy for the state-of-the-art in JIT compilers for Python."
Victor
A search on pypi for JIT compilers gives no matches.
I think you misread: PyPy, not pypi.
Regards, Martin
Now that I think about it, I did misread; my apologies for the time wasted :-( -- Cheers. Mark Lawrence.
In article <20120718075314.Horde.ty5wC7uWis5QBk9Krz1hyUA@webmail.df.eu>, martin@v.loewis.de wrote:
I don't think it is. It is still slow and memory hungry. The fact that the version that Apple ships with Xcode still miscompiles Python 3.3 tells me that it is still buggy.
Whether LLVM is suitable for a JIT is an interesting question, but it's not LLVM per se that is the problem with compiling 3.3. Apple ships two C compiler chains with Xcode 4 for OS X 10.7, both of which are based on LLVM. It's the Apple transitional gcc-4.2 frontend with an old LLVM backend that is problematic (and not to be confused with the "pure" gcc-4.2 shipped with Xcode 3). That compiler was the default for early releases of Xcode 4 and for building OS X 10.7. It has been frozen for a long time because Apple's efforts have been going into transitioning the OS X world to the new compiler: a clang frontend with a more current LLVM backend. The latest releases of Xcode 4 now use clang-llvm as the default, and that's what we now use as the default for building Python 3.3 with Xcode 4. That transition will be complete with the imminent release of OS X 10.8 Mountain Lion, when the whole OS is built with clang-llvm. The iOS world is already there. -- Ned Deily, nad@acm.org
However, it requires that the JIT compiler knows about a lot of optimisations. PyPy's JIT is full of those. It's not the fact that it has a JIT compiler at all that makes it fast and not the fact that they compile Python to machine code, it's the fact that they came up with a huge bunch of specialisations that makes lots of code patterns fast once it detected them. LLVM (or any other low-level JIT compiler) won't help at all with that.
Stefan
Very good point, Stefan. I would just like to add that a lot of those also require changes in the object model, which might mean changes in the CPython C API (like the introduction of maps). You certainly can't keep the current C structures, which will already break some code. Cheers, fijal
On Wed, Jul 18, 2012 at 8:27 AM, Stefan Behnel
martin@v.loewis.de, 18.07.2012 07:53:
[Victor Stinner]
I don't want to write my own library to generate machine code.
I plan to use nanojit.
As I said, generating machine code is the uninteresting part of it and won't give you much of a win. The changes you make to the way the code works on the path from seeing the Python code to generating machine code are what make the difference.
You could even skip the machine code generation altogether and just trigger pre-implemented high-level patterns from the interpreter. If you code up the right patterns, your win would be bigger than with a bare 1:1 mapping from Python code to machine code. Both Cython and WPython are clear examples of that.
Stefan
It's uninteresting, but it's completely necessary, and it's still quite a bit of work. For PyPy's needs, LLVM failed to provide some features (besides being buggy), like dynamic patching of compiled assembler (you kind of need that in a tracing JIT when you discover new paths) or speed of execution. Cheers, fijal
Some of my (reasonably well informed) opinions on this subject...

The theory
----------
Don't think in terms of speeding up your program. Think in terms of reducing the time spent executing your program. Performance is improved by removing aspects of the execution overhead. In a talk I gave at EuroPython 2010(1), I divided the overhead into 5 parts:

* Interpretive overhead
* Imprecise type information
* Parameter handling & call overhead
* Lookups (globals/builtins/attributes)
* Memory management (garbage collection)

For optimising CPython, we cannot change the GC from ref-counting, but the other 4 apply, plus boxing and unboxing of floats and ints. Compilation (by which I assume people mean converting bytecodes to machine code) addresses only the first point (by definition).

I worry that Victor is proposing to make the same mistake made by Unladen Swallow, which is to attack the interpretive overhead first, then attack the other overheads. This is the wrong way around. If you want good performance, JIT compilation should come last, not first. Results from my PhD thesis(2) show that the original HotPy without any JIT compilation outperformed Unladen Swallow using JIT compilation. In other words, an optimising interpreter for Python will be faster than a naive JIT compiler. The optimised bytecode traces in an optimising interpreter are much better input for a JIT compiler than the original bytecodes.

The practice
------------
If you want a modest speedup for modest effort, then look at Cesare's WPython. Also take a look at Stefan Brunthaler's work on inline caching in an interpreter. If you want a larger speedup then you need to tackle most or all of the causes of execution overhead listed above. HotPy (version 2, a fork of CPython) aims to tackle all of these causes except the GC overhead. As far as I am aware, it is the only project that does so. Please take a look at www.hotpy.org for more information on HotPy.
You can see my talk from EuroPython 2011(3) on the ideas behind it, and from EuroPython 2012(4) on the current implementation.

Finally, a defence of LLVM. LLVM is a quality piece of software. It may have some bugs, but so does all software. The code-generation components are designed with static compilation in mind, so they do use a lot of memory and run slowly for a JIT compiler, but they produce excellent quality code. And don't forget the old saying about blaming your tools ;) If HotPy (version 2) were to have an (optional) JIT, I would expect it to be LLVM based. The JIT can run in a separate thread, while the optimised code continues to run in the interpreter, patching in the machine code when it is complete. Cheers, Mark.

1) Talk at EuroPython 2010
Slides: www.dcs.gla.ac.uk/~marks/comparison.pdf
Video: http://blip.tv/europythonvideos/mark_shannon-_hotpy_a_comparison-3999872
The information in the talk is a bit out of date; PyPy now includes out-of-line guards.
2) theses.gla.ac.uk/2975/01/2011shannonphd.pdf
3) Talk at EuroPython 2011 https://ep2012.europython.eu/conference/talks/making-cpython-fast-using-trac...
4) Talk at EuroPython 2012 https://ep2012.europython.eu/conference/talks/hotpy-2-a-high-performance-bin...
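The inline caching Mark mentions (Brunthaler's work) can be sketched in plain Python. This is a hypothetical toy, far from a real implementation: a call site remembers the type it last saw and the result of the lookup, so a hit costs one type check instead of a full attribute search.

```python
class InlineCache:
    """Toy monomorphic inline cache for attribute lookups: remember the
    type seen at this call site and the looked-up attribute; revalidate
    with a single type check instead of repeating the full lookup."""
    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_attr = None

    def lookup(self, obj):
        if type(obj) is self.cached_type:   # cache hit: one cheap guard
            return self.cached_attr
        # Cache miss: do the expensive lookup and refill the cache.
        attr = getattr(type(obj), self.name)
        self.cached_type = type(obj)
        self.cached_attr = attr
        return attr

site = InlineCache("upper")
assert site.lookup("abc")("abc") == "ABC"   # first call: miss, then cached
assert site.lookup("xyz") is str.upper      # second call: hit via the guard
```

In an optimising interpreter the cache lives at a specific bytecode instruction, which is what makes the monomorphic assumption pay off.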
On Wed, Jul 18, 2012 at 7:45 PM, Mark Shannon wrote:
The practice ------------
If you want modest speedup for modest effort, then look at Cesare's WPython. Also take a look at Stefan Brunthaler's work on inline caching in an interpreter.
If you want a larger speedup then you need to tackle most or all of the causes of execution overhead listed above. HotPy (version 2, a fork of CPython) aims to tackle all of these causes except the GC overhead. As far as I am aware, it is the only project that does so.
Indeed, there's a lot that could be done in the CPython compiler and eval loop, and the bytecode design used to communicate between them. Thanks for summarising that so clearly. There are a couple of other compiler related patches that anyone interested in optimisation of CPython should at least glance at:

- Dave Malcolm's patch that adds a Python level AST optimisation step to the compiler, effectively trading slower compile times for faster execution of the compiled bytecode (http://bugs.python.org/issue10399)
- Eugene Toder's patch to add an AST optimisation step to the compiler chain (http://bugs.python.org/issue11549). (I've asked Eugene about this patch more recently, and his current thought is that subsequent improvements to the peephole optimisation have rendered it less valuable. However, the patch is still a potentially useful resource for anyone investigating bytecode optimisation ideas.)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
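For readers unfamiliar with AST-level optimisation, here is a minimal constant-folding pass using the stdlib `ast` module. It is a toy sketch, far simpler than either patch Nick mentions, but it shows the shape of the technique: rewrite the tree before bytecode is generated.

```python
import ast

class FoldConstants(ast.NodeTransformer):
    """Fold binary operations whose operands are numeric literals,
    e.g. turn the AST for `2 * 3` into the literal `6`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.op, (ast.Add, ast.Mult))):
            fold = {ast.Add: lambda a, b: a + b,
                    ast.Mult: lambda a, b: a * b}[type(node.op)]
            return ast.copy_location(
                ast.Constant(fold(node.left.value, node.right.value)), node)
        return node

tree = ast.parse("x = 2 * 3 + 4")
tree = ast.fix_missing_locations(FoldConstants().visit(tree))
assert ast.unparse(tree) == "x = 10"
```

Because the pass runs on the AST, it sees expression structure that the bytecode peepholer has already lost, which is the argument for doing this kind of optimisation earlier in the compiler chain.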
On 18 Jul, 2012, at 7:53, martin@v.loewis.de wrote:
What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT?
I don't think it is. It is still slow and memory hungry. The fact that the version that Apple ships with Xcode still miscompiles Python 3.3 tells me that it is still buggy.
Does it miscompile? I regularly run the 3.3 test suite using the latest Xcode from the App Store on an OS X Lion machine, and that works properly. The only unexpected test failures are in ctypes, which are probably caused by the way the clang developers interpret the ABI w.r.t. passing small values. Ronald
Zitat von Ronald Oussoren
On 18 Jul, 2012, at 7:53, martin@v.loewis.de wrote:
What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT?
I don't think it is. It is still slow and memory hungry. The fact that the version that Apple ships with Xcode still miscompiles Python 3.3 tells me that it is still buggy.
Does it miscompile?
I'm talking about the bug in http://mail.python.org/pipermail/python-dev/2011-September/113731.html
I regularly run the 3.3 testsuite using the latest Xcode from the Appstore on a OSX Lion machine and that works properly.
I'm not actually using the latest Xcode. So if you could test my test program, that would be much appreciated. Regards, Martin
On 18 Jul, 2012, at 16:59, martin@v.loewis.de wrote:
Zitat von Ronald Oussoren
: On 18 Jul, 2012, at 7:53, martin@v.loewis.de wrote:
What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT?
I don't think it is. It is still slow and memory hungry. The fact that the version that Apple ships with Xcode still miscompiles Python 3.3 tells me that it is still buggy.
Does it miscompile?
I'm talking about the bug in
http://mail.python.org/pipermail/python-dev/2011-September/113731.html
I regularly run the 3.3 testsuite using the latest Xcode from the Appstore on a OSX Lion machine and that works properly.
I'm not actually using the latest Xcode. So if you could test my test program, that would be much appreciated.
That bug in llvm-gcc still exists, and is unlikely to get fixed. It's a bug in the integration of the GCC frontend and LLVM backend; clang (LLVM project frontend + LLVM backend) does work. Ronald
On Wed, 18 Jul 2012 17:15:18 +0200
Ronald Oussoren
I regularly run the 3.3 testsuite using the latest Xcode from the Appstore on a OSX Lion machine and that works properly.
I'm not actually using the latest Xcode. So if you could test my test program, that would be much appreciated.
That bug in llvm-gcc still exists, and is unlikely to get fixed. It's a bug in the integration of the GCC frontend and LLVM backend; clang (LLVM project frontend + LLVM backend) does work.
Not only does clang seem to work, but we have a stable buildbot running on it: http://buildbot.python.org/all/buildslaves/langa-lion Regards Antoine.
On 17 Jul 2012, at 23:04, martin@v.loewis.de wrote:
[snip...]
I would like to use a JIT to generate specialized functions for a combination of argument types.
I think history has moved past specializing JITs. Tracing JITs are the status quo; they provide specialization as a side effect.
Mozilla implemented a method-JIT (compile whole methods) on top of their tracing JIT because a tracing JIT only optimises part of your code (only in loops and only if executed more times than the threshold) and there are further performance improvements to be had. So tracing JITs are not the *whole* of the state of the art. Michael
Regards, Martin
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
That's not, strictly speaking, true. Mozilla added a method-JIT (Jaegermonkey) and then added another one (IonMonkey) because their tracing JIT (Tracemonkey) was bad. There's no fundamental reason that tracing has to only cover loops, indeed PyPy's tracing has been generalized to compile individual functions, recursion, etc. And any profiling JIT, in practice, needs a compile heuristic for how many calls must occur before a unit is compiled, even the Hotspot JVM has one. Alex
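The "compile heuristic" Alex mentions (how many calls must occur before a unit is compiled) is essentially a hotness counter. A minimal hypothetical model in Python, where the `compiler` stand-in just returns the function unchanged rather than emitting machine code:

```python
def hot_function(threshold=2, compiler=lambda f: f):
    """Decorator modelling a JIT hotness counter: run "interpreted"
    until the call count reaches `threshold`, then switch to the
    "compiled" version produced by `compiler` (a stand-in here)."""
    def deco(func):
        state = {"calls": 0, "compiled": None}
        def wrapper(*args, **kwargs):
            if state["compiled"] is not None:
                return state["compiled"](*args, **kwargs)  # fast path
            state["calls"] += 1
            if state["calls"] >= threshold:
                state["compiled"] = compiler(func)  # unit became hot
            return func(*args, **kwargs)            # slow path
        wrapper._state = state   # exposed for illustration only
        return wrapper
    return deco

@hot_function(threshold=2)
def square(x):
    return x * x

square(3)
square(4)   # second call crosses the threshold
assert square._state["compiled"] is not None
```

Real JITs tune the threshold (and often count loop back-edges, not just calls) to balance warm-up time against the cost of compiling code that never gets hot.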
On Tue, Jul 17, 2012 at 3:20 PM, Victor Stinner
It's the JIT compiler of Unladen Swallow that "failed"; in my understanding because LLVM is crap (i.e. it is slow, memory-consuming, and buggy) - as a low-level virtual machine; it may be ok as a compiler backend (but I still think it is buggy there as well).
What is the status of LLVM nowadays? Is it not a good solution to write a portable JIT?
Its code generator is still fairly slow. You could probably get a faster one committed, but you'd have to write it. LLVM also still doesn't have great profile-guided optimizations (what you need in a JIT), although the infrastructure is starting to be built. You'd probably have to contribute to that too. It's probably a better use of your time to contribute to PyPy. Jeffrey
On Fri, Jul 20, 2012 at 2:55 PM, Michael Foord
On 17 Jul 2012, at 23:04, martin@v.loewis.de wrote:
[snip...]
I would like to use a JIT to generate specialized functions for a combination of argument types.
I think history has moved past specializing JITs. Tracing JITs are the status quo; they provide specialization as a side effect.
Mozilla implemented a method-JIT (compile whole methods) on top of their tracing JIT because a tracing JIT only optimises part of your code (only in loops and only if executed more times than the threshold) and there are further performance improvements to be had. So tracing JITs are not the *whole* of the state of the art.
Michael
I'm sorry Michael, but you're like the 100th person I have to explain this to. The mere fact that Mozilla did not make a tracing JIT work does not mean the entire approach is horribly doomed, as many people would like to assume. The reasons are multiple, but a lot of them are connected to poor engineering (for example, the part inherited from Adobe is notoriously bad; have a look if you want). Cheers, fijal
On Jul 18, 2012, at 3:30 AM, Nick Coghlan wrote:
- Eugene Toder's patch to add an AST optimisation step to the compiler chain (http://bugs.python.org/issue11549) (I've asked Eugene about this patch more recently and his current thought is that subsequent improvements to the peephole optimisation have rendered it less valuable. However, the patch is still a potentially useful resource for anyone investigating bytecode optimisation ideas)
+1 for furthering Eugene's patch. The AST is the correct place to do some of the optimizations currently done by the peepholer. Raymond
On 20 Jul 2012, at 17:50, Maciej Fijalkowski wrote:
On Fri, Jul 20, 2012 at 2:55 PM, Michael Foord
wrote: On 17 Jul 2012, at 23:04, martin@v.loewis.de wrote:
[snip...]
I would like to use a JIT to generate specialized functions for a combination of argument types.
I think history has moved past specializing JITs. Tracing JITs are the status quo; they provide specialization as a side effect.
Mozilla implemented a method-JIT (compile whole methods) on top of their tracing JIT because a tracing JIT only optimises part of your code (only in loops and only if executed more times than the threshold) and there are further performance improvements to be had. So tracing JITs are not the *whole* of the state of the art.
Michael
I'm sorry Michael, but you're like the 100th person I have to explain this to. The mere fact that Mozilla did not make a tracing JIT work does not mean the entire approach is horribly doomed, as many people would like to assume. The reasons are multiple, but a lot of them are connected to poor engineering (for example, the part inherited from Adobe is notoriously bad; have a look if you want).
Well, that isn't how they describe it. Even if that is the case, it's *still* interesting that rather than putting their efforts into improving the tracing JIT, they put them into adding a method-JIT *as well*. Also note that where I said "tracing JITs are not the whole of the state of the art" you somehow managed to translate this into "the entire approach is horribly doomed". That seems an ungenerous reading of what I wrote... :-) Michael
Cheers, fijal
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
On Sat, 21 Jul 2012 11:45:21 +0100
Michael Foord
On 20 Jul 2012, at 17:50, Maciej Fijalkowski wrote:
On Fri, Jul 20, 2012 at 2:55 PM, Michael Foord
wrote: On 17 Jul 2012, at 23:04, martin@v.loewis.de wrote:
[snip...]
I would like to use a JIT to generate specialized functions for a combination of argument types.
I think history has moved past specializing JITs. Tracing JITs are the status quo; they provide specialization as a side effect.
Mozilla implemented a method-JIT (compile whole methods) on top of their tracing JIT because a tracing JIT only optimises part of your code (only in loops and only if executed more times than the threshold) and there are further performance improvements to be had. So tracing JITs are not the *whole* of the state of the art.
Michael
I'm sorry Michael, but you're like the 100th person I have to explain this to. The mere fact that Mozilla did not make a tracing JIT work does not mean the entire approach is horribly doomed, as many people would like to assume. The reasons are multiple, but a lot of them are connected to poor engineering (for example, the part inherited from Adobe is notoriously bad; have a look if you want).
Well, that isn't how they describe it. If it is the case, it's *still* interesting that rather than putting their efforts into improving the tracing JIT they put them into adding a method-JIT *as well*.
Honestly, I'm not sure that's a very interesting discussion. First, Javascript performance is not based on the same priorities as Python performance: for the former, startup time is key. Second, whether method-based or tracing-based, a well-written JIT would certainly bring significant performance improvements over a bytecode interpreter anyway. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net
participants (22)
- Alex Gaynor
- Amaury Forgeot d'Arc
- Antoine Pitrou
- Cesare Di Mauro
- Dag Sverre Seljebotn
- Devin Jeanpierre
- Glyph
- Jeffrey Yasskin
- Larry Hastings
- Maciej Fijalkowski
- Mark Lawrence
- Mark Shannon
- martin@v.loewis.de
- Michael Foord
- Ned Deily
- Nick Coghlan
- Raymond Hettinger
- Ronald Oussoren
- Stefan Behnel
- Steven D'Aprano
- Victor Stinner
- Yury Selivanov