Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson <benjamin@python.org> wrote:
2011/11/8 stefan brunthaler <s.brunthaler@uci.edu>:
How does that sound?
I think I can hear real patches and benchmarks most clearly.
I spent the better part of my -20% time on implementing the work as "suggested". Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. Merging dispatch loops eliminates some of my optimizations, but my inline caching technique enables inlining some functionality, which results in visible speedups. The code is normalized to the non-threaded-code version of the CPython interpreter (named "vanilla"), so that I can reference it to my preceding results. I anticipate *no* compatibility issues and the interpreter requires less than 100 KiB of extra memory at run-time. Since my interpreter is using 215 of a maximum of 255 instructions, there is room for adding additional derivatives, e.g., for popular Python libraries, too. Let me know what python-dev thinks of this and have a nice weekend, --stefan PS: AFAIR the version without partial stack frame caching also passes all regression tests modulo the ones that test against specific bytecodes.