On Mon, Jan 31, 2011 at 10:54 PM, Antoine Pitrou <solipsis(a)pitrou.net> wrote:
> On Mon, 31 Jan 2011 20:45:45 +0000
> techtonik(a)gmail.com wrote:
>> I see no reason for b.p.o bureaucracy. Mercurial-style workflow is
>> more beneficial to development as it doesn't require switching from
>> console to browser for submitting changes.
> Ok, why don't you contribute to Mercurial instead?
If you don't want to receive a stupid answer, why don't you read the
link and say what you don't like in this approach in a constructive
I just did it: my first python source code hack.
I replaced the NEXTARG and PEEKARG macros in ceval.c using a cast to
short pointer, and lo and behold, a crude measurement indicates one
to two percent speed increase.
That isn't much, but it is virtually for free!
Here are the macros I used:
#define NEXTARG() (next_instr +=2, *(short*)&next_instr[-2])
#define PEEKARG() (*(short*)&next_instr[1])
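
A minimal sketch, not taken from ceval.c, of why a single 16-bit load can stand in for the two byte fetches. The helper names arg_bytewise and arg_as_short are hypothetical. It assumes a little-endian host and a 16-bit short, and it uses memcpy rather than the pointer cast so that the sketch does not depend on alignment or aliasing behaviour.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise decoding: low byte first, then high byte shifted up. */
static int arg_bytewise(const unsigned char *p)
{
    return (p[1] << 8) + p[0];
}

/* One 16-bit load, standing in for *(short *)p.
   Assumption: little-endian host, short is 16 bits wide. */
static int arg_as_short(const unsigned char *p)
{
    short v;
    assert(sizeof v == 2);
    memcpy(&v, p, sizeof v);
    return v;
}
```

The two functions agree for arguments below 0x8000. For larger arguments the signed short is sign-extended and comes out negative, so an unsigned short may be the safer type if such arguments can occur.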
I tried to find any research on this subject, but I couldn't find any,
so I'll be daring and vulnerable and just try it out to see what your
I single stepped a simple loop in Python to see where the efficiency
I was impressed by the optimizations already in there, but I still
dare to suggest an optimization that from my estimates might shave
off a few cycles, speeding up Python about 5%.
The idea is simple: change the byte code argument values from two
bytes to one.
- code changes are relatively simple, see below
- fewer memory reads, which are becoming more and more expensive
- saves three instructions for every opcode with args (i.e. most of
Code changes are, as far as I could find:
- assemble_emit must produce extended opcodes for all cases of more
  than 8 bits instead of 16
- NEXTARG and PEEKARG need adjustment
- EXTENDED_ARG needs adjustment (this will be a four byte
  instruction, which is ugly, I agree)
- GETARG, SETARG need adjustment, also GETJUMPTGT, CODESIZE
- the routines tuple_of_constants, fold_binops_on_constants and
  PyCode_Optimize are dependent on instruction length, which will be
  2 instead of 3 (a search for the digit 3 will find all cases, as
  far as I checked)
- you probably will have to write a macro for codestr[i+3]
there is a check for code length >32700, but I think this one might
maybe if a few extra checks are added.
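
The proposed layout can be sketched as a small decoder. This is only an illustration under assumptions, not CPython code: the function decode_arg and the opcode value OP_EXTENDED_ARG are hypothetical, every instruction here is assumed to carry an argument, and only one extended prefix (arguments up to 16 bits) is handled.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_EXTENDED_ARG = 0x90 };  /* hypothetical opcode value for this sketch */

/* Decode one instruction starting at code[*pc] and advance *pc past it.
   Assumed layout: [opcode][arg byte], optionally preceded by
   [OP_EXTENDED_ARG][high byte], which makes four bytes in total. */
static int decode_arg(const unsigned char *code, size_t *pc, int *opcode)
{
    int high = 0;
    int arg;

    assert(code != NULL && pc != NULL && opcode != NULL);
    if (code[*pc] == OP_EXTENDED_ARG) {
        high = code[*pc + 1] << 8;   /* upper 8 bits come from the prefix */
        *pc += 2;
    }
    *opcode = code[*pc];
    arg = high | code[*pc + 1];      /* lower 8 bits follow the opcode */
    *pc += 2;
    return arg;
}
```

In the common case the decoder touches two bytes and needs no shift. The prefix path costs extra work, but only for arguments above 255.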
Estimation of speed impact:
about 80% of the instructions seem to have an argument, and I never
saw an opcode >255 while looking at bytecode, so they are probably
The NEXTARG macro expands on my Macbook to:
mov -408(%ebp),%edx (next_instr)
movzbl 2(%edx),%eax (*second byte)
shl $0x8,%eax (*shift)
movzbl 1(%edx),%edx (first byte)
add %edx,%eax (*combine)
and the starred instructions will vanish.
The main loop is approximately 40 instructions, so a saving of three
instructions is significant. I don't dare to claim 3/40 = 7.5% savings,
but I think 5% may be realistic.
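
As a rough cross-check of that guess, the figures in this message can be combined into a weighted estimate. The function estimated_saving is a hypothetical helper, and its inputs are the approximate numbers quoted here, not measurements. Counting instructions is also only a proxy for run time.

```c
#include <assert.h>

/* Fraction of main-loop instructions saved, weighted by how often an
   opcode actually carries an argument. */
static double estimated_saving(double frac_with_arg, int saved, int loop_len)
{
    assert(loop_len > 0);
    return frac_with_arg * (double)saved / (double)loop_len;
}
```

With 80% of instructions carrying an argument, estimated_saving(0.8, 3, 40) comes to about 0.06, which sits between the 5% guess and the 7.5% upper bound.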
Did anyone try this already? If not, I might take up the gauntlet
and try it myself, but I never did this before...
PS I also saw that some scratch variables, mainly v and x, are
carefully stored back in memory by the compiler at the end of the big
interpreter loop, even though their values aren't used anymore.
A few carefully placed braces might tell the compiler how useless
this is and save another few percent.
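
A minimal sketch of what such braces could look like, using a toy stack machine rather than the real interpreter loop. The opcodes and the function run are hypothetical, and whether the generated code actually changes depends on the compiler and its optimization settings.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2 };  /* hypothetical opcodes */

/* Run a tiny bytecode program and return the top of the stack. */
static int run(const unsigned char *code)
{
    int stack[16];
    int sp = 0;
    size_t pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD: {
            /* v and x exist only inside these braces, so nothing
               obliges the compiler to keep them alive after this case. */
            int x, v;
            assert(sp >= 2);
            x = stack[--sp];
            v = stack[--sp];
            stack[sp++] = v + x;
            break;
        }
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0;
        default:
            return -1;  /* unknown opcode */
        }
    }
}
```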
> What version of CPython did you try that with? The latest py3k branch?
I had a quick look at 3.2, 2.5 and 2.7 and got the impression that
the savings are larger if the interpreter loop is faster: the fewer
instructions there are, the bigger a 3 instruction difference would
The NEXTARG macro is the same in all three versions:
#define NEXTARG() (next_instr += 2, (next_instr[-1]<<8) +
and the compiler compiles this to two separate fetches.
I found out my compiler (gcc) will make better code if we use a short.
It produces a "movswl" instruction to do both fetches at the same
time, if I force it to.
That saves two instructions already.
This would imply that on little-endian machines, changing just one
line of code in ceval.c would already save a few percent:
#define NEXTARG() (next_instr += 2, *(short *)&next_instr[-2])
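
Since the one-line change is only valid on little-endian machines, it would need a guard. The following is a sketch under assumptions, not a patch: read_arg16 is a hypothetical helper, the byte-order test relies on the __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ macros that GCC and Clang predefine, and an unsigned 16-bit short is assumed. Other compilers and big-endian targets fall back to the two-byte form.

```c
#include <assert.h>
#include <string.h>

/* Read the 16-bit little-endian argument stored at p. */
static unsigned int read_arg16(const unsigned char *p)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    unsigned short v;                /* single 16-bit load */
    assert(sizeof v == 2);
    memcpy(&v, p, sizeof v);
    return v;
#else
    return ((unsigned int)p[1] << 8) + p[0];  /* portable two-byte form */
#endif
}

/* Expects a variable named next_instr in scope, as in ceval.c. */
#define NEXTARG() (next_instr += 2, (int)read_arg16(next_instr - 2))
```

Using memcpy into an unsigned short avoids both the unaligned pointer cast and the sign extension that a plain short would introduce. Compilers commonly reduce it to one load.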
On behalf of the Python development team, I'm quite happy to announce
the second release candidate of Python 3.2.
Python 3.2 is a continuation of the efforts to improve and stabilize the
Python 3.x line. Since the final release of Python 2.7, the 2.x line
will only receive bugfixes, and new features are developed for 3.x only.
Since PEP 3003, the Moratorium on Language Changes, is in effect, there
are no changes in Python's syntax and built-in types in Python 3.2.
Development efforts concentrated on the standard library and support for
porting code to Python 3. Highlights are:
* numerous improvements to the unittest module
* PEP 3147, support for .pyc repository directories
* PEP 3149, support for version tagged dynamic libraries
* PEP 3148, a new futures library for concurrent programming
* PEP 384, a stable ABI for extension modules
* PEP 391, dictionary-based logging configuration
* an overhauled GIL implementation that reduces contention
* an extended email package that handles bytes messages
* a much improved ssl module with support for SSL contexts and certificate
* a sysconfig module to access configuration information
* additions to the shutil module, among them archive file support
* many enhancements to configparser, among them mapping protocol support
* improvements to pdb, the Python debugger
* countless fixes regarding bytes/string issues; among them full support
for a bytes environment (filenames, environment variables)
* many consistency and behavior fixes for numeric operations
For a more extensive list of changes in 3.2, see
To download Python 3.2 visit:
Please consider trying Python 3.2 with your code and reporting any bugs
you may notice to:
Georg Brandl, Release Manager
georg at python.org
(on behalf of the entire python-dev team and 3.2's contributors)