I accidentally left out the telco benchmark, which is unfortunate since cdecimal makes it absolutely scream on Python 3.3 (and I verified against Python 3.2 that this is a real speedup and not some silly screw-up like the one I initially had with spectral_norm):

### telco ###
Min: 0.897108 -> 0.016880: 53.15x faster
Avg: 0.899742 -> 0.017443: 51.58x faster
Significant (t=692.55)
Stddev: 0.00283 -> 0.00032: 8.8470x smaller
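
For context: telco spends essentially all of its time doing decimal arithmetic, and Python 3.3 ships cdecimal (merged as the C _decimal module) as the default implementation of the decimal module, which is where a jump of that size comes from. Below is a minimal sketch of a telco-style decimal workload one could time under both interpreters; it is just an illustration, not the actual telco benchmark from the suite.

    # Sketch only: a telco-style workload that hammers the decimal module,
    # so the effect of 3.3's C-accelerated decimal shows up when timed.
    import time
    from decimal import Decimal, getcontext, ROUND_HALF_EVEN

    def telco_style(n=100000):
        getcontext().rounding = ROUND_HALF_EVEN
        rate = Decimal('0.0013')
        tax = Decimal('0.0675')
        cents = Decimal('0.01')
        total = Decimal('0')
        for duration in range(1, n):
            price = (Decimal(duration) * rate).quantize(cents)
            total += price + (price * tax).quantize(cents)
        return total

    start = time.time()
    telco_style()
    print('elapsed: %.3fs' % (time.time() - start))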

On Sun, Sep 30, 2012 at 7:12 PM, Brett Cannon <brett@python.org> wrote:
I am presenting the talk "Python 3.3: Trust Me, It's Better Than 2.7" at PyCon Argentina and PyCon Brasil (and PyCon US if they accept the talk). As part of that talk I need to be able to benchmark Python 3.3 against 2.7 (both from tip) using the unladen benchmarks (which now include benchmarks from PyPy that can be relatively easily ported to Python 3).

To make sure the unladen benchmarks run fine against Python 3.3, I did a fast run of them. I figured people might be interested in the quick-and-dirty results from my 2 GHz Intel Core i7 MacBook Pro with 8 GB of RAM, with no attempt to control for performance beyond not actively browsing the web. As I said, these numbers are quick-and-dirty and not authoritative; the run was done just to make sure all the benchmarks could run to completion (including the django, html5lib, and genshi benchmarks, which are only on my laptop at the moment until those projects cut a release with official Python 3 support).

One thing to keep in mind is that many benchmarks use a raw str for things, so they often end up comparing Python 2.7 str against Python 3.3 str (in 2.7 terms, str vs. unicode). While this might seem unfair, it is the comparison users will actually see in the real world, so it's a (somewhat unfair) comparison that we just have to live with. I might take the time to try to make some tests run under both raw strings and unicode so both comparisons are available.
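
For the curious, here is a hypothetical sketch of what "run under both raw strings and unicode" could look like: parameterize a micro-benchmark over byte-string and text-string literals (u'' literals are legal again in 3.3 thanks to PEP 414), so the same script yields both comparisons. The workload and names here are made up for illustration; the real benchmarks would each need their own changes.

    # Hypothetical sketch: time the same operation on byte strings and on
    # unicode strings, under either 2.7 or 3.3. Not part of the benchmark suite.
    import timeit

    def run(kind):
        # On 2.7, b'...' is str and u'...' is unicode; on 3.3, u'...' is str.
        prefix = 'b' if kind == 'bytes' else 'u'
        setup = "s = %s'spam and eggs ' * 100" % prefix
        stmt = "s.split()"  # made-up workload; real tests do much more
        return min(timeit.repeat(stmt, setup, repeat=3, number=100000))

    for kind in ('bytes', 'text'):
        print('%s: %.4fs' % (kind, run(kind)))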

If you care about helping out with the benchmarks (e.g. spotting where the iteration counts should be higher), then head over to the speed@ mailing list.



> python3 perf.py -T --basedir ../benchmarks -f -b py3k ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3 

... output about the command line for the benchmarks ...
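
A quick note on reading the blocks below: each benchmark reports the minimum and average of the per-iteration timings for 2.7 and 3.3, a t statistic indicating whether the difference is significant, and the two standard deviations. Roughly, the numbers come from something like the following sketch of the calculation; this is only an illustration of how to read the output, not perf.py's actual code.

    # Sketch: summarize two lists of per-iteration timings (base = 2.7,
    # changed = 3.3) the way the blocks below read: min, avg, a pooled
    # two-sample t statistic, and standard deviations.
    import math

    def summarize(base, changed):
        def avg(xs):
            return sum(xs) / len(xs)
        def stddev(xs):
            m = avg(xs)
            return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
        n1, n2 = len(base), len(changed)
        s1, s2 = stddev(base), stddev(changed)
        # Pooled two-sample t statistic; a large |t| means the difference is
        # unlikely to be measurement noise (positive = changed is faster).
        pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
        t = (avg(base) - avg(changed)) / (pooled * math.sqrt(1.0 / n1 + 1.0 / n2))
        # Ratio > 1 means the changed interpreter is faster.
        print('Min: %f -> %f: %.2fx' % (min(base), min(changed), min(base) / min(changed)))
        print('Avg: %f -> %f: %.2fx' % (avg(base), avg(changed), avg(base) / avg(changed)))
        print('Significant (t=%.2f)' % t)
        print('Stddev: %.5f -> %.5f' % (s1, s2))

    summarize([0.49, 0.50, 0.49, 0.50], [0.41, 0.42, 0.41, 0.42])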

### 2to3 ###
0.785234 -> 0.722169: 1.09x faster

### call_method ###
Min: 0.491433 -> 0.414841: 1.18x faster
Avg: 0.493640 -> 0.416564: 1.19x faster
Significant (t=127.21)
Stddev: 0.00170 -> 0.00162: 1.0513x smaller

### call_method_slots ###
Min: 0.492749 -> 0.416280: 1.18x faster
Avg: 0.497888 -> 0.419275: 1.19x faster
Significant (t=61.72)
Stddev: 0.00433 -> 0.00237: 1.8304x smaller

### call_method_unknown ###
Min: 0.575536 -> 0.427234: 1.35x faster
Avg: 0.577286 -> 0.433428: 1.33x faster
Significant (t=66.09)
Stddev: 0.00117 -> 0.00835: 7.1621x larger

### call_simple ###
Min: 0.413011 -> 0.338923: 1.22x faster
Avg: 0.415862 -> 0.340699: 1.22x faster
Significant (t=111.94)
Stddev: 0.00223 -> 0.00134: 1.6616x smaller

### chaos ###
Min: 0.375286 -> 0.435456: 1.16x slower
Avg: 0.382798 -> 0.459515: 1.20x slower
Significant (t=-5.01)
Stddev: 0.01116 -> 0.03234: 2.8980x larger

### fastpickle ###
Min: 0.853560 -> 0.770580: 1.11x faster
Avg: 0.879498 -> 0.776249: 1.13x faster
Significant (t=8.24)
Stddev: 0.02771 -> 0.00407: 6.7995x smaller

### float ###
Min: 0.476596 -> 0.391101: 1.22x faster
Avg: 0.486164 -> 0.411553: 1.18x faster
Significant (t=9.07)
Stddev: 0.01049 -> 0.01511: 1.4411x larger

### formatted_logging ###
Min: 0.346703 -> 0.451643: 1.30x slower
Avg: 0.351218 -> 0.454626: 1.29x slower
Significant (t=-51.50)
Stddev: 0.00376 -> 0.00246: 1.5265x smaller

### genshi ###
Min: 0.275107 -> 0.294309: 1.07x slower
Avg: 0.287433 -> 0.299026: 1.04x slower
Significant (t=-3.82)
Stddev: 0.01077 -> 0.00467: 2.3044x smaller

### go ###
Min: 0.719160 -> 0.781042: 1.09x slower
Avg: 0.729322 -> 0.798135: 1.09x slower
Significant (t=-8.54)
Stddev: 0.01300 -> 0.01248: 1.0415x smaller

### hexiom2 ###
203.842661 -> 187.107363: 1.09x faster

### iterative_count ###
Min: 0.145088 -> 0.153285: 1.06x slower
Avg: 0.146369 -> 0.154425: 1.06x slower
Significant (t=-9.21)
Stddev: 0.00134 -> 0.00142: 1.0569x larger

### json_dump_v2 ###
Min: 3.512367 -> 4.040813: 1.15x slower
Avg: 3.521879 -> 4.057966: 1.15x slower
Significant (t=-64.29)
Stddev: 0.01071 -> 0.01526: 1.4247x larger

### json_load ###
Min: 1.024560 -> 0.642353: 1.60x faster
Avg: 1.025255 -> 0.644000: 1.59x faster
Significant (t=426.59)
Stddev: 0.00049 -> 0.00194: 3.9240x larger

### mako_v2 ###
Min: 0.137584 -> 0.287701: 2.09x slower
Avg: 0.140620 -> 0.293204: 2.09x slower
Significant (t=-296.14)
Stddev: 0.00243 -> 0.00272: 1.1195x larger

### meteor_contest ###
Min: 0.284739 -> 0.254285: 1.12x faster
Avg: 0.286174 -> 0.255323: 1.12x faster
Significant (t=38.02)
Stddev: 0.00124 -> 0.00133: 1.0725x larger

### nbody ###
Min: 0.491416 -> 0.336127: 1.46x faster
Avg: 0.493339 -> 0.337467: 1.46x faster
Significant (t=185.50)
Stddev: 0.00164 -> 0.00092: 1.7927x smaller

### normal_startup ###
Min: 0.639285 -> 0.898157: 1.40x slower
Avg: 0.645513 -> 0.901586: 1.40x slower
Significant (t=-90.10)
Stddev: 0.00575 -> 0.00270: 2.1309x smaller

### nqueens ###
Min: 0.399351 -> 0.429575: 1.08x slower
Avg: 0.403643 -> 0.430284: 1.07x slower
Significant (t=-9.83)
Stddev: 0.00603 -> 0.00053: 11.3092x smaller

### pathlib ###
Min: 0.137462 -> 0.170506: 1.24x slower
Avg: 0.145370 -> 0.172849: 1.19x slower
Significant (t=-11.09)
Stddev: 0.01232 -> 0.00128: 9.6403x smaller

### pidigits ###
Min: 0.400265 -> 0.379307: 1.06x faster
Avg: 0.401755 -> 0.381171: 1.05x faster
Significant (t=14.65)
Stddev: 0.00259 -> 0.00178: 1.4496x smaller

### raytrace ###
Min: 1.770596 -> 1.958350: 1.11x slower
Avg: 1.773719 -> 1.968401: 1.11x slower
Significant (t=-44.19)
Stddev: 0.00439 -> 0.00882: 2.0099x larger

### regex_effbot ###
Min: 0.076566 -> 0.098124: 1.28x slower
Avg: 0.077491 -> 0.098696: 1.27x slower
Significant (t=-54.47)
Stddev: 0.00052 -> 0.00069: 1.3227x larger

### regex_v8 ###
Min: 0.091530 -> 0.109116: 1.19x slower
Avg: 0.092308 -> 0.113627: 1.23x slower
Significant (t=-5.72)
Stddev: 0.00088 -> 0.00829: 9.4271x larger

### richards ###
Min: 0.257974 -> 0.232134: 1.11x faster
Avg: 0.259248 -> 0.234325: 1.11x faster
Significant (t=23.80)
Stddev: 0.00144 -> 0.00185: 1.2823x larger

### simple_logging ###
Min: 0.326569 -> 0.416797: 1.28x slower
Avg: 0.331694 -> 0.418844: 1.26x slower
Significant (t=-36.32)
Stddev: 0.00523 -> 0.00122: 4.3004x smaller

### spectral_norm ###
Min: 0.483011 -> 0.741558: 1.54x slower
Avg: 0.487128 -> 0.749741: 1.54x slower
Significant (t=-57.40)
Stddev: 0.00512 -> 0.00886: 1.7299x larger

### startup_nosite ###
Min: 0.220444 -> 0.374521: 1.70x slower
Avg: 0.222773 -> 0.376785: 1.69x slower
Significant (t=-176.17)
Stddev: 0.00166 -> 0.00221: 1.3331x larger

### threaded_count ###
Min: 0.171352 -> 0.151892: 1.13x faster
Avg: 0.183180 -> 0.153634: 1.19x faster
Significant (t=8.12)
Stddev: 0.00801 -> 0.00140: 5.7241x smaller

### unpack_sequence ###
Min: 0.000075 -> 0.000061: 1.23x faster
Avg: 0.000101 -> 0.000065: 1.54x faster
Significant (t=206.90)
Stddev: 0.00001 -> 0.00000: 3.2374x smaller

The following not significant results are hidden, use -v to show them:
chameleon, fannkuch, fastunpickle, regex_compile, silent_logging

### django ###
Min: 0.868956 -> 0.894571: 1.03x slower
Avg: 0.873620 -> 0.905274: 1.04x slower
Significant (t=-6.97)
Stddev: 0.00313 -> 0.00966: 3.0912x larger

### genshi ###
Min: 0.269615 -> 0.286348: 1.06x slower
Avg: 0.272206 -> 0.290708: 1.07x slower
Significant (t=-12.29)
Stddev: 0.00253 -> 0.00526: 2.0793x larger

### html5lib ###
12.279808 -> 11.862586: 1.04x faster