[IronPython] speed

Wed Jul 26 01:36:05 CEST 2006

You're correct that most of our work in getting to IronPython 1.0 has been focused on completeness and correctness rather than performance. IronPython 1.0 is roughly as fast as IronPython 0.1 was - which is reasonably fast (see more at the end of this message). As anyone who's built a large system knows, not losing performance while achieving completeness and correctness is a challenge. When we talk about IronPython performance, we try to reference specific benchmarks. The standard line you'll see is, "IronPython is fast - up to 1.8x faster than CPython on the standard pystone benchmark." Performance will vary on different tasks.

Even though performance will vary, any case where IronPython is 21x slower than CPython should be considered a bug in IronPython, and you should file it as an issue on CodePlex. I ran your test script on my ThinkPad X60 laptop with a 1.83GHz Intel Core Duo processor and 1.5GB of RAM under Windows XP SP2 with the final RTM release of .NET 2.0. Running your test with the 1.0beta9 release of IronPython, I find that it is ~8x slower in IronPython than in CPython-2.4. This is much better than your result, but it is still not acceptable performance for such a simple test case.

Over the past week, we looked into this more closely. Your test case revealed two major performance issues. The first was that the way we package our signed release builds caused worse performance than the standard internal builds we tested on. The second was that calling methods on builtin types was unnecessarily slow. Both of these issues have been fixed in the soon-to-be-released IronPython 1.0 RC 1 (which you can build from the current CodePlex sources today). After the fixes, I find that IronPython-1.0rc1 is about 2.2x slower than CPython-2.4 on your benchmark code. While I wish that IronPython were faster on this test, for this stage of the project a ~2x performance hit on some benchmarks is considered acceptable. There are other benchmarks where IronPython will be 2x faster than CPython. In fact, if I rewrite your test (quoted below) in a more abstract style, as in the following script, it runs with roughly the same performance on IronPython as on CPython.

import time

def do_x(i):
    if i % 2:
        return 10
    else:
        return "a string"

def do_z(i):
    if isinstance(i, str):
        return i.upper()
    else:
        return i*3

def test():
    start = time.clock()

    # Build the mixed int/str list, then transform it, using list comprehensions.
    x = [do_x(i) for i in xrange(1000000)]
    z = [do_z(i) for i in x]

    elapsed = time.clock() - start
    print elapsed

test() # pre-run to ignore initialization time
test()

I can't stress enough how much we appreciate this kind of performance bug report. Because you included a small self-contained test script without any external dependencies, it was easy for us to isolate the issues in IronPython and get them fixed. Right now, we don't have the time to help people who are encountering performance issues in complete apps, but we can address issues when they are reported this clearly and are this easy to reproduce. The only additional thing that I would have liked to see here would be a more complete description of the machine and version of .NET and IronPython that you were running against.
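As a rough illustration of the kind of environment description asked for above, a report can start with a few lines printed by the interpreter itself. This is only a sketch: the helper name describe_environment is made up for this example, it assumes the standard platform module is available (which may not be the case on every IronPython build, where sys.version alone is the safer fallback), and it does not report the .NET version, which still has to be stated by hand.

```python
import platform
import sys


def describe_environment():
    """Return a list of lines describing the interpreter and machine.

    Hypothetical helper for bug reports; not part of IronPython.
    """
    return [
        "implementation: %s" % platform.python_implementation(),
        "version: %s" % sys.version.replace("\n", " "),
        "platform: %s" % sys.platform,
        "machine: %s" % platform.machine(),
        "processor: %s" % platform.processor(),
    ]


for line in describe_environment():
    print(line)
```

Pasting this output next to the timing numbers makes it much easier to compare results across machines.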

I mentioned at the start of this email that IronPython's performance hasn't changed much from the 0.1 version. Keep in mind that IronPython 0.1 was a tiny little translator that I wrote from Python to C# that had everything it needed to run the pystone benchmark and nothing else. I'm quite excited that we've been able to keep the good performance aspects of that initial prototype in the 1.0 release. Here's a copy of data from the original email that I sent about IronPython 0.1:

--------------------------------------------------------------------
Date: Mon, 8 Dec 2003 17:16:15 -0800
From: Jim Hugunin <lists at HUGUNIN.NET>
Subject: Python can run fast on the CLR
To: DOTNET-LANGUAGE-DEVS at DISCUSS.MICROSOFT.COM

            IronPython-0.1  Python-2.3      Python-2.1
pystone         0.58            1.00            1.29

function call   0.19            1.00            1.12
integer add     0.59            1.00            1.18
string.replace  0.92            1.00            1.00
range(bigint)   5.57            1.00            1.09
eval("2+2")     66.97           1.00            1.58
-------------------------------------------------------------------

These numbers measure the time to run each benchmark and are all relative to Python-2.3. Smaller numbers are better and indicate faster performance. For IronPython 0.1, performance on function calls and integer add was considerably faster than CPython, string.replace was roughly the same speed, range was too slow, and performance on eval("2+2") was horrible.
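To make the "relative to a baseline" convention concrete, here is a minimal sketch of how raw timings can be turned into ratios like the ones in the table. The function name relative_times and the timings are invented for illustration; they are not the measurements behind the table.

```python
def relative_times(seconds, baseline):
    """Divide each timing by the baseline implementation's timing."""
    base = seconds[baseline]
    return dict((name, t / base) for name, t in seconds.items())


# Hypothetical raw timings in seconds, made up for this example only.
raw = {"fast_impl": 2.0, "baseline": 4.0, "slow_impl": 5.0}

ratios = relative_times(raw, "baseline")
for name in sorted(ratios):
    # A ratio below 1.00 means faster than the baseline.
    print("%-12s %.2f" % (name, ratios[name]))
```

With these made-up inputs, the baseline comes out as 1.00, the faster implementation as 0.50 and the slower one as 1.25.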

Out of curiosity, I reran these same benchmarks on 1.0rc1 as well as CPython-2.4 and 2.5beta2 using the same machine as described above.

          IronPython-1.0rc1  Python-2.5b2      Python-2.4
pystone         0.55            1.00            1.01

function call   0.14            1.00            0.98
integer add     0.46            1.00            1.04
string.replace  0.92            1.00            1.45
range(bigint)   6.07            1.00            1.00
eval("2+2")    14.03            1.00            0.76

If you compare the numbers closely, you should be somewhat stunned by how similar they are. I certainly was. A lot of things have changed, but the underlying results still show that IronPython is blindingly fast on function calls, ~2x faster than CPython on pystone and simple math, about the same speed on many library function calls and ~6x slower for range(bigint). The good news is that the one place where IronPython was previously 67x slower than CPython is the one place where a huge improvement can be measured and IronPython is now only 14x slower on eval("2+2").

Everywhere that IronPython is more than 2x slower than CPython is something we need to understand better. For this set of microbenchmarks, there are two such cases. The first is range(bigint). I believe that this 6x perf hit on range(bigint) has the same underlying cause as the 2x perf hit on Luis's benchmark below: something about building up large lists of numbers gives IronPython trouble. This is a performance issue that we will certainly be looking into more deeply, and I hope to see major improvements here in the future.
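One way to probe a suspicion like this is to time list construction separately from plain iteration. The sketch below is an assumption about how such a probe might look, not the analysis that was actually done; the function names are invented, and it uses timeit.default_timer and list(range(n)) so that it runs on current Python versions as well.

```python
from timeit import default_timer


def timed(func, n):
    """Return the wall-clock seconds taken by func(n)."""
    start = default_timer()
    func(n)
    return default_timer() - start


def build_list(n):
    # Materialize a large list of integers in one call.
    return list(range(n))


def build_by_append(n):
    # Grow the same list one element at a time.
    result = []
    for i in range(n):
        result.append(i)
    return result


def iterate_only(n):
    # Count to n without allocating any list, as a control.
    i = 0
    while i < n:
        i += 1


n = 100000
for func in (build_list, build_by_append, iterate_only):
    print("%-16s %.4f seconds" % (func.__name__, timed(func, n)))
```

If the two list-building variants are slow relative to CPython while the control loop is not, that points at list allocation or growth rather than at integer arithmetic.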

The second performance issue is with eval("2+2"). The improvements to performance here came as a result of a new API (DynamicMethod) added in .NET 2.0 that lets us generate code for small methods much more efficiently. We'd like to see this get still faster, but it isn't an area of top concern to me since eval is very rarely used in performance-critical scenarios. In fact, I've heard privately from a number of people that they wish CPython's eval could be slowed down by 10x to discourage its use except where it is truly needed. IronPython does a lot more work compiling code in order to get its performance boosts, and some additional cost to eval seems reasonable to pay here.
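Where the same expression really does have to be evaluated many times, a standard Python technique is to compile it once with the builtin compile() and pass the resulting code object to eval(). The sketch below compares the two approaches; the function names are invented for this example, and whether this narrows the gap on IronPython is an assumption that has not been measured here.

```python
from timeit import default_timer


def time_eval_source(n):
    """Evaluate the source string n times; each call compiles it again."""
    start = default_timer()
    for i in range(n):
        x = eval("2+2")
    return default_timer() - start


def time_eval_compiled(n):
    """Compile the expression once, then evaluate the code object n times."""
    code = compile("2+2", "<string>", "eval")
    start = default_timer()
    for i in range(n):
        x = eval(code)
    return default_timer() - start


n = 10000
print("eval(source):   %.4f seconds" % time_eval_source(n))
print("eval(compiled): %.4f seconds" % time_eval_compiled(n))
```

The second variant moves the compilation cost out of the loop, which is the part of eval that the paragraph above identifies as expensive.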

Looking at comparisons of microbenchmarks is an interesting way to identify possible low-hanging optimization fruit for IronPython. Last week I also looked at the pybench benchmark by Marc-Andre Lemburg which includes a large number of microbenchmark tests. This was helpful to more clearly identify the same performance issue Luis's test shows for builtin method calls. From pybench, we also noticed a surprising >10x performance hit in IronPython for list and tuple slicing. This has been reduced dramatically in the 1.0rc1 release. I expect that we will look at more of these kinds of comparisons in the future as we work to further optimize IronPython.

Thanks - Jim
--- bench.py used to generate the tables above (copy pystone.py from Lib/test) ---
import time
N = 1000000

import pystone

def test_pystone(L):
    pystone.pystones(200000)

def test_call(L):
    def f(a,b): return
    for i in L:
        f(1,2); f(1,2); f(1,2); f(1,2); f(1,2)
        f(1,2); f(1,2); f(1,2); f(1,2); f(1,2)

def test_add(L):
    x = 10
    for i in L:
        y = x + 1; y = x + 1; y = x + 1; y = x + 1; y = x + 1
        y = x + 1; y = x + 1; y = x + 1; y = x + 1; y = x + 1

def test_replace(L):
    s = "abcdefghi"
    for i in L:
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')
        s.replace('def', 'DEF')

def test_range(L):
    for i in range(100):
        r = range(N)

def test_eval(L):
    for i in range(10000):
        x = eval("2+2")

def bench(func, L):
    start = time.clock()
    func(L)
    end = time.clock()

    print 'ran %s in \t%.2f seconds' % (func.__name__, end-start)

tests = [test_pystone, test_call, test_add, test_replace, test_range, test_eval]
L = range(N)
for i in range(2):
    for test in tests:
        bench(test, L)
------------------------------------------------------------------------------------

From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Luis M. Gonzalez
Sent: Tuesday, July 11, 2006 8:10 PM
To: users at lists.ironpython.com
Subject: [IronPython] speed

Hi everyone,
I'd like to ask you a question about IronPython's speed and performance:
I imagine that so far, you've been concentrating on completeness and compatibility more than performance, and I guess you'll address this issue after version 1.0.
However, although you claimed that IronPython is faster than CPython, I see cases where it is slower by a large margin.
For example, the script below is up to 21x slower than in CPython.

My question is: what are your expectations regarding IronPython's speed in the future?
Based on your experience so far, are you confident that it will match or surpass CPython's?
Where do you think it will be better and where will it be worse?

script
-----------------------------
import time

def test():
    start= time.clock()

    z=[]

    x=range(1000000)

    for i in range(1000000):
        if i % 2:
            x[i] = 10
        else:
            x[i] = "a string"

    for i in x:
        if type(i) == str:
            z.append(i.upper())
        else:
            z.append(i*3)

    end= time.clock() - start
    print end

test()
-----------------------------------
end script
Regards,
Luis