[IronPython] Slow Performance of CPython libs?
Dino Viehland
dinov at exchange.microsoft.com
Thu Feb 21 18:30:50 CET 2008
Do you know if the same reg ex is being used repeatedly? If so maybe we can cache & compile the regex instead of interpretting it all the time.
From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Birsch
Sent: Thursday, February 21, 2008 8:30 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Slow Performance of CPython libs?
I took on Cooper's advice and profiled with ANTS. Here are the top methods:
Namespace
Method name
Time (sec.)
Time with children (sec.)
Hit count
Source file
System.Text.RegularExpressions
RegexInterpreter.Go()
37.0189
94.4676
13689612
System.Text.RegularExpressions
RegexInterpreter.Operator()
6.2411
6.2411
131146274
System.Text.RegularExpressions
RegexInterpreter.Advance(int i)
5.9264
8.7202
66000263
System.Text.RegularExpressions
RegexInterpreter.SetOperator(int op)
5.5750
5.5750
131146274
System.Text.RegularExpressions
RegexInterpreter.Backtrack()
5.5692
9.4895
37781343
IronPython.Runtime.Operations
Ops.CallWithContext(ICallerContext context, object func, object arg0, object arg1)
5.5572
114.5245
79754
IronPython.Runtime.Calls
Method.Call(ICallerContext context, object arg0)
4.9052
114.8251
50886
IronPython.Runtime.Calls
PythonFunction.CallInstance(ICallerContext context, object arg0, object arg1)
4.8876
114.8059
50886
IronPython.Runtime.Calls
Function2.Call(ICallerContext context, object arg0, object arg1)
4.6400
114.5471
47486
IronPython.Runtime.Operations
Ops.CallWithContext(ICallerContext context, object func, object arg0)
4.2344
114.1604
146658
System.Text.RegularExpressions
RegexBoyerMoore.Scan(string text, int index, int beglimit, int endlimit)
3.6465
3.6465
13678131
System.Text.RegularExpressions
RegexCharClass.CharInClassRecursive(char ch, string set, int start)
3.6288
5.7113
31508162
System.Text.RegularExpressions
RegexInterpreter.Goto(int newpos)
3.2058
5.1470
27364668
System.Text.RegularExpressions
RegexInterpreter.Operand(int i)
3.1923
3.1923
73230687
System.Text.RegularExpressions
RegexRunner.EnsureStorage()
3.0803
3.0803
51474823
System.Text.RegularExpressions
RegexCharClass.CharInClass(char ch, string set)
3.0713
8.7827
31508162
IronPython.Runtime.Calls
Method.Call(ICallerContext context, object arg0, object arg1)
2.9821
7.8675
15012
IronPython.Runtime.Calls
PythonFunction.CallInstance(ICallerContext context, object arg0, object arg1, object arg2)
2.9794
7.8639
15012
System.Text.RegularExpressions
RegexInterpreter.Forwardcharnext()
2.8852
2.8852
62865185
System.Text.RegularExpressions
RegexInterpreter.Forwardchars()
2.8279
2.8279
59436277
System.Text.RegularExpressions
RegexCharClass.CharInClassInternal(char ch, string set, int start, int mySetLength, int myCategoryLength)
2.0632
2.0826
31508162
System.Text.RegularExpressions
RegexRunner.Scan(Regex regex, string text, int textbeg, int textend, int textstart, int prevlen, bool quick)
1.8376
101.7226
43009
System.Text.RegularExpressions
RegexInterpreter.FindFirstChar()
1.6405
5.3456
13701755
IronPython.Runtime.Types
OldClass.TryLookupSlot(SymbolId name, out object ret)
1.5573
2.8124
389516
IronPython.Runtime.Operations
Ops.GetAttr(ICallerContext context, object o, SymbolId name)
1.5365
5.3456
558524
System.Text.RegularExpressions
RegexInterpreter.Textpos()
1.4020
1.4020
32648926
System.Text.RegularExpressions
RegexInterpreter.Advance()
1.1916
2.9526
13703950
System.Text.RegularExpressions
RegexInterpreter.Textto(int newpos)
1.1218
1.1218
24120890
System.Text.RegularExpressions
RegexInterpreter.TrackPeek()
1.0579
1.0579
24120894
System.Text.RegularExpressions
Regex.Run(bool quick, int prevlen, string input, int beginning, int length, int startat)
0.7280
102.4644
43009
System.Text.RegularExpressions
RegexInterpreter.TrackPush(int I1)
0.6834
0.6834
13745149
System.Text.RegularExpressions
RegexInterpreter.StackPush(int I1)
0.6542
0.6542
13703955
System.Text.RegularExpressions
RegexInterpreter.TrackPop()
0.6068
0.6068
13663035
System.Text.RegularExpressions
RegexInterpreter.TrackPush()
0.6049
0.6049
13708230
System.Text.RegularExpressions
RegexInterpreter.StackPop()
0.5836
0.5836
13703956
System.Text.RegularExpressions
RegexInterpreter.Bump()
0.4987
0.4987
10472790
System.Text.RegularExpressions
RegexInterpreter.TrackPush(int I1, int I2)
0.4864
0.4864
10472790
System.Text.RegularExpressions
RegexInterpreter.TrackPeek(int i)
0.4663
0.4663
10457859
System.Text.RegularExpressions
RegexInterpreter.TrackPop(int framesize)
0.4396
0.4396
10457859
Moving up the stack of regex.Go(), most calls originate from sgmllib's parse_starttag.
HTH,
-Birsch
On Thu, Feb 21, 2008 at 2:35 PM, Birsch <birsch at gmail.com<mailto:birsch at gmail.com>> wrote:
Thanks Michael and Dino.
I'll prof and send update. Got a good profiler recommendation for .Net?
Meanwhile I noticed the sample site below causes BeautifulSoup to generate quite a few [python] exceptions during __init__. Does IronPython handle exceptions significantly slower than CPtyhon?
Repro code is simple (just build a BeautifulSoup obj with mininova's home page).
Here are the .py and .cs I used to time the diffs:
bstest.py:
#Bypass CPython default socket implementation with IPCE/FePy
import imp, os, sys
sys.modules['socket'] = module = imp.new_module('socket')
execfile('socket.py', module.__dict__)
from BeautifulSoup import BeautifulSoup
from urllib import urlopen
import datetime
def getContent(url):
#Download html data
startTime = datetime.datetime.now()
print "Getting url", url
html = urlopen(url).read()
print "Time taken:", datetime.datetime.now() - startTime
#Make soup
startTime = datetime.datetime.now()
print "Making soup..."
soup = BeautifulSoup(markup=html)
print "Time taken:", datetime.datetime.now() - startTime
if __name__ == "__main__":
print getContent("www.mininova.org<http://www.mininova.org>")
C#:
using System;
using System.Collections.Generic;
using System.Text;
using IronPython.Hosting;
namespace IronPythonBeautifulSoupTest
{
public class Program
{
public static void Main(string[] args)
{
//Init
System.Console.WriteLine("Starting...");
DateTime start = DateTime.Now;
PythonEngine engine = new PythonEngine();
//Add paths:
//BeautifulSoup.py, socket.py, bstest.py located on exe dir
engine.AddToPath(@".");
//CPython Lib (replace with your own)
engine.AddToPath(@"D:\Dev\Python\Lib");
//Import and load
TimeSpan span = DateTime.Now - start;
System.Console.WriteLine("[1] Import: " + span.TotalSeconds);
DateTime d = DateTime.Now;
engine.ExecuteFile(@"bstest.py");
span = DateTime.Now - d;
System.Console.WriteLine("[2] Load: " + span.TotalSeconds);
//Execute
d = DateTime.Now;
engine.Execute("getContent(\"http://www.mininova.org\")");
span = DateTime.Now - d;
System.Console.WriteLine("[3] Execute: " + span.TotalSeconds);
span = DateTime.Now - start;
System.Console.WriteLine("Total: " + span.TotalSeconds);
}
}
}
On Wed, Feb 20, 2008 at 6:57 PM, Dino Viehland <dinov at exchange.microsoft.com<mailto:dinov at exchange.microsoft.com>> wrote:
We've actually had this issue reported once before a long time ago - it's a very low CodePlex ID - http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=651
We haven't had a chance to investigate the end-to-end scenario. If someone could come up with a smaller simpler repro that'd be great. Otherwise we haven't forgotten about it we've just had more immediately pressing issues to work on :(.
-----Original Message-----
From: users-bounces at lists.ironpython.com<mailto:users-bounces at lists.ironpython.com> [mailto:users-bounces at lists.ironpython.com<mailto:users-bounces at lists.ironpython.com>] On Behalf Of Michael Foord
Sent: Wednesday, February 20, 2008 5:20 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Slow Performance of CPython libs?
Birsch wrote:
> Hi - We've been using IronPython successfully to allow extensibility
> of our application.
>
> Overall we are happy with the performance, with the exception of
> BeautifulSoup which seems to run very slowly: x5 or more time to
> execute compared to CPython.
>
> Most of the time seems to be spent during __init__() of BS, where the
> markup is parsed.
>
> We suspect this has to do with the fact that our CPython env is
> executing .pyc files and can precompile its libs, while the IronPython
> environment compiles each iteration. We couldn't find a way to
> pre-compile the libs and then introduce them into the code, but in any
> case this will result in a large management overhead since the amount
> of CPython libs we expose to our users contains 100's of modules.
>
> Any ideas on how to optimize?
I think it is worth doing real profiling to find out where the time is
being spent during parsing.
If it is spending most of the time in '__init__' then the time is
probably not spent in importing - so compilation isn't relevant and it
is a runtime performance issue. (Importing is much slower with
IronPython and at Resolver Systems we do use precompiled binaries - but
strangely enough it doesn't provide much of a performance gain.)
Michael
http://www.manning.com/foord
>
> Thanks,
> -Birsch
>
> Note: we're using FePy/IPCE libs with regular IP v1.1.1 runtime DLLs
> (this was done to overcome library incompatibilities and network
> errors). However, the relevant slow .py code (mainly SGMLParser and
> BeautifulSoup) is the same.
> ------------------------------------------------------------------------
>
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com<mailto:Users at lists.ironpython.com>
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
_______________________________________________
Users mailing list
Users at lists.ironpython.com<mailto:Users at lists.ironpython.com>
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
_______________________________________________
Users mailing list
Users at lists.ironpython.com<mailto:Users at lists.ironpython.com>
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/ironpython-users/attachments/20080221/8927bb9f/attachment.html>
More information about the Ironpython-users
mailing list