[IronPython] Slow Performance of CPython libs?

Dino Viehland dinov at exchange.microsoft.com
Thu Feb 21 19:48:41 CET 2008


After Birsh's last comment that they're calling compile I was thinking we might be doing something really stupid.  Turns out that we are.  For our supposedly compiled regex's we're not passing the Compiled option.  So they're all getting interpretted and of course that's going to suck big time.

So the fix for this is probably to just pass RegexOptions.Compiled to the Regex object being contructed into RE_Pattern.  Internally the Regex class will compile to dynamic methods (much better than compiling to an assembly because it'll be collectible).

It's actually a little more complex than that because we create RE_Pattern objects for the non-compiled so we need to flow the option in, but it's still a simple fix.

-----Original Message-----
From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Dan Shechter
Sent: Thursday, February 21, 2008 10:19 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Slow Performance of CPython libs?

Just out of curiosity...
I'm guessing that there are two levels of caching that could be used.
One would be to hold a global dictionary of some sort of pattern ->
Regex that will save the lengthy parsing of the re.

The other would be to actually call Regex.CompileToAssembly to get a
more efficient reperesentation (in runtime that is) of the re.

Would you (the IP team) favor the former or the latter?

        Shechter.


Birsch wrote:
> I checked both sgmllib.py and BeautifulSoup.py - and it seems both are
> reusing the same regexps (searched for re.compile).
>
> I think your suggestion is very relevant in this case. It makes sense to
> replicate the "compile once use many" behavior that is commonly used
> with regexp.
>
> -Birsch
>
> On Thu, Feb 21, 2008 at 7:30 PM, Dino Viehland
> <dinov at exchange.microsoft.com <mailto:dinov at exchange.microsoft.com>> wrote:
>
>     Do you know if the same reg ex is being used repeatedly?  If so
>     maybe we can cache & compile the regex instead of interpretting it
>     all the time.
>
>
>
>     *From:* users-bounces at lists.ironpython.com
>     <mailto:users-bounces at lists.ironpython.com>
>     [mailto:users-bounces at lists.ironpython.com
>     <mailto:users-bounces at lists.ironpython.com>] *On Behalf Of *Birsch
>     *Sent:* Thursday, February 21, 2008 8:30 AM
>
>     *To:* Discussion of IronPython
>     *Subject:* Re: [IronPython] Slow Performance of CPython libs?
>
>
>
>     I took on Cooper's advice and profiled with ANTS. Here are the top
>     methods:
>
>     *Namespace*
>
>
>
>     *Method name*
>
>
>
>     *Time (sec.)*
>
>
>
>     *Time with children (sec.)*
>
>
>
>     *Hit count*
>
>
>
>     *Source file*
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Go()
>
>
>
>     37.0189
>
>
>
>     94.4676
>
>
>
>     13689612
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Operator()
>
>
>
>     6.2411
>
>
>
>     6.2411
>
>
>
>     131146274
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Advance(int i)
>
>
>
>     5.9264
>
>
>
>     8.7202
>
>
>
>     66000263
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.SetOperator(int op)
>
>
>
>     5.5750
>
>
>
>     5.5750
>
>
>
>     131146274
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Backtrack()
>
>
>
>     5.5692
>
>
>
>     9.4895
>
>
>
>     37781343
>
>
>
>     IronPython.Runtime.Operations
>
>
>
>     Ops.CallWithContext(ICallerContext context, object func, object
>     arg0, object arg1)
>
>
>
>     5.5572
>
>
>
>     114.5245
>
>
>
>     79754
>
>
>
>     IronPython.Runtime.Calls
>
>
>
>     Method.Call(ICallerContext context, object arg0)
>
>
>
>     4.9052
>
>
>
>     114.8251
>
>
>
>     50886
>
>
>
>     IronPython.Runtime.Calls
>
>
>
>     PythonFunction.CallInstance(ICallerContext context, object arg0,
>     object arg1)
>
>
>
>     4.8876
>
>
>
>     114.8059
>
>
>
>     50886
>
>
>
>     IronPython.Runtime.Calls
>
>
>
>     Function2.Call(ICallerContext context, object arg0, object arg1)
>
>
>
>     4.6400
>
>
>
>     114.5471
>
>
>
>     47486
>
>
>
>     IronPython.Runtime.Operations
>
>
>
>     Ops.CallWithContext(ICallerContext context, object func, object arg0)
>
>
>
>     4.2344
>
>
>
>     114.1604
>
>
>
>     146658
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexBoyerMoore.Scan(string text, int index, int beglimit, int endlimit)
>
>
>
>     3.6465
>
>
>
>     3.6465
>
>
>
>     13678131
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexCharClass.CharInClassRecursive(char ch, string set, int start)
>
>
>
>     3.6288
>
>
>
>     5.7113
>
>
>
>     31508162
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Goto(int newpos)
>
>
>
>     3.2058
>
>
>
>     5.1470
>
>
>
>     27364668
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Operand(int i)
>
>
>
>     3.1923
>
>
>
>     3.1923
>
>
>
>     73230687
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexRunner.EnsureStorage()
>
>
>
>     3.0803
>
>
>
>     3.0803
>
>
>
>     51474823
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexCharClass.CharInClass(char ch, string set)
>
>
>
>     3.0713
>
>
>
>     8.7827
>
>
>
>     31508162
>
>
>
>     IronPython.Runtime.Calls
>
>
>
>     Method.Call(ICallerContext context, object arg0, object arg1)
>
>
>
>     2.9821
>
>
>
>     7.8675
>
>
>
>     15012
>
>
>
>     IronPython.Runtime.Calls
>
>
>
>     PythonFunction.CallInstance(ICallerContext context, object arg0,
>     object arg1, object arg2)
>
>
>
>     2.9794
>
>
>
>     7.8639
>
>
>
>     15012
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Forwardcharnext()
>
>
>
>     2.8852
>
>
>
>     2.8852
>
>
>
>     62865185
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Forwardchars()
>
>
>
>     2.8279
>
>
>
>     2.8279
>
>
>
>     59436277
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexCharClass.CharInClassInternal(char ch, string set, int start,
>     int mySetLength, int myCategoryLength)
>
>
>
>     2.0632
>
>
>
>     2.0826
>
>
>
>     31508162
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexRunner.Scan(Regex regex, string text, int textbeg, int textend,
>     int textstart, int prevlen, bool quick)
>
>
>
>     1.8376
>
>
>
>     101.7226
>
>
>
>     43009
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.FindFirstChar()
>
>
>
>     1.6405
>
>
>
>     5.3456
>
>
>
>     13701755
>
>
>
>     IronPython.Runtime.Types
>
>
>
>     OldClass.TryLookupSlot(SymbolId name, out object ret)
>
>
>
>     1.5573
>
>
>
>     2.8124
>
>
>
>     389516
>
>
>
>     IronPython.Runtime.Operations
>
>
>
>     Ops.GetAttr(ICallerContext context, object o, SymbolId name)
>
>
>
>     1.5365
>
>
>
>     5.3456
>
>
>
>     558524
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Textpos()
>
>
>
>     1.4020
>
>
>
>     1.4020
>
>
>
>     32648926
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Advance()
>
>
>
>     1.1916
>
>
>
>     2.9526
>
>
>
>     13703950
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Textto(int newpos)
>
>
>
>     1.1218
>
>
>
>     1.1218
>
>
>
>     24120890
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.TrackPeek()
>
>
>
>     1.0579
>
>
>
>     1.0579
>
>
>
>     24120894
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     Regex.Run(bool quick, int prevlen, string input, int beginning, int
>     length, int startat)
>
>
>
>     0.7280
>
>
>
>     102.4644
>
>
>
>     43009
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.TrackPush(int I1)
>
>
>
>     0.6834
>
>
>
>     0.6834
>
>
>
>     13745149
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.StackPush(int I1)
>
>
>
>     0.6542
>
>
>
>     0.6542
>
>
>
>     13703955
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.TrackPop()
>
>
>
>     0.6068
>
>
>
>     0.6068
>
>
>
>     13663035
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.TrackPush()
>
>
>
>     0.6049
>
>
>
>     0.6049
>
>
>
>     13708230
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.StackPop()
>
>
>
>     0.5836
>
>
>
>     0.5836
>
>
>
>     13703956
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.Bump()
>
>
>
>     0.4987
>
>
>
>     0.4987
>
>
>
>     10472790
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.TrackPush(int I1, int I2)
>
>
>
>     0.4864
>
>
>
>     0.4864
>
>
>
>     10472790
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.TrackPeek(int i)
>
>
>
>     0.4663
>
>
>
>     0.4663
>
>
>
>     10457859
>
>
>
>     System.Text.RegularExpressions
>
>
>
>     RegexInterpreter.TrackPop(int framesize)
>
>
>
>     0.4396
>
>
>
>     0.4396
>
>
>
>     10457859
>
>
>
>
>     Moving up the stack of regex.Go(), most calls originate from
>     sgmllib's parse_starttag.
>
>     HTH,
>     -Birsch
>
>     On Thu, Feb 21, 2008 at 2:35 PM, Birsch <birsch at gmail.com
>     <mailto:birsch at gmail.com>> wrote:
>
>     Thanks Michael and Dino.
>
>     I'll prof and send update. Got a good profiler recommendation for .Net?
>     Meanwhile I noticed the sample site below causes BeautifulSoup to
>     generate quite a few [python] exceptions during __init__. Does
>     IronPython handle exceptions significantly slower than CPtyhon?
>
>     Repro code is simple (just build a BeautifulSoup obj with mininova's
>     home page).
>     Here are the .py and .cs I used to time the diffs:
>
>     *bstest.py:*
>     #Bypass CPython default socket implementation with IPCE/FePy
>     import imp, os, sys
>     sys.modules['socket'] = module = imp.new_module('socket')
>     execfile('socket.py', module.__dict__)
>
>     from BeautifulSoup import BeautifulSoup
>     from urllib import urlopen
>     import datetime
>
>     def getContent(url):
>         #Download html data
>         startTime = datetime.datetime.now()
>         print "Getting url", url
>         html = urlopen(url).read()
>         print "Time taken:", datetime.datetime.now() - startTime
>
>         #Make soup
>         startTime = datetime.datetime.now()
>         print "Making soup..."
>         soup = BeautifulSoup(markup=html)
>         print "Time taken:", datetime.datetime.now() - startTime
>
>     if __name__ == "__main__":
>         print getContent("www.mininova.org <http://www.mininova.org>")
>
>
>     *C#:*
>     using System;
>     using System.Collections.Generic;
>     using System.Text;
>     using IronPython.Hosting;
>
>     namespace IronPythonBeautifulSoupTest
>     {
>         public class Program
>         {
>             public static void Main(string[] args)
>             {
>                 //Init
>                 System.Console.WriteLine("Starting...");
>                 DateTime start = DateTime.Now;
>                 PythonEngine engine = new PythonEngine();
>
>                 //Add paths:
>                 //BeautifulSoup.py, socket.py, bstest.py located on exe dir
>                 engine.AddToPath(@".");
>                 //CPython Lib (replace with your own)
>                 engine.AddToPath(@"D:\Dev\Python\Lib");
>
>                 //Import and load
>                 TimeSpan span = DateTime.Now - start;
>                 System.Console.WriteLine("[1] Import: " +
>     span.TotalSeconds);
>                 DateTime d = DateTime.Now;
>                 engine.ExecuteFile(@"bstest.py");
>                 span = DateTime.Now - d;
>                 System.Console.WriteLine("[2] Load: " + span.TotalSeconds);
>
>                 //Execute
>                 d = DateTime.Now;
>                 engine.Execute("getContent(\"http://www.mininova.org\")");
>                 span = DateTime.Now - d;
>                 System.Console.WriteLine("[3] Execute: " +
>     span.TotalSeconds);
>                 span = DateTime.Now - start;
>                 System.Console.WriteLine("Total: " + span.TotalSeconds);
>
>
>             }
>         }
>     }
>
>
>     On Wed, Feb 20, 2008 at 6:57 PM, Dino Viehland
>     <dinov at exchange.microsoft.com <mailto:dinov at exchange.microsoft.com>>
>     wrote:
>
>     We've actually had this issue reported once before a long time ago -
>     it's a very low CodePlex ID -
>     http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=651
>
>     We haven't had a chance to investigate the end-to-end scenario.  If
>     someone could come up with a smaller simpler repro that'd be great.
>      Otherwise we haven't forgotten about it we've just had more
>     immediately pressing issues to work on :(.
>
>
>     -----Original Message-----
>     From: users-bounces at lists.ironpython.com
>     <mailto:users-bounces at lists.ironpython.com>
>     [mailto:users-bounces at lists.ironpython.com
>     <mailto:users-bounces at lists.ironpython.com>] On Behalf Of Michael Foord
>     Sent: Wednesday, February 20, 2008 5:20 AM
>     To: Discussion of IronPython
>     Subject: Re: [IronPython] Slow Performance of CPython libs?
>
>     Birsch wrote:
>      > Hi - We've been using IronPython successfully to allow extensibility
>      > of our application.
>      >
>      > Overall we are happy with the performance, with the exception of
>      > BeautifulSoup which seems to run very slowly: x5 or more time to
>      > execute compared to CPython.
>      >
>      > Most of the time seems to be spent during __init__() of BS, where the
>      > markup is parsed.
>      >
>      > We suspect this has to do with the fact that our CPython env is
>      > executing .pyc files and can precompile its libs, while the
>     IronPython
>      > environment compiles each iteration. We couldn't find a way to
>      > pre-compile the libs and then introduce them into the code, but
>     in any
>      > case this will result in a large management overhead since the amount
>      > of CPython libs we expose to our users contains 100's of modules.
>      >
>      > Any ideas on how to optimize?
>
>     I think it is worth doing real profiling to find out where the time is
>     being spent during parsing.
>
>     If it is spending most of the time in '__init__' then the time is
>     probably not spent in importing - so compilation isn't relevant and it
>     is a runtime performance issue. (Importing is much slower with
>     IronPython and at Resolver Systems we do use precompiled binaries - but
>     strangely enough it doesn't provide much of a performance gain.)
>
>     Michael
>     http://www.manning.com/foord
>
>      >
>      > Thanks,
>      > -Birsch
>      >
>      > Note: we're using FePy/IPCE libs with regular IP v1.1.1 runtime DLLs
>      > (this was done to overcome library incompatibilities and network
>      > errors). However, the relevant slow .py code (mainly SGMLParser and
>      > BeautifulSoup) is the same.
>      >
>     ------------------------------------------------------------------------
>      >
>      > _______________________________________________
>      > Users mailing list
>      > Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>      > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>      >
>
>     _______________________________________________
>     Users mailing list
>     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>     _______________________________________________
>     Users mailing list
>     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
>
>
>
>
>
>     _______________________________________________
>     Users mailing list
>     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
_______________________________________________
Users mailing list
Users at lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com



More information about the Ironpython-users mailing list