An issue on CodePlex - <a href="http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&amp;WorkItemId=651" title="http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&amp;WorkItemId=651" target="_blank">http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&amp;WorkItemId=651</a> - was raised on my behalf about performance when using BeautifulSoup (a forgiving HTML parser).

<br><br>Here's a simple test which does the parsing and the &quot;prettifying&quot; - the process where BeautifulSoup rewrites the HTML in an attempt to make it well-formed.<br><br>The benchmark processes the pages from 2 different URLs, loads them into BeautifulSoup and then reads out the &quot;pretty&quot; (or better-formed) HTML. I can't use urllib in IronPython because it has apparently not been implemented (see 

<span style="font-size: 11pt; color: rgb(31, 73, 125);"><a href="http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&amp;WorkItemId=1368" title="http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&amp;WorkItemId=1368" target="_blank">http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&amp;WorkItemId=1368</a>) so I've written two scripts. makeFiles.py, which should be run in CPython, reads the URLs and writes them to files.

<br>test.py is the actual benchmark. The code for both is at the end of this message.<br><br>These are the results I'm getting:<br><br>CPython 2.4<br>------------------<br>ran test_getHtml in&nbsp;&nbsp;&nbsp;&nbsp; 

0.00 seconds<br>ran test_load in&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0.28 seconds<br>ran test_prettify in&nbsp;&nbsp;&nbsp; 0.05 seconds<br>ran benchmark in&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0.33 seconds<br><br>IronPython 1.0 RC1<br>----------------------------<br>ran test_getHtml in&nbsp;&nbsp;&nbsp;&nbsp; 0.04

seconds<br>ran test_load in&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.49 seconds<br>ran test_prettify in&nbsp;&nbsp;&nbsp; 0.24 seconds<br>ran benchmark in&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.77 seconds<br><br>So you can see that IronPython is roughly 8 times slower than CPython on this benchmark (2.77 vs 0.33 seconds), and nearly all of that time is spent in test_load, the BeautifulSoup parsing step.
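<br><br>To put numbers on the gap, here is a small sketch that computes the per-stage slowdown from the timings reported above. The dictionary layout and the slowdown() helper are only for illustration and are not part of test.py. It is written so it also runs on current Python versions.

```python
# Timings in seconds, copied from the results reported above.
cpython = {"test_getHtml": 0.00, "test_load": 0.28,
           "test_prettify": 0.05, "benchmark": 0.33}
ironpython = {"test_getHtml": 0.04, "test_load": 2.49,
              "test_prettify": 0.24, "benchmark": 2.77}


def slowdown(name):
    """Return IronPython time / CPython time for one stage.

    Returns None when the CPython time rounds to 0.00, because the
    ratio is meaningless at that resolution.
    """
    base = cpython[name]
    if base == 0:
        return None
    return ironpython[name] / base


for name in ["test_getHtml", "test_load", "test_prettify", "benchmark"]:
    ratio = slowdown(name)
    if ratio is None:
        print("%-14s n/a (CPython time rounds to 0.00)" % name)
    else:
        print("%-14s %.1fx slower" % (name, ratio))
```

The test_getHtml stage is left out of the comparison because the CPython figure is below the two-decimal resolution of the printout.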

<br>#---- makeFiles.py<br>import urllib<br><br>def test_getHtml(url):<br>&nbsp;&nbsp;&nbsp; f = urllib.urlopen(url)<br>&nbsp;&nbsp;&nbsp; html = f.read()<br>&nbsp;&nbsp;&nbsp; f.close()<br>&nbsp;&nbsp;&nbsp; <br>&nbsp;&nbsp;&nbsp; return html<br><br>def saveFile(fName, data):<br>&nbsp;&nbsp;&nbsp; f = open(fName, &quot;w&quot;)

<br>&nbsp;&nbsp;&nbsp; f.write(data)<br>&nbsp;&nbsp;&nbsp; f.close()<br>&nbsp;&nbsp;&nbsp; return<br><br>urls = [&quot;<a href="http://news.bbc.co.uk/2/hi/middle_east/5213602.stm">http://news.bbc.co.uk/2/hi/middle_east/5213602.stm</a>&quot;, &quot;<a href="http://www.cnn.com/2006/US/07/25/highway.shootings.ap/index.html">

http://www.cnn.com/2006/US/07/25/highway.shootings.ap/index.html</a>&quot;]<br>files = [&quot;c:\\bbc.html&quot;, &quot;c:\\cnn.html&quot;]<br><br># pair each url with the file it is saved to<br>for url, fName in zip(urls, files):<br>&nbsp;&nbsp;&nbsp; data = test_getHtml(url)

<br>&nbsp;&nbsp;&nbsp; saveFile(fName, data)<br>&nbsp;&nbsp;&nbsp; <br><br><br>#test.py<br>#-----------------------------------------------------------------------------------------------------------------<br>#| Code Start<br>#-----------------------------------------------------------------------------------------------------------------

<br>import sys<br>sys.path.append(&quot;C:\\Python24\\Lib&quot;)<br><br>from BeautifulSoup import BeautifulSoup<br>import time<br><br>def test_getFile(fileName):<br>&nbsp;&nbsp;&nbsp; f = open(fileName, &quot;r&quot;)<br>&nbsp;&nbsp;&nbsp; html = f.read

()<br>&nbsp;&nbsp;&nbsp; f.close()<br>&nbsp;&nbsp;&nbsp; <br>&nbsp;&nbsp;&nbsp; return html<br><br>def test_load(html):<br>&nbsp;&nbsp;&nbsp; s = BeautifulSoup(html)<br>&nbsp;&nbsp;&nbsp; return s<br><br>def test_prettify(s):<br>&nbsp;&nbsp;&nbsp; t = s.prettify()<br>&nbsp;&nbsp;&nbsp; return t<br><br>files = [&quot;c:\\bbc.html&quot;, &quot;c:\\cnn.html&quot;]

<br>testCount = 2<br><br>benchmarkStart = time.clock()<br>time_getHtml = 0<br>time_load = 0<br>time_prettify = 0<br><br># process every file on each of the testCount passes<br>for i in range(testCount):<br><br>&nbsp;&nbsp;&nbsp; for fName in files:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; testStart = 

time.clock()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; html = test_getFile(fName)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; testEnd = time.clock()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; time_getHtml += testEnd - testStart<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; testStart = time.clock()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; s = test_load(html)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; testEnd = 

time.clock()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; time_load += testEnd - testStart<br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; testStart = time.clock()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; t = test_prettify(s)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; testEnd = time.clock()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; time_prettify += testEnd - testStart<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

<br>benchmarkEnd = time.clock()<br><br>print 'ran test_getHtml in \t%.2f seconds' % (time_getHtml)<br>print 'ran test_load in \t%.2f seconds' % (time_load)<br>print 'ran test_prettify in \t%.2f seconds' % (time_prettify)<br>

<br>print 'ran benchmark in \t%.2f seconds' % (benchmarkEnd - benchmarkStart)<br>#-----------------------------------------------------------------------------------------------------------------<br>#| Code End<br>#-----------------------------------------------------------------------------------------------------------------

<br><br>
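The start/stop bookkeeping in test.py is repeated for every stage, so one possible way to factor it out is sketched below. This is not the code that produced the numbers above: run_stages() is a hypothetical helper, the two lambda stages are stand-ins so the sketch runs without BeautifulSoup or the saved files, and it uses time.perf_counter(), which assumes Python 3.3 or later (time.clock() was removed in Python 3.8).

```python
import time


def run_stages(inputs, stages, repeat=2):
    """Feed every input through the (name, func) stages, `repeat` times.

    Each stage receives the previous stage's return value.
    Returns a dict mapping stage name to total elapsed seconds.
    """
    totals = dict((name, 0.0) for name, _ in stages)
    for _ in range(repeat):
        for value in inputs:
            for name, func in stages:
                start = time.perf_counter()
                value = func(value)
                totals[name] += time.perf_counter() - start
    return totals


# Stand-in stages (assumptions for illustration only); in test.py these
# would be test_load and test_prettify applied to the saved html.
stages = [
    ("load", lambda text: text.split()),
    ("prettify", lambda words: "\n".join(words)),
]

totals = run_stages(["<p>one two</p>", "<p>three four</p>"], stages)
for name, _ in stages:
    print("ran %s in \t%.2f seconds" % (name, totals[name]))
```

Keeping the stage list as data means adding a stage does not require another pair of testStart/testEnd variables.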