[IronPython] speed
Mike Raath
raathm at gmail.com
Thu Jul 27 08:15:19 CEST 2006
An issue on CodePlex
(http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&WorkItemId=651)
was raised on my behalf to do with performance using BeautifulSoup (a
forgiving HTML parser).
Here's a simple test which does the parsing and the "prettifying" - the
process where BeautifulSoup rewrites the HTML in an attempt to make it
well-formed.
The benchmark processes two different URLs, loads them into BeautifulSoup and
then reads out the "pretty" (or better-formed) HTML. I can't use urllib
because it's apparently not been implemented (see
http://www.codeplex.com/WorkItem/View.aspx?ProjectName=IronPython&WorkItemId=1368)
so I've split the work into two scripts.
makeFiles.py, which should be run in CPython, reads the URLs and
writes them to files.
test.py is the actual benchmark; it reads those files back from disk.
The code for both is at the end of this message.
These are the results I'm getting:
CPython 2.4
------------------
ran test_getHtml in 0.00 seconds
ran test_load in 0.28 seconds
ran test_prettify in 0.05 seconds
ran benchmark in 0.33 seconds
IronPython 1.0 RC1
----------------------------
ran test_getHtml in 0.04 seconds
ran test_load in 2.49 seconds
ran test_prettify in 0.24 seconds
ran benchmark in 2.77 seconds
So you can see that IronPython is roughly 8 times slower than CPython on
this benchmark (2.77 vs. 0.33 seconds), with most of that time spent in
test_load, the BeautifulSoup parsing step.
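To put numbers on the gap, here is a small sketch that computes the per-phase slowdown from the timings listed above. The dictionaries and the `slowdown` helper are only illustrative names and are not part of either script; the figures are copied from the results as printed, so they carry the same two-decimal rounding.

```python
# Timings copied from the results above (seconds, rounded to two decimals).
cpython = {"test_getHtml": 0.00, "test_load": 0.28,
           "test_prettify": 0.05, "benchmark": 0.33}
ironpython = {"test_getHtml": 0.04, "test_load": 2.49,
              "test_prettify": 0.24, "benchmark": 2.77}

def slowdown(name):
    """Return IronPython time / CPython time, or None if the base rounds to 0."""
    base = cpython[name]
    if base == 0:
        return None
    return ironpython[name] / base

for name in ("test_getHtml", "test_load", "test_prettify", "benchmark"):
    ratio = slowdown(name)
    if ratio is None:
        print("%s: CPython time rounds to 0.00, no ratio" % name)
    else:
        print("%s: %.1fx slower" % (name, ratio))
```

The file-reading phase is too short on CPython to give a meaningful ratio, which is why it is reported separately rather than divided.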
#---- makeFiles.py
import urllib

def test_getHtml(url):
    f = urllib.urlopen(url)
    html = f.read()
    f.close()
    return html

def saveFile(fName, data):
    f = open(fName, "w")
    f.write(data)
    f.close()
    return

urls = ["http://news.bbc.co.uk/2/hi/middle_east/5213602.stm",
        "http://www.cnn.com/2006/US/07/25/highway.shootings.ap/index.html"]
files = ["c:\\bbc.html", "c:\\cnn.html"]

# Fetch each URL and save it to the matching file.
for url, fName in zip(urls, files):
    data = test_getHtml(url)
    saveFile(fName, data)
#test.py
#-----------------------------------------------------------------------------------------------------------------
#| Code Start
#-----------------------------------------------------------------------------------------------------------------
import sys
sys.path.append("C:\\Python24\\Lib")
from BeautifulSoup import BeautifulSoup
import time

def test_getFile(fileName):
    f = open(fileName, "r")
    html = f.read()
    f.close()
    return html

def test_load(html):
    s = BeautifulSoup(html)
    return s

def test_prettify(s):
    t = s.prettify()
    return t

files = ["c:\\bbc.html", "c:\\cnn.html"]
testCount = 2    # number of passes over the whole file list
benchmarkStart = time.clock()
time_getHtml = 0
time_load = 0
time_prettify = 0
for i in range(testCount):
    # Iterate over the file names directly; indexing files[i] here would
    # tie the file list length to testCount.
    for fName in files:
        # Phase 1: read the saved HTML from disk
        testStart = time.clock()
        html = test_getFile(fName)
        testEnd = time.clock()
        time_getHtml += testEnd - testStart

        # Phase 2: parse with BeautifulSoup
        testStart = time.clock()
        s = test_load(html)
        testEnd = time.clock()
        time_load += testEnd - testStart

        # Phase 3: write the parsed tree back out as "pretty" HTML
        testStart = time.clock()
        t = test_prettify(s)
        testEnd = time.clock()
        time_prettify += testEnd - testStart
benchmarkEnd = time.clock()

print 'ran test_getHtml in \t%.2f seconds' % (time_getHtml)
print 'ran test_load in \t%.2f seconds' % (time_load)
print 'ran test_prettify in \t%.2f seconds' % (time_prettify)
print 'ran benchmark in \t%.2f seconds' % (benchmarkEnd - benchmarkStart)
#-----------------------------------------------------------------------------------------------------------------
#| Code End
#-----------------------------------------------------------------------------------------------------------------