commissar wu wrote:
Hi everyone, lxml is very good and I like it, but I recently ran into a problem. When I use lxml to parse the contents of the URL http://www.dtzww.cn/files/article/fulltext/23/23208.html, lxml blocks and doesn't raise an exception. CPU utilization is at 100%.
My environment is lxml 2.2.2, Ubuntu 8.04 amd64 server, Python 2.5.2.
My code is as follows:
import lxml.html as htmltool
import urllib
url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
f = urllib.urlopen(url)
data = f.read()
doc = htmltool.document_fromstring(data)  # <--- blocks here

I can reproduce this, although I didn't look into it any deeper yet. This works for me, though:

import lxml.html as htmltool
url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
doc = htmltool.parse(url)

Stefan
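
For comparison, here is a minimal sketch of the two call shapes discussed above, run against a small in-memory page instead of the real URL. The SAMPLE bytes and the two helper function names are made up for illustration, and the code is written for a current Python rather than the 2.5 setup in the report. It does not reproduce or explain the hang; it only shows that document_fromstring() takes data already read into memory and returns the root element, while parse() reads from a filename, URL, or file-like object and returns a tree whose root you get with getroot().

```python
import io

import lxml.html as htmltool

# Hypothetical stand-in for the downloaded page; not the content of the real URL.
SAMPLE = b"<html><head><title>demo</title></head><body><p>hello</p></body></html>"


def parse_from_string(data):
    """Parse bytes already held in memory, as in the original report."""
    return htmltool.document_fromstring(data)


def parse_from_file(fileobj):
    """Let the parser read the input itself; parse() returns an ElementTree."""
    return htmltool.parse(fileobj).getroot()


root_a = parse_from_string(SAMPLE)
root_b = parse_from_file(io.BytesIO(SAMPLE))
print(root_a.findtext(".//title"), root_b.findtext(".//p"))
```

In the second form the parser does the reading itself, which is the variant that reportedly works for the page in question.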