Re: [lxml-dev] lxml bug

13 Aug 2009

      commissar wu wrote:
...
Hi everyone, lxml is very good and I like it, but I recently ran into a
little trouble. When I use lxml to parse the contents of the URL
http://www.dtzww.cn/files/article/fulltext/23/23208.html, lxml blocks
and doesn't raise an exception. CPU utilization is at 100%.
My environment is lxml-2.2.2, ubuntu-8.04-amd64-server, python-2.5.2.
My code is as follows:
import lxml.html as htmltool
import urllib
url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
f = urllib.urlopen(url)
data = f.read()
doc = htmltool.document_fromstring(data)    # <--- blocks here

I can reproduce this, although I haven't looked into it any deeper yet.

This works for me, though:

	import lxml.html as htmltool
	url = "http://www.dtzww.cn/files/article/fulltext/23/23208.html"
	doc = htmltool.parse(url)
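
For data that has already been downloaded, the same parse() entry point can be fed a file-like object instead of a URL. The following is only a minimal sketch for a current Python: the helper name parse_downloaded and the sample markup are made up for illustration, and it has not been checked whether this path avoids the hang on the page above. Note that parse() returns an ElementTree, while document_fromstring() returns the root element.

```python
import io

import lxml.html


def parse_downloaded(data):
    """Parse already-downloaded HTML bytes through lxml.html.parse().

    Hypothetical helper for illustration; it is not part of lxml.
    """
    # parse() accepts a filename, a URL, or a file-like object
    # and returns an ElementTree.
    tree = lxml.html.parse(io.BytesIO(data))
    # getroot() yields the root element, which is the kind of object
    # document_fromstring() would have returned.
    return tree.getroot()


# Made-up sample input so the sketch runs without network access.
sample = b"<html><body><p>hello</p></body></html>"
root = parse_downloaded(sample)
print(root.tag)
```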

Stefan

Stefan Behnel