elementtidy, \0 chars and parsing from a string
Steven Bethard
steven.bethard at gmail.com
Tue May 9 16:51:42 EDT 2006
So I see that elementtidy doesn't like strings with \0 characters in them:
>>> import urllib
>>> from elementtidy import TidyHTMLTreeBuilder
>>> url = 'http://news.bbc.co.uk/1/hi/world/europe/492215.stm'
>>> url_file = urllib.urlopen(url)
>>> tree = TidyHTMLTreeBuilder.parse(url_file)
Traceback (most recent call last):
  ...
  File "...elementtidy\TidyHTMLTreeBuilder.py", line 90, in close
    stdout, stderr = _elementtidy.fixup(*args)
TypeError: fixup() argument 1 must be string without null bytes, not str
The obvious solution would be to call str.replace('\0', '') on the file's
text, but I'm not sure how to ask elementtidy to parse from a string
instead of a file-like object. Do I need to wrap it in a StringIO, or
is there a better way?
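
For illustration, here is a minimal sketch of the wrapping approach, assuming
that TidyHTMLTreeBuilder.parse accepts any file-like object with a read()
method (the session above only shows it taking the object returned by
urllib.urlopen). The helper name strip_nul_bytes is hypothetical and not part
of elementtidy, and the elementtidy call itself is left as a comment because
it has not been run here.

```python
import io


def strip_nul_bytes(source):
    """Return a new file-like object holding source's data minus NUL bytes.

    strip_nul_bytes is a hypothetical helper, not an elementtidy function.
    It reads the whole input into memory, so it suits ordinary web pages
    rather than very large files.
    """
    data = source.read()
    return io.BytesIO(data.replace(b'\0', b''))


# Assumed usage with the session above (not executed here):
#   url_file = urllib.urlopen(url)
#   tree = TidyHTMLTreeBuilder.parse(strip_nul_bytes(url_file))

# Small stand-alone check with an in-memory "file" containing a NUL byte.
cleaned = strip_nul_bytes(io.BytesIO(b'<p>a\0b</p>'))
assert b'\0' not in cleaned.read()
```

Reading everything up front keeps the sketch simple; if the parser only needs
read(), the in-memory wrapper should behave like the original file object.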
STeVe