elementtidy, \0 chars and parsing from a string

Steven Bethard steven.bethard at gmail.com
Tue May 9 16:51:42 EDT 2006


So I see that elementtidy doesn't like strings with \0 characters in them:

 >>> import urllib
 >>> from elementtidy import TidyHTMLTreeBuilder
 >>> url = 'http://news.bbc.co.uk/1/hi/world/europe/492215.stm'
 >>> url_file = urllib.urlopen(url)
 >>> tree = TidyHTMLTreeBuilder.parse(url_file)
Traceback (most recent call last):
   ...
   File "...elementtidy\TidyHTMLTreeBuilder.py", line 90, in close
     stdout, stderr = _elementtidy.fixup(*args)
TypeError: fixup() argument 1 must be string without null bytes, not str

The obvious solution would be to call .replace('\0', '') on the file's
text, but I'm not sure how to ask elementtidy to parse from a string
instead of a file-like object.  Do I need to wrap the cleaned string in
a StringIO, or is there a better way?
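
For reference, here is a minimal sketch of the wrapping approach, not a confirmed elementtidy recipe. It assumes that TidyHTMLTreeBuilder.parse accepts any object with a read() method, as the urlopen call above suggests. The helper name strip_nulls is made up for illustration, and io.BytesIO stands in for StringIO.StringIO so the snippet also runs on newer Pythons.

```python
import io


def strip_nulls(file_like):
    """Return a new in-memory file holding file_like's bytes minus any NULs.

    Hypothetical helper, not part of elementtidy.
    """
    data = file_like.read()
    return io.BytesIO(data.replace(b"\0", b""))


# Stand-in for the object returned by urlopen(); note the embedded NUL byte.
raw = io.BytesIO(b"<html><body>Hel\0lo</body></html>")
cleaned = strip_nulls(raw)

# Assumption: with elementtidy installed, the cleaned object could be passed
# on exactly like the original url_file:
#     tree = TidyHTMLTreeBuilder.parse(cleaned)
```

This reads the whole page into memory before parsing, which seems acceptable for a single HTML document.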

STeVe


