HTML Parser which allows low-keyed local changes?

Nobody nobody at nowhere.com
Mon Feb 1 22:09:48 EST 2010


On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote:

> I tried lxml, but after walking and making changes in the element 
> tree, I'm forced to do a full serialization of the whole document 
> (etree.tostring(tree)) - which destroys the "human edited" format 
> of the original HTML code and makes it rather unreadable.
> 
> Is there an existing HTML parser which supports tracking/writing 
> back particular changes in a cautious way by just making local 
> changes? Or at least tracks the tag start/end positions in the file?

HTMLParser, sgmllib.SGMLParser and htmllib.HTMLParser all allow you to
retrieve the literal text of a start tag (but not an end tag) with
get_starttag_text(). Unfortunately, they're only tokenisers, not
parsers: they report tags in the order they appear and build no tree,
so you'll need to handle minimisation (omitted or implied tags)
yourself.
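
A minimal sketch of that approach, assuming Python 3, where the
HTMLParser module is named html.parser. The class StartTagLocator and
the helper replace_start_tag are hypothetical names made up for this
illustration. It records where each start tag sits in the source using
getpos() and get_starttag_text(), then splices a replacement into the
original string so everything else stays byte-for-byte as written. End
tags and tag minimisation are not handled.

```python
from html.parser import HTMLParser  # called "HTMLParser" in Python 2


class StartTagLocator(HTMLParser):
    """Hypothetical helper: record offset and literal text of start tags."""

    def __init__(self, source):
        super().__init__(convert_charrefs=False)
        # Offset of the first character of every line. The parser counts
        # only "\n" as a line break, so do the same here.
        self.line_starts = [0] + [
            i + 1 for i, ch in enumerate(source) if ch == "\n"
        ]
        self.tags = []  # (tag name, start offset, end offset, literal text)

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()        # line is 1-based, column 0-based
        start = self.line_starts[line - 1] + col
        text = self.get_starttag_text()  # the tag exactly as written
        self.tags.append((tag, start, start + len(text), text))


def replace_start_tag(source, index, new_text):
    """Replace the index-th start tag, leaving the rest of source untouched."""
    locator = StartTagLocator(source)
    locator.feed(source)  # feed everything at once so offsets match source
    locator.close()
    _, start, end, _ = locator.tags[index]
    return source[:start] + new_text + source[end:]


if __name__ == "__main__":
    html = '<html>\n  <body>\n    <p  class="old">Hi</p>\n  </body>\n</html>\n'
    print(replace_start_tag(html, 2, '<p class="new">'))
```

The offsets are only valid if the whole document is passed to feed()
in a single call, and several edits should be applied from the last
tag backwards so earlier offsets are not shifted.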



