HTML Parser which allows low-keyed local changes?
nobody at nowhere.com
Tue Feb 2 04:09:48 CET 2010
On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote:
> I tried lxml, but after walking and making changes in the element
> tree, I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format
> of the original HTML code.
> That makes it rather unreadable.
> Is there an existing HTML parser which supports tracking/writing
> back particular changes in a cautious way by just making local
> changes? Or at least one that tracks the tag start/end positions in
> the file?
HTMLParser, sgmllib.SGMLParser and htmllib.HTMLParser all allow you to
retrieve the literal text of a start tag (but not an end tag) through
get_starttag_text().
Unfortunately, they're only tokenisers, not parsers, so you'll need to
handle minimisation (optional tags that the author left out, such as a
missing </p> or </li>) yourself.
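
As a rough illustration of the tokeniser approach, here is a minimal
sketch. It assumes Python 3, where the HTMLParser module lives on as
html.parser (sgmllib and htmllib no longer exist there). The names
TagLocator and replace_start_tag are made up for this example. The idea
is to combine getpos() with get_starttag_text() to find the exact span
of a start tag in the original text, then splice in a replacement and
leave every other byte of the file alone. It does nothing about end
tags or minimisation.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Record where each start tag sits in the original source text."""

    def __init__(self, source):
        super().__init__()
        # Offset of the first character of every line, so that the
        # (line, column) pair from getpos() can become an absolute index.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.start_tags = []  # list of (tag, start_offset, end_offset)

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()            # line is 1-based, col 0-based
        start = self.line_starts[line - 1] + col
        literal = self.get_starttag_text()   # the tag exactly as written
        self.start_tags.append((tag, start, start + len(literal)))


def replace_start_tag(source, wanted, new_text):
    """Replace the first <wanted ...> start tag, touching nothing else."""
    locator = TagLocator(source)
    locator.feed(source)
    locator.close()
    for tag, start, end in locator.start_tags:
        if tag == wanted:
            return source[:start] + new_text + source[end:]
    return source  # tag not found: hand back the text unchanged


source = '<html>\n  <body>\n    <p  class="a">Hi</p>\n  </body>\n</html>\n'
result = replace_start_tag(source, "p", '<p class="b">')
```

Only the characters of the one start tag change; the indentation and
line breaks around it are the original ones, which is the "local
change" behaviour asked for above.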