[Python-Dev] sgmllib Comments

Terry Reedy tjreedy at udel.edu
Mon Jun 12 04:06:16 CEST 2006


"Fred L. Drake, Jr." <fdrake at acm.org> wrote in message 
news:200606112039.37834.fdrake at acm.org...
> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
> > Planet is a feed aggregator written in Python.  It depends heavily on
> > SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
> > and I've submitted a test case and a patch[1] (use or discard the patch,
> > it is the test that I care about).
...
> > and which are original.  (Note: feeds often contain such abominations as
> > &amp;copy; which the new code will treat indistinguishably from &copy;)

> It really sounds like sgmllib is the wrong foundation for this.
...
> Have you looked at HTMLParser as an alternate to sgmllib?
> It has better support for XHTML constructs.

Have you (the OP) checked how related Python projects, such as Mark 
Pilgrim's feed parser,
http://www.feedparser.org/
handle the same sort of input?  (I have only looked at the docs and tests, 
not the code.)
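
To make the double-escaping issue in the quoted note concrete, here is a
minimal sketch.  It assumes a modern Python with the standard library's
html.unescape as a stand-in; it is not the sgmllib patch under discussion,
and the variable names are only illustrative.

```python
import html

# Two ways a feed might spell the copyright sign (illustrative inputs).
double_escaped = "&amp;copy; 2006"   # entity escaped twice
single_escaped = "&copy; 2006"       # entity escaped once

# One unescaping pass keeps the two inputs distinct.
once_double = html.unescape(double_escaped)   # still contains "&copy;"
once_single = html.unescape(single_escaped)   # contains the real sign

# A second pass over the first result erases the difference, which is
# the "indistinguishable" treatment the note describes.
twice_double = html.unescape(once_double)

print(repr(once_double), repr(once_single), repr(twice_double))
```

Whether collapsing the two forms is desirable depends on the consumer; an
aggregator that wants to show literal markup would have to stop after one
pass.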

tjr




