[Python-Dev] sgmllib Comments

Terry Reedy tjreedy at udel.edu
Mon Jun 12 04:06:16 CEST 2006

"Fred L. Drake, Jr." <fdrake at acm.org> wrote in message 
news:200606112039.37834.fdrake at acm.org...
> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
> > Planet is a feed aggregator written in Python.  It depends heavily on
> > SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
> > and I've submitted a test case and a patch[1] (use or discard the patch,
> > it is the test that I care about).
> > and which are original.  (Note: feeds often contain such abominations as
> > &amp;copy; which the new code will treat indistinguishably from &copy;)
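
The distinction in the note above can be illustrated with a minimal sketch. It assumes modern Python 3 and the standard library's `html.unescape`, not the sgmllib code or the patch under discussion; the helper names `decode_once` and `decode_twice` are hypothetical and exist only for this illustration. Decoding once keeps a double-escaped `&amp;copy;` distinct from `&copy;`, while decoding twice collapses the two.

```python
import html


def decode_once(text):
    """Resolve character and entity references a single time."""
    return html.unescape(text)


def decode_twice(text):
    """Resolve references two times, as a lenient feed consumer might."""
    return html.unescape(html.unescape(text))


single = "&copy;"       # a plain entity reference
double = "&amp;copy;"   # the same reference, escaped a second time

# ascii() keeps the output printable on any terminal encoding.
print(ascii(decode_once(single)))   # the copyright sign, U+00A9
print(ascii(decode_once(double)))   # still the literal text "&copy;"
print(ascii(decode_twice(double)))  # now also the copyright sign
```

Whether the second pass is desirable depends on the feed: it repairs double-escaped input, but it also makes it impossible to publish the literal text `&copy;`.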

> It really sounds like sgmllib is the wrong foundation for this.
> Have you looked at HTMLParser as an alternate to sgmllib?
> It has better support for XHTML constructs.

Have you (the OP) checked how related Python projects, such as Mark
Pilgrim's feed parser, handle the same sort of input (I have only looked at
docs and tests, not

