[Python-Dev] sgmllib Comments
Terry Reedy
tjreedy at udel.edu
Mon Jun 12 04:06:16 CEST 2006
"Fred L. Drake, Jr." <fdrake at acm.org> wrote in message
news:200606112039.37834.fdrake at acm.org...
> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
> > Planet is a feed aggregator written in Python. It depends heavily on
> > SGMLLib. A recent bug report turned out to be a deficiency in sgmllib,
> > and I've submitted a test case and a patch[1] (use or discard the patch,
> > it is the test that I care about).
...
> > and which are original. (Note: feeds often contain such abominations as
> > &copy; which the new code will treat indistinguishably from ©)
> It really sounds like sgmllib is the wrong foundation for this.
...
> Have you looked at HTMLParser as an alternate to sgmllib?
> It has better support for XHTML constructs.
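A minimal sketch of the behaviour discussed above: a parser that resolves entity references so that `&copy;` and a literal © come out identical. sgmllib was removed in Python 3.0, so this uses Python 3's `html.parser` (the successor to the `HTMLParser` module suggested above) rather than sgmllib or the patch itself. The class `AttrCollector` and the helper `parse` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records start-tag attributes and text data."""

    def __init__(self):
        # convert_charrefs=True (the default) makes the parser resolve
        # entity and character references in text before handle_data().
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; html.parser has already
        # replaced references such as &copy; inside attribute values.
        self.tags.append((tag, attrs))

    def handle_data(self, data):
        self.text.append(data)


def parse(markup):
    """Hypothetical helper: feed one document and return the collector."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser


# The same content, once written with the entity and once with the literal character.
escaped = parse('<a title="&copy; 2006">&copy; Planet</a>')
literal = parse('<a title="\u00a9 2006">\u00a9 Planet</a>')

print(escaped.tags == literal.tags)                    # True
print("".join(escaped.text) == "".join(literal.text))  # True
```

After parsing, the caller cannot tell which spelling the feed used, which is the trade-off the quoted note points out.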
Have you (the OP) checked how related Python projects, such as Mark
Pilgrim's feed parser,
http://www.feedparser.org/
handle the same sort of input? (I have only looked at its docs and tests,
not its code.)
tjr