[Python-Dev] sgmllib Comments

Sam Ruby rubys at intertwingly.net
Mon Jun 12 06:05:06 CEST 2006


Terry Reedy wrote:
> "Fred L. Drake, Jr." <fdrake at acm.org> wrote in message 
> news:200606112039.37834.fdrake at acm.org...
>> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
>>> Planet is a feed aggregator written in Python.  It depends heavily on
>>> SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
>>> and I've submitted a test case and a patch[1] (use or discard the patch,
>>> it is the test that I care about).
> ...
>>> and which are original.  (Note: feeds often contain such abominations as
>>> &amp;copy; which the new code will treat indistinguishably from &copy;)
> 
>> It really sounds like sgmllib is the wrong foundation for this.
> ...
>> Have you looked at HTMLParser as an alternate to sgmllib?
>> It has better support for XHTML constructs.
> 
> Have you (the OP) checked how related Python projects, such as Mark
> Pilgrim's feed parser,
> http://www.feedparser.org/
> handle the same sort of input?  (I have only looked at docs and tests,
> not code.)

Just to be clear: Planet uses Mark's feed parser, which uses sgmllib.

I'm a committer on that project:

http://sourceforge.net/project/memberlist.php?group_id=112328

I was investigating a bug in sgmllib that affected the feed parser (and
therefore Planet), and noticed that changes in the SVN head of Python
had broken three feed parser unit tests.

It is my belief that these changes will break other existing users of 
sgmllib.
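
As an aside on the &amp;copy; case quoted above, here is a minimal sketch
of why double-escaped entities become hard to tell apart once decoded. It
is an illustration only: it uses html.unescape from the Python 3 standard
library as a stand-in, not sgmllib or the patch under discussion, and the
helper name decode_passes is hypothetical.

```python
# Illustration only: html.unescape stands in for an entity-decoding step.
# This is NOT sgmllib and does not reproduce the patch discussed above.
from html import unescape


def decode_passes(text, passes):
    """Apply entity decoding `passes` times (hypothetical helper)."""
    for _ in range(passes):
        text = unescape(text)
    return text


single = "&copy;"      # a properly escaped copyright sign
double = "&amp;copy;"  # the double-escaped form often seen in feeds

# After one pass the two inputs are still distinguishable.
once_single = decode_passes(single, 1)
once_double = decode_passes(double, 1)

# After a second pass both collapse to the same character, so the
# original distinction is lost.
twice_single = decode_passes(single, 2)
twice_double = decode_passes(double, 2)

print(repr(once_single), repr(once_double))
print(repr(twice_single), repr(twice_double))
```

Any consumer that decodes such input more than once, or decodes text that
an earlier layer has already decoded, ends up with identical output for
both forms.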

- Sam Ruby

