[ mailman-Patches-657951 ] Generate RSS summary in archives

SourceForge.net noreply at sourceforge.net
Mon Dec 6 19:01:43 CET 2004


Patches item #657951, was opened at 2002-12-23 19:17
Message generated for change (Comment added) made by ppsys
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=300103&aid=657951&group_id=103

Category: Pipermail
Group: Mailman 2.2 / 3.0
Status: Open
Resolution: None
Priority: 9
Submitted By: A.M. Kuchling (akuchling)
Assigned to: Nobody/Anonymous (nobody)
Summary: Generate RSS summary in archives

Initial Comment:
Here's a first-draft patch.  Things that need fixing:

* The generated RSS feed needs to be validated.  (It passed the 
W3C's RDF validator, but RSS validators still need to be checked.)

* The date should be given in YYYY-MM-DD format, which requires
parsing the .fromdate attribute.

* How do I get the URL for an archived message?  The generated RSS
currently just uses the filename, which is wrong.  How do I get
at the PUBLIC_ARCHIVE_URL setting?

* Getting the most recent N postings is inefficient; the code loops through all of the archived messages and takes the last N of them.
We could add .last() and .prev() methods to the Database class, but that's more ambitious for 2.1beta than I like.  (Would be nice to get this into 2.1final...)

* The list index page should have a LINK element pointing to
the RSS file.

Please make any comments you have, and I'll rework the patch accordingly.
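
A minimal sketch of the "traverse everything and keep the last N" approach from the list above, written for a current Python 3 rather than the Python of the 2.1 era. The `last_n` helper and the sample `(date, msgid)` pairs are hypothetical; they stand in for a forward walk over the archive's date index and are not part of Mailman's API.

```python
from collections import deque


def last_n(entries, n):
    """Return the final n items of an iterable that can only be walked forwards."""
    # A bounded deque drops its oldest item on each append once it is full,
    # so memory stays at n items even though every entry is still visited.
    return list(deque(entries, maxlen=n))


# Hypothetical (date, msgid) pairs, already in increasing date order.
index = [(1, "<a@example>"), (2, "<b@example>"), (3, "<c@example>"), (4, "<d@example>")]
recent = [msgid for _date, msgid in last_n(index, 2)]
```

This still visits every archived message; only a reverse traversal such as the proposed .last() and .prev() methods would avoid that cost.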



----------------------------------------------------------------------

Comment By: Richard Barrett (ppsys)
Date: 2004-12-06 18:01

Message:
Logged In: YES 
user_id=75166

The following is based on the July 2003 version of the patch file posted 
on sourceforge.

The RSS patch adds the RSS() function as member function of the 
HyperArchive class defined in HyperArch.py.

It has been reported that the following statement in RSS():

    date, msgid = self.database.dateIndex.first()

 may generate an AttributeError exception:
 
    AttributeError: HyperDatabase instance has no attribute 'dateIndex'

The  RSS patch appears to make the assumption that whenever the 
RSS() function is called from the write_TOC() member function of the 
HyperArchive class the __openIndices() function has already been called 
on the latest period archive associated with the list, whose TOC page is 
being generated by write_TOC(), and that no intervening call to 
__closeIndices() has been made.

If the assumption were correct then whenever the RSS() function was 
called on a HyperArchive instance, the xxxxxIndices attributes of the 
HyperDatabase instance "owned" by the HyperArchive instance would be 
pointing to valid instances of DumbBTree.

Unfortunately, this assumption is not correct. In order to do its work, 
write_TOC() does not itself need to perform any call to the 
__openIndices() function for the list/archive/database whose TOC page is 
to be recreated. It just happens that in some circumstances, some of the 
code which might call write_TOC may have called the __openIndices() 
function at some prior point and left the HyperDatabase instance with a 
valid set of xxxxxIndices attributes in place when write_TOC() is called.

For the RSS patch to work reliably, the code in the RSS() function has 
to be changed so that it ensures that the conditions it needs prevail when 
it executes the statement giving the problem.

The following code change is untested, but if part of the RSS() 
function's definition in HyperArch.py is modified from:

<quote>
        # Get the most recent messages.  The only index operation
        # we can count on is traversal by increasing date, so
        # we end up traversing all of the entries and remembering the last
        # N of them.  Sigh.
        items = []
        try:
            date, msgid = self.database.dateIndex.first()
            items.append(msgid)
        except KeyError:
            pass
	
        while 1:
            try:
</quote>

to read:

<quote>
        # Get the most recent messages.  The only index operation
        # we can count on is traversal by increasing date, so
        # we end up traversing all of the entries and remembering the last
        # N of them.  Sigh.
        items = []
        got_first = 0
        try:
            msgid = self.database.first(self.archives[0], 'date')
            if msgid:
                items.append(msgid)
                got_first = 1
        except KeyError:
            pass
	
        while got_first and 1:
            try:
</quote>

this should fix the exception problem. 
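
The failure mode and the fix described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDatabase`, `_open_indices` and `date_index` are hypothetical names that mimic the pattern (index attributes that exist only after an explicit open step); they are not Mailman's HyperDatabase code.

```python
class ToyDatabase:
    """Toy stand-in for a database whose index attributes exist only once opened."""

    def __init__(self, data):
        self._data = data  # {archive_name: [(date, msgid), ...]}
        self._open_archive = None

    def _open_indices(self, archive):
        # Open lazily; rebuild the index if a different archive is requested.
        if self._open_archive != archive:
            self.date_index = sorted(self._data.get(archive, []))
            self._open_archive = archive

    def first(self, archive):
        """Return the earliest msgid in the archive, or None if it is empty."""
        self._open_indices(archive)
        return self.date_index[0][1] if self.date_index else None


db = ToyDatabase({"2004-December": [(2, "<b@example>"), (1, "<a@example>")]})

# Direct attribute access fails when nothing has opened the indices yet.
try:
    db.date_index
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The accessor opens the indices itself, so it works in either state.
first_msgid = db.first("2004-December")
```

The point of the sketch is that a caller going through the accessor does not need to know whether some earlier code path happened to open the indices.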



----------------------------------------------------------------------

Comment By: Roy M. Silvernail (codewhacker)
Date: 2003-12-23 19:35

Message:
Logged In: YES 
user_id=670974

I'm trying to enrich the RSS output by adding a proper
[description] and a [content:encoded] module, but I am
having the devil's own time locating the raw message text. 
Be happy to contribute a patch if you can point me to the
raw content (without the italics markup for quoting).

Thanks!

----------------------------------------------------------------------

Comment By: Michael Weber (wookiew)
Date: 2003-09-10 16:29

Message:
Logged In: YES 
user_id=863445

So far I have applied the patch (by the way, I hope that
Defaults.py.in in the patch *means* Defaults.py) and
restarted Mailman. I then added the two lines to
listinfo.html (under /de/, because we have German-speaking lists)
and looked for the XML file.
After searching the whole machine (just to be sure) I can say:
there is no such file. Is another patch needed first?
Is there another setup step to perform? I can't find any hint here, so I
have to ask. But the idea is great; if it works on my lists
it's genius.
Regards, Michael
Running version 2.1.1

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-09-02 04:15

Message:
Logged In: YES 
user_id=12800

Bumping priority.

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2003-07-15 18:55

Message:
Logged In: YES 
user_id=11375

OK, done!

This patch is now ready to go in: some people have looked at
the RSS and haven't spotted any problems.  Barry, can I
please get CVS write access to check this in?


----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2003-07-11 01:00

Message:
Logged In: YES 
user_id=11375

Attaching correct version of the patch.


----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2003-07-11 00:52

Message:
Logged In: YES 
user_id=11375

Here at last is an updated version of the patch that's crawling closer to being complete.  There's now an RSS_NUM_ARTICLES setting in Defaults.py, the generated URLs are correct, and I modified the English template to link to the RSS file.  

Remaining things: check the generated RSS for correctness; edit all of the other language templates to include the RSS file (I may ask for CVS write access to do that).  It would be really nice if the Mailman upgrade script could update existing general list information pages to include the LINK element; any suggestion about how to go about that?
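
One possible way to approach the template update asked about above, offered only as a rough sketch: insert the LINK element before the closing head tag of an existing page, and skip pages that already carry one. The `add_rss_link` function, the "index.rss" file name and the title text are hypothetical; this is not the Mailman upgrade script, and real list pages may need more careful handling.

```python
import html
import re


def add_rss_link(page, rss_url, title="RSS"):
    """Insert an RSS autodiscovery LINK element before </head>, at most once."""
    if "application/rss+xml" in page:
        return page  # already linked; leave the page untouched
    link = '<link rel="alternate" type="application/rss+xml" title="%s" href="%s">\n' % (
        html.escape(title, quote=True),
        html.escape(rss_url, quote=True),
    )
    # Insert before the first closing head tag, whatever its letter case.
    # A page without </head> is returned unchanged.
    return re.sub(r"(?i)(</head>)", lambda m: link + m.group(1), page, count=1)


page = "<html><head><title>x</title></HEAD><body></body></html>"
updated = add_rss_link(page, "index.rss")  # hypothetical RSS file name
```

Because the function returns an already-linked page unchanged, running it twice over the same file should be harmless.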


----------------------------------------------------------------------

Comment By: Dan Brickley (danbri)
Date: 2003-06-22 15:00

Message:
Logged In: YES 
user_id=7830


OK, I've regenerated the patch with some code which works
for me.

http://rdfweb.org/2003/06/mailman-rss/rsspatch

Health warning: 

    * I suspect it may fail in conditions when get_archives() returns
      a list not a string (does this ever happen?).
    * See also problems mentioned below, regenerating partial
      archives seems tricky.

Hope this is useful anyways... 

Dan <danbri at w3.org> 

----------------------------------------------------------------------

Comment By: Dan Brickley (danbri)
Date: 2003-06-22 11:48

Message:
Logged In: YES 
user_id=7830

I thought I'd have a look at this myself, though I have only modest
knowledge of both Python and Mailman.

In the course of trying to patch the patch, I tried running
the archiver over just the last couple of messages, to speed
things along: 
"../../bin/arch -s 4390 rdfweb-dev". 
Traceback (most recent call last):
  File "../../bin/arch", line 187, in ?
    main()
  File "../../bin/arch", line 177, in main
    archiver.close()
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 310, in close
    self.write_TOC()
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1082, in write_TOC
    rss.write(self.RSS())
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 769, in RSS
    date, msgid = self.database.dateIndex.first()
AttributeError: HyperDatabase instance has no attribute 'dateIndex'

Not sure what's going on there, but this seemed as good a
place as any to keep a note of it.

Investigating...



----------------------------------------------------------------------

Comment By: Dan Brickley (danbri)
Date: 2003-06-22 11:04

Message:
Logged In: YES 
user_id=7830

Does anyone have a patch to remove the hardwiring of
"2002-December" and get the appropriate date from mailman
somehow?
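
A hedged sketch of one way a "YYYY-Month" name such as the hardwired one could be derived from a message time instead. It assumes monthly archive volumes and a Unix timestamp as input; other volume frequencies would need different names. The `monthly_period` helper is hypothetical and is not taken from the patch.

```python
import calendar
import time

# Spelled out explicitly so the result does not depend on the process locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def monthly_period(timestamp):
    """Return a 'YYYY-Month' volume name for a Unix timestamp, using UTC."""
    t = time.gmtime(timestamp)
    return "%04d-%s" % (t.tm_year, MONTHS[t.tm_mon - 1])


# Example: the date this tracker item was opened, treated as UTC.
opened = calendar.timegm((2002, 12, 23, 19, 17, 0, 0, 0, 0))
period = monthly_period(opened)
```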

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-04-18 22:43

Message:
Logged In: YES 
user_id=12800

Andrew, to get the url for the archived message use
mlist.GetBaseArchiveURL(), which knows about private vs.
public archives, the host name and the list name.   From
there you should be able to tack on just the part of the
path under "archives/private/listname".  See
Mailman/Handlers/Scrubber.py for an example.

Only other minor comment: NUM_ARTICLES can probably go in
Defaults.py.in
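
The URL advice above amounts to joining a base archive URL with a path relative to the list's archive directory. A minimal sketch, assuming the base URL has already been obtained: the `archived_message_url` helper, the example host and the message path are all hypothetical, and the string assigned to `base` merely stands in for whatever mlist.GetBaseArchiveURL() returns.

```python
def archived_message_url(base_url, relative_path):
    """Join an archive base URL and a path relative to the list's archive root."""
    # Normalise the slashes on both sides so exactly one separates the parts.
    return base_url.rstrip("/") + "/" + relative_path.lstrip("/")


# Hypothetical stand-in for the value returned by mlist.GetBaseArchiveURL().
base = "http://lists.example.org/pipermail/mylist"
url = archived_message_url(base, "2003-April/000123.html")
```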



----------------------------------------------------------------------

Comment By: Justin Mason (jmason)
Date: 2003-03-26 21:49

Message:
Logged In: YES 
user_id=935

big thumbs up from me too.  Much better solution than
http://taint.org/mmrss/ ;)

----------------------------------------------------------------------

Comment By: Uche Ogbuji (uche)
Date: 2003-03-18 01:09

Message:
Logged In: YES 
user_id=38966

I'd like to add my vote to this item.  This is a fantastic
idea, Andrew.  Thanks.

--Uche


----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2002-12-23 20:42

Message:
Logged In: YES 
user_id=11375

Updated patch:

* Dates are now rendered as ISO-8601 (date only, not the time of the message)

* By hard-wiring 2002-December, I got the RSS to validate using Mark Pilgrim's validator.
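
A small sketch of the date-only ISO-8601 rendering mentioned above. It assumes the stored date is a ctime-style string such as "Mon Dec 23 19:17:00 2002"; whether the .fromdate attribute always has that shape is an assumption here, and the `iso_date` helper is hypothetical rather than the patch's own code.

```python
import time


def iso_date(fromdate):
    """Convert a ctime-style date string to YYYY-MM-DD."""
    # With no format argument, time.strptime uses "%a %b %d %H:%M:%S %Y".
    # Day and month names are matched against the current locale, so this
    # expects English names under the default C locale.
    return time.strftime("%Y-%m-%d", time.strptime(fromdate))


example = iso_date("Mon Dec 23 19:17:00 2002")
```

A string in any other format raises ValueError, so real code would likely want a fallback for messages with unusual dates.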


----------------------------------------------------------------------

Comment By: captain larry (captainlarry)
Date: 2002-12-23 19:36

Message:
Logged In: YES 
user_id=147905

Just voting for support here.  This is *great* thanks for
the patch and I hope the maintainers include it as soon as
it's appropriate :)

Adam.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-12-23 19:27

Message:
Logged In: YES 
user_id=12800

Deferring until post-2.1

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2002-12-23 19:21

Message:
Logged In: YES 
user_id=11375

Argh; SF choked on the file upload.  Attaching the patch again...

----------------------------------------------------------------------
