A couple of archiving questions
Can anyone point me in the direction of a good explanation of how archiving works? I’ve found bits and pieces but the picture isn’t totally clear - it appears that archiving requires some sort of archive handler function to be written and installed as part of Mailman?
Is there any reason why it should not be possible to pull messages scheduled for archiving out through the REST API as file attachments?
thanks
Hello Andrew
I may not be the best person to answer this, but I will try to explain my understanding. Starting from the bug https://bugs.launchpad.net/postorius/+bug/987100, I looked into archivers and talked to Florian and Abhilash. The bug says that the tri-state archiving setting is not yet available in the Postorius GUI, although it can be controlled through a Mailman configuration option. Florian mentioned that the available archivers are exposed in the REST API, so it should be possible to choose one (or more) of them in Postorius. /mailman/src/mailman/archiving/docs/common.rst says that the Mail-Archive and MHonArc archivers are available, although there are some issues with non-public lists.
So, to specifically answer your first question, http://gnu-mailman.readthedocs.org/en/latest/src/mailman/rest/docs/lists.htm... would be a good start, and archivers are already available in a default Mailman installation.
I would also like an answer to your second question; if the links above help you with it, please let me know.
-- *Pranjal Yadav*
Andrew Stuart writes:
Can anyone point me in the direction of a good explanation of how archiving works?
There isn't one. Mailman core doesn't define an archiving function and current opinion is that it shouldn't. It just defines an interface for attaching the archiver or archivers you prefer to the system.
The main reason there actually *is* a defined interface is so that core can provide the permalink for use in a header field. Otherwise integrating an archiver is trivial (on the Mailman core side) and archiver-specific.
Is there any reason why it should not be possible to pull messages scheduled for archiving out through the REST API as file attachments?
Sure. There's no reason to suppose that the messages are available to the REST API. One popular scheme involves subscribing a list-specific address at mail-archive.com, which is a third-party service. The messages come in, the messages go out, and they leave no trace of content behind, just log entries.
Steve's explained most of the current thinking on archiving, and as you observe, archiving is a push interface (from Mailman to the archivers).
At one point I thought about having a sort of built-in archiver that wasn't any smarter than just a maildir or some other dumb on-disk dump. Mailman 2 has its mbox files and Mailman 3 has its IMessageStore interface, but we don't use it for much. What we should probably do is add the message to the message store once it's been approved for posting. Someone could probably write that as an IHandler in about 5 minutes.
(Messages held for moderation are added to the store, and held messages are available through the moderation REST API, but those are for different functionality. The work to reject duplicate Message-IDs involves adding the messages to the store *before* they're approved.)
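A rough sketch of that "add it to the store once approved" handler might look like the following. It is self-contained and does not use Mailman itself: `InMemoryStore`, `ToMessageStore`, and the `approved` key in `msgdata` are all hypothetical stand-ins chosen for illustration, not the real IHandler or IMessageStore APIs.

```python
from email.message import EmailMessage


class InMemoryStore:
    """Hypothetical stand-in for a message store, keyed by Message-ID."""

    def __init__(self):
        self._messages = {}

    def add(self, msg):
        message_id = msg['Message-ID']
        if message_id is None:
            raise ValueError('message has no Message-ID header')
        # Keep the first copy; later duplicates are ignored.
        self._messages.setdefault(str(message_id), msg)
        return str(message_id)

    def get(self, message_id):
        return self._messages.get(message_id)


class ToMessageStore:
    """Hypothetical pipeline handler: store a message once it is approved."""

    name = 'to-message-store'

    def __init__(self, store):
        self.store = store

    def process(self, mlist, msg, msgdata):
        # Assumption: the pipeline marks approved posts in msgdata.
        if not msgdata.get('approved', False):
            return
        self.store.add(msg)


store = InMemoryStore()
handler = ToMessageStore(store)

approved = EmailMessage()
approved['Message-ID'] = '<first@example.com>'
approved.set_content('An approved post.')

held = EmailMessage()
held['Message-ID'] = '<second@example.com>'
held.set_content('A post still awaiting moderation.')

handler.process('test@example.com', approved, {'approved': True})
handler.process('test@example.com', held, {})
```

After running this, only the approved message is in the store; the held one is skipped.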
Once you have that, then you could pretty easily add generic access to the message store in the REST API. What you'd need is some way to know *which* messages to pull. That's when things get complicated and tricky I think.
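One possible way to answer "which messages to pull" is a cursor: the store numbers messages as they arrive, and each client remembers the last number it has seen. The sketch below only illustrates that idea; `SequencedStore` and `pull_new` are hypothetical names, and nothing here corresponds to an existing Mailman REST resource.

```python
class SequencedStore:
    """Hypothetical store that numbers messages in arrival order."""

    def __init__(self):
        self._entries = []  # list of (sequence, message_text)

    def add(self, message_text):
        sequence = len(self._entries) + 1
        self._entries.append((sequence, message_text))
        return sequence

    def since(self, cursor, limit=100):
        """Return up to `limit` entries with a sequence greater than cursor."""
        newer = [entry for entry in self._entries if entry[0] > cursor]
        return newer[:limit]


def pull_new(store, cursor):
    """Fetch messages newer than cursor; return them and the new cursor."""
    entries = store.since(cursor)
    if entries:
        cursor = entries[-1][0]
    return [text for _, text in entries], cursor


store = SequencedStore()
store.add('message one')
store.add('message two')

# A client starting from scratch uses cursor 0, then keeps the returned one.
first_batch, cursor = pull_new(store, 0)
store.add('message three')
second_batch, cursor = pull_new(store, cursor)
```

The tricky parts Barry alludes to (several independent clients, retention, per-list filtering) are not addressed by this sketch.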
Cheers, -Barry
Andrew Stuart writes:
Is there any reason why it should not be possible to pull messages scheduled for archiving out through the REST API as file attachments?
Oops, my previous message missed the subtlety that you do have the message at the point in time you're considering. You still can't do archiving with REST: REST is a "pull" interface, archiving is a "push" operation.
Why would you want to access "messages scheduled for archiving" via REST? I have trouble imagining a use case.
Why would you want to access "messages scheduled for archiving" via REST? I have trouble imagining a use case.
It would make it pretty easy to write an archiver if all I had to do is poll via the REST API for new messages waiting to be archived whenever I feel like it and put them somewhere. Clearly that’s not going to work if the emails are being shunted out the door to some other archiver via the existing archiving interface. But if that could be addressed somehow, REST API access would make life pretty easy for dealing with archive messages - mainly I’d like to avoid touching the Mailman core or writing anything that needs installation into Mailman.
I’m not sure exactly what I have to do to write something that interfaces to the core archiver so I’m not sure how easy or hard that would be.
as
Andrew Stuart writes:
Why would you want to access "messages scheduled for archiving" via REST? I have trouble imagining a use case.
It would make it pretty easy to write an archiver if all I had to do is poll via the REST API for new messages waiting to be archived whenever I feel like it and put them somewhere.
It's no harder than that to write a Handler in Mailman 2, and IIRC the additional burden (adding the message-id-hash) in Mailman 3 is elsewhere in the pipeline. Here's the whole thing for MM 2:
    import time
    from cStringIO import StringIO

    from Mailman import mm_cfg
    from Mailman.Queue.sbcache import get_switchboard

    def process(mlist, msg, msgdata):
        # short circuits
        if msgdata.get('isdigest') or not mlist.archive:
            return
        # Common practice seems to favor "X-No-Archive: yes".  No other value for
        # this header seems to make sense, so we'll just test for it's presence.
        # I'm keeping "X-Archive: no" for backwards compatibility.
        if msg.has_key('x-no-archive') or msg.get('x-archive', '').lower() == 'no':
            return
        # Send the message to the archiver queue
        archq = get_switchboard(mm_cfg.ARCHQUEUE_DIR)
        # Send the message to the queue
        archq.enqueue(msg, msgdata)
*One* function of *six* lines after deleting comments (and ignoring imports), all of which you need to do on the archiver side of the REST interface anyway. Except the part about getting the archiver queue object, but you'll need some equivalent *on the Mailman core side* to ensure that messages hang around until archived. So this is more complex than the current push design. And it also leaves you vulnerable to DoS'ing yourself if the polling process goes down and the queue fills your disk -- probably not a *big* issue, but one that needs a little thought at least to be sure it isn't. (The self-DoS problem is a non-problem for the current design, because if you are expecting to archive locally you probably do have the storage for it.)
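To make the extra core-side work concrete, here is a minimal sketch of a holding queue that keeps messages until a polling archiver acknowledges them, with a size cap as one possible guard against the disk filling up. `HoldingQueue`, `QueueFull`, and the byte limit are hypothetical and purely illustrative; Mailman has no such component as far as this thread shows.

```python
from collections import OrderedDict


class QueueFull(Exception):
    """Raised when the holding queue has reached its configured limit."""


class HoldingQueue:
    """Hypothetical core-side queue that keeps messages until acknowledged."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._pending = OrderedDict()  # message_id -> raw message bytes

    def enqueue(self, message_id, raw):
        if message_id in self._pending:
            return  # already waiting; do not count it twice
        if self.used_bytes + len(raw) > self.max_bytes:
            # Refuse rather than fill the disk while the poller is down.
            raise QueueFull(message_id)
        self._pending[message_id] = raw
        self.used_bytes += len(raw)

    def peek(self, limit=10):
        """Return up to `limit` pending (message_id, raw) pairs, oldest first."""
        return list(self._pending.items())[:limit]

    def ack(self, message_id):
        """Drop a message once the archiver confirms it has stored it."""
        raw = self._pending.pop(message_id)
        self.used_bytes -= len(raw)


queue = HoldingQueue(max_bytes=20)
queue.enqueue('<a@example.com>', b'0123456789')
queue.enqueue('<b@example.com>', b'0123456789')

# The polling archiver fetches a batch, stores it, then acknowledges.
for message_id, raw in queue.peek():
    queue.ack(message_id)
```

Even this toy version needs a policy for what happens when `QueueFull` is raised, which is the kind of thought the pull design demands and the push design does not.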
Note that the "archq.enqueue()" in the above is semantically just "put them somewhere" (I'm quoting you). That's where the devilish details are in both Pipermail and in your abstract REST-pull-based archiver, not in interfacing with the core pipeline.
I suppose you could push the MM 2 archive queue (which I'm pretty sure currently is in Pipermail) into core in MM 3, and then you could use REST to pull the messages out, but really, I don't see a big gain except that you use REST for everything. But writing an archiver is not as simple as you seem to think (unless it just piles up the messages somewhere, and we already have that as example code in MM 3: mailman3/src/mailman/archiving/prototype.py -- hardly more complex than the MM 2 code above, and it actually does the work of storing the messages!)
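For a sense of how small a "just pile up the messages somewhere" archiver can be, here is a sketch that drops each message into a per-list maildir using only the Python standard library. It is not the code in prototype.py; the function names and directory layout are assumptions made for this example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def archive_to_maildir(archive_root, list_name, msg):
    """Add msg to a per-list maildir under archive_root and return its key."""
    box = mailbox.Maildir(os.path.join(archive_root, list_name), create=True)
    try:
        return box.add(msg)
    finally:
        box.close()


def read_back(archive_root, list_name, key):
    """Reopen the maildir and return the stored message for key."""
    box = mailbox.Maildir(os.path.join(archive_root, list_name), create=False)
    try:
        return box.get_message(key)
    finally:
        box.close()


# Demonstration in a throwaway directory.
archive_root = tempfile.mkdtemp(prefix='archive-sketch-')

post = EmailMessage()
post['Message-ID'] = '<one@example.com>'
post['Subject'] = 'First post'
post.set_content('Hello, list.')

key = archive_to_maildir(archive_root, 'test-list', post)
```

Everything that makes a real archiver hard (threading, search, rendering, access control) starts after this point.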
Regards, Steve
OK looks pretty easy. Seems like I’ll just write an archiver if I want access to archived messages.
Can there only be one active archiver or multiple? Say I wanted all messages to go to a zip archiver as well as going to MHonArc or Pipermail as well as my custom archive script.
as
On Feb 16, 2015, at 10:11 PM, Andrew Stuart wrote:
Can there only be one active archiver or multiple?
There are no limits on the number of IArchiver implementations that can be registered with the system. Mailing lists themselves can enable any or all of the archivers registered with the system.
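The two levels described here (archivers registered site-wide, then enabled per list) could be pictured roughly as follows. This is only an illustration: `RecordingArchiver`, `REGISTRY`, `ENABLED`, and `dispatch` are hypothetical names, and real Mailman keeps this information in its own configuration and database rather than in module-level dictionaries.

```python
class RecordingArchiver:
    """Hypothetical archiver that just remembers what it was given."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def archive_message(self, mlist, msg):
        self.received.append((mlist, msg))


# Site-wide registry: every archiver known to the system, by name.
REGISTRY = {
    name: RecordingArchiver(name)
    for name in ('zip', 'mhonarc', 'custom')
}

# Per-list configuration: which registered archivers each list has enabled.
ENABLED = {
    'announce@example.com': ['zip', 'mhonarc', 'custom'],
    'private@example.com': ['custom'],
}


def dispatch(mlist, msg):
    """Push msg to every archiver enabled for mlist; collect failures."""
    failures = []
    for name in ENABLED.get(mlist, []):
        try:
            REGISTRY[name].archive_message(mlist, msg)
        except Exception as error:
            # One broken archiver should not stop the others.
            failures.append((name, error))
    return failures


dispatch('announce@example.com', 'post 1')
dispatch('private@example.com', 'post 2')
```

Here the custom archiver sees both posts, while the zip and MHonArc stand-ins see only the one sent to the list that enabled them.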
Cheers, -Barry
On Feb 16, 2015, at 06:57 PM, Stephen J. Turnbull wrote:
It's no harder than that to write a Handler in Mailman 2, and IIRC the additional burden (adding the message-id-hash) in Mailman 3 is elsewhere in the pipeline. Here's the whole thing for MM 2:
It's a little different in Mailman 3 because you don't write a handler, you write an IArchiver implementation. There are three examples in the source tree so it should be pretty easy to figure out.
(Yes, the M-I-H is calculated at LMTP time.)
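As a rough picture of what such an implementation involves, the sketch below shows an archiver-shaped class with a list URL, a permalink, and an archive operation, plus a Message-ID hash helper. The method names follow my understanding of the Mailman 3 interface but should be checked against the source tree; the hash scheme shown (base32 of the SHA-1 of the Message-ID without angle brackets) is likewise an assumption, and `ExampleArchiver` and its base URL are hypothetical.

```python
import hashlib
from base64 import b32encode
from email.message import EmailMessage


def message_id_hash(msg):
    """Base32-encoded SHA-1 of the Message-ID, without angle brackets."""
    message_id = str(msg['Message-ID']).strip()
    if message_id.startswith('<') and message_id.endswith('>'):
        message_id = message_id[1:-1]
    digest = hashlib.sha1(message_id.encode('utf-8')).digest()
    return b32encode(digest).decode('ascii')


class ExampleArchiver:
    """Hypothetical archiver with the three operations discussed here."""

    name = 'example'
    base_url = 'https://archive.example.com'

    def __init__(self):
        self.stored = {}

    def list_url(self, mlist):
        return '{}/{}'.format(self.base_url, mlist)

    def permalink(self, mlist, msg):
        # Core can compute this without asking the archiver to store anything.
        return '{}/{}'.format(self.list_url(mlist), message_id_hash(msg))

    def archive_message(self, mlist, msg):
        url = self.permalink(mlist, msg)
        self.stored[url] = msg
        return url


archiver = ExampleArchiver()

post = EmailMessage()
post['Message-ID'] = '<one@example.com>'
post.set_content('Hello, list.')

url = archiver.archive_message('test-list', post)
```

Because the permalink depends only on the list and the Message-ID, it can be put into a header before the archiver has actually received the message.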
Cheers, -Barry
participants (5): Andrew Stuart, Barry Warsaw, Pranjal Yadav, Stephen J. Turnbull, Stephen J. Turnbull