Automating Mailman Archive Maintenance
Hello,
I am in the process of trying to automate our Mailman Archive maintenance before it gets unruly. I looked in the FAQ and wiki for information and found some about rebuilding the archives (which will be handy) but nothing about automating it.
The assumptions I am working under:
- the html files for the archives are located in <some prefix>/<listname> (to be called DIR-A)
- the mbox file used to rebuild the archive html files is in the directory <some prefix>/<listname>.mbox (to be called DIR-B)
- our automated process will process the mbox files in DIR-B and delete completely or mark for deletion any messages older than a given timeframe.
Now, the questions:
If I run bin/arch --wipe <listname> to rebuild the archives for <listname>, do I have to delete the files in DIR-A first or will bin/arch do it?
When a message is added to the mbox file in DIR-B, is it appended to the file or does it get added through some interface?
When a message is added to the mbox file in DIR-B, are any existing messages that are marked for deletion removed or is the message just added to the mbox file?
When bin/arch is run and builds the html files, does it ignore messages marked for deletion or does it add the message to the html files no matter how it is marked?
Should Mailman be shut down prior to running my automated process, which includes running bin/arch, or can I leave Mailman running?
In our installation, the public archives directory for each list is a link to the private archives directory for each list. Is that the standard, or should I be prepared to see some archives in the public area and others in the private area depending on the particular list's setting?
Is there any other gotcha I should watch out for when using an automated process?
Thanks in advance, Chris
C Nulk wrote:
I am in the process of trying to automate our Mailman Archive maintenance before it gets unruly. I looked in the FAQ and wiki for information and found some about rebuilding the archives (which will be handy) but nothing about automating it.
The assumptions I am working under:
- the html files for the archives are located in <some prefix>/<listname> (to be called DIR-A)
- the mbox file used to rebuild the archive html files is in the directory <some prefix>/<listname>.mbox (to be called DIR-B)
- our automated process will process the mbox files in DIR-B and delete completely or mark for deletion any messages older than a given timeframe.
Now, the questions:
- If I run bin/arch --wipe <listname> to rebuild the archives for <listname>, do I have to delete the files in DIR-A first or will bin/arch do it?
You do not have to delete any DIR-A files. That's what the --wipe option does.
- When a message is added to the mbox file in DIR-B, is it appended to the file or does it get added through some interface?
It is appended.
- When a message is added to the mbox file in DIR-B, are any existing messages that are marked for deletion removed or is the message just added to the mbox file?
It is just appended by a file open and append operation. The process does not in any way emulate an MDA or any IMAP or other mail access type process.
- When bin/arch is run and builds the html files, does it ignore messages marked for deletion or does it add the message to the html files no matter how it is marked?
It totally ignores any message status type headers.
- Should Mailman be shut down prior to running my automated process, which includes running bin/arch, or can I leave Mailman running?
It's OK for Mailman to be running. There are archive locks that will prevent concurrent updates.
- In our installation, the public archives directory for each list is a link to the private archives directory for each list. Is that the standard, or should I be prepared to see some archives in the public area and others in the private area depending on the particular list's setting?
All archive data is in archives/private/. archives/public/ contains only symlinks.
- Is there any other gotcha I should watch out for when using an automated process?
I don't think so.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
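Since bin/arch ignores status headers, a message that is only marked for deletion will still show up in the rebuilt html. A minimal sketch of physically removing marked messages, assuming the marking tool records the deleted flag in the usual X-Status: D header (which Python's standard mailbox module reads as the D flag). The function name expunge_flagged is hypothetical, and this uses only the mailbox module's own file locking, not Mailman's archive lock.

```python
import mailbox


def expunge_flagged(mbox_path):
    """Physically remove messages carrying the D (deleted) flag.

    Returns the number of messages removed from the mbox file.
    """
    box = mailbox.mbox(mbox_path, create=False)
    box.lock()  # mailbox-level lock only; this is not Mailman's archive lock
    try:
        # Collect keys first so the mailbox is not modified while iterating.
        doomed = [key for key, msg in box.iteritems()
                  if "D" in msg.get_flags()]
        for key in doomed:
            box.remove(key)
    finally:
        box.close()  # flushes pending removals to disk and releases the lock
    return len(doomed)
```

After a call such as expunge_flagged("<some prefix>/<listname>.mbox/<listname>.mbox"), the html would still need to be rebuilt with bin/arch --wipe <listname> to reflect the removals. The file name inside DIR-B is an assumption here.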
Thank you very much, Mark. I appreciate the information and help.
I have written a small php app which uses the imap libraries to open and process the mbox files. Given a time frame, the app will mark for deletion or remove any messages older than the time frame. Now to write a script to call bin/arch for each of the lists I process with my php app. I think I will still shut down Mailman when I run the php app since I don't have any way to check on any locks. Afterwards, I can restart Mailman, then call the script to rebuild the archives.
The combination of the two apps/scripts will allow us to automate removing old archives: run once a month to mark old messages, then once a year to remove the marked messages. We can then keep a running 2-3 year archive going.
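The rebuild step could look roughly like the sketch below, which runs bin/arch --wipe once per list. The function names and the run parameter are hypothetical, and the Mailman install prefix has to be supplied by the caller since it differs between installations.

```python
import os
import subprocess


def arch_command(mailman_prefix, listname):
    """Build the argument list that rebuilds one list's html archive."""
    return [os.path.join(mailman_prefix, "bin", "arch"), "--wipe", listname]


def rebuild_archives(mailman_prefix, listnames, run=subprocess.call):
    """Run bin/arch --wipe for each list.

    Returns a dict mapping each list name to the command's exit status,
    so the caller can report any list whose rebuild failed.
    """
    results = {}
    for name in listnames:
        results[name] = run(arch_command(mailman_prefix, name))
    return results
```

The run parameter is only there so the loop can be exercised without a Mailman installation. In normal use it is left at its default and the script is run as a user allowed to write the archive files.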
Thanks again, Chris
On 5/20/2011 8:44 PM, Mark Sapiro wrote:
[...]
C Nulk wrote:
I have written a small php app which uses the imap libraries to open and process the mbox files. Given a time frame, the app will mark for deletion or remove any messages older than the time frame. Now to write a script to call bin/arch for each of the lists I process with my php app. I think I will still shut down Mailman when I run the php app since I don't have any way to check on any locks. Afterwards, I can restart Mailman, then call the script to rebuild the archives.
That should work fine, but if you wrote a Python script, it could use Mailman list methods to manage the archive locking and use either Python's imaplib or mailbox modules to actually manipulate messages in the .mbox.
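As a rough illustration of the mailbox-module half of that suggestion, the sketch below drops messages whose Date header is older than a cutoff. It is written against the Python 3 standard library as a standalone script, which is an assumption (Mailman 2.1 itself runs under Python 2), and it does not touch Mailman's archive locking, so it fits the plan of running while Mailman is stopped. The names message_date and prune_older_than are hypothetical.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Return the Date header as an aware UTC datetime, or None if unusable."""
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        parsed = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # No usable zone information in the header: treat the time as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def prune_older_than(mbox_path, max_age_days, now=None):
    """Remove messages older than max_age_days; return how many were removed.

    Messages with a missing or unparseable Date header are kept.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    removed = 0
    box = mailbox.mbox(mbox_path, create=False)
    box.lock()  # mailbox-level lock only; this is not Mailman's archive lock
    try:
        for key in box.keys():  # keys() returns a list, safe to remove from
            when = message_date(box[key])
            if when is not None and when < cutoff:
                box.remove(key)
                removed += 1
    finally:
        box.close()  # writes the changes and releases the lock
    return removed
```

Keeping messages with an unreadable Date header is a deliberately conservative choice; a stricter policy could fall back to the date in the mbox "From " separator line instead.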
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
I agree with you, Mark. Unfortunately, as you may know, my Python skills aren't the best. I am working on little bits of improvements for us here as time permits so I use the tools with which I am most familiar.
Since we are using Mailman v2.1.9, we are a bit behind where Mailman currently sits. Your gracious help assisted us with the LDAP, plus I have made some custom modifications: one you helped with was for "Special Posters", and another is a set of changes to add more logging information to the log files. At some point, we will be migrating to v2.1.12 or later, so I need to look at incorporating my mods into the later code, and I have very little free time to do it.
Thanks again for your help, Chris
P.S. While my automation php app isn't the best written thing in the world, if anyone wants a copy to use as a starting point for conversion to a Python script, let me know.
On 5/23/2011 3:32 PM, Mark Sapiro wrote:
[...]