Archive Issue -- prune_arch doesn't remove detached attachments?

Today I issued "prune_arch -l flohmarkt -d 30" and wondered about the immense size of the corresponding archive directory archives/private/flohmarkt
du quickly showed me why:
# du -s * | sort -n
4         index.html
4         pipermail.pck
84        2020-November.txt
296       2020-December.txt
332       2020-November
460       database
1168      2020-December
17280164  attachments
Shouldn't prune_arch also clean out the "attachments" directory? (mailman-2.1.34)
Ralf Hildebrandt Charité - Universitätsmedizin Berlin Geschäftsbereich IT | Abteilung Netzwerk
Campus Benjamin Franklin (CBF) Haus I | 1. OG | Raum 105 Hindenburgdamm 30 | D-12203 Berlin
Tel. +49 30 450 570 155 ralf.hildebrandt@charite.de https://www.charite.de

Ralf Hildebrandt writes:
> Today I issued "prune_arch -l flohmarkt -d 30" and wondered about the immense size of the corresponding archive directory archives/private/flohmarkt
> Shouldn't prune_arch also clean out the "attachments" directory? (mailman-2.1.34)
Probably, but that's up to Mark (and I'm pretty sure that's one of his personally-maintained scripts rather than one distributed by Mailman).

Hi,
I've observed this behavior too. To work around it, I run a script like this from cron to keep 2 years of archives:
[...]
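# Year from 2 years ago and number of the previous month (GNU date)
year=$(date --date='2 years ago' +%Y)
month=$(date --date='last month' +%m)
# Cutoff prefix yyyymm, matched against the attachments/yyyymmdd directory names
data_corte="${year}${month}"
path_private='/usr/local/mailman/archives/private'
# Delete everything inside attachment directories dated in the cutoff month
find "${path_private}" -depth -regextype posix-egrep -regex ".*/attachments/${data_corte}.*/.*" -delete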
Running this script monthly solved my disk space problem.
Best Regards,
Juliano Alves Guidini Analista de Sistemas USP - STI - CeTI-SP - DVTIN - SCTIN - SCTS
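
A note on the script above: it computes the year and the month in two separate date calls, so in January the cutoff combines the year from two years ago with month 12, and it only ever touches the single month that matches. The following is a minimal sketch of one way around that, not Juliano's script. It assumes GNU date and the attachments/yyyymmdd directory layout described in this thread; the function names and the months parameter are hypothetical.

```shell
#!/bin/sh
# Sketch: list attachment directories older than a cutoff month.

# cutoff_month MONTHS [REFDATE]
# Print the cutoff as YYYYMM. Year and month come from a single date call,
# so they roll over together. REFDATE (YYYY-MM-DD) defaults to the 15th of
# the current month, which avoids end-of-month surprises.
cutoff_month() {
    months=$1
    ref=${2:-$(date +%Y-%m-15)}
    date --date="$ref $months months ago" +%Y%m
}

# list_old_attachment_dirs LISTDIR CUTOFF
# Print every LISTDIR/attachments/yyyymmdd directory whose yyyymm part
# is earlier than CUTOFF (YYYYMM).
list_old_attachment_dirs() {
    for dir in "$1"/attachments/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue
        name=${dir##*/}      # yyyymmdd
        ym=${name%??}        # strip the two day digits, leaving yyyymm
        if [ "$ym" -lt "$2" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example (review the list before deleting anything):
#   list_old_attachment_dirs /usr/local/mailman/archives/private/flohmarkt \
#       "$(cutoff_month 24)"
```

Once the printed list looks right, the directories can be removed in a loop with rm -rf. Because everything older than the cutoff is listed, a missed cron run is caught up on the next one.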

On 12/22/20 6:48 AM, Stephen J. Turnbull wrote:
> Ralf Hildebrandt writes:
>> Today I issued "prune_arch -l flohmarkt -d 30" and wondered about the immense size of the corresponding archive directory archives/private/flohmarkt
>> Shouldn't prune_arch also clean out the "attachments" directory? (mailman-2.1.34)
> Probably, but that's up to Mark (and I'm pretty sure that's one of his personally-maintained scripts rather than one distributed by Mailman).
This is tricky. prune_arch without the -n/--nobuild option will rebuild the archive from the pruned mbox file with bin/arch --wipe. The issue is that in the case where the list's scrub_nondigest setting is Yes, bin/arch --wipe preserves the old attachments/ directory because it contains scrubbed attachments which are not in the mbox file. I.e., the mbox messages have only links to the attachments in the attachments/ directory.
The attachments directory contains subdirectories of the form yyyymmdd so it would be possible for prune_arch to determine which ones should also be pruned. I can look into adding that, although it probably should be optional because in the case where prune_arch is doing -b/--backup or -p/--preserve, the only place where the scrubbed attachments exist is in those attachments/ directories and removing those directories will result in possibly unintended loss of information. I.e., the backed up or pruned mbox will contain links to attachments which will be broken if the attachments are removed.
My current thinking is if the list's scrub_nondigest setting is Yes, remove the "pruned" attachments and if either -b/--backup or -p/--preserve is specified, backup/preserve them too.
Other thoughts are welcome.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
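
To make the approach Mark outlines concrete, here is a rough shell sketch of pruning attachments/yyyymmdd directories by date, with an optional backup step. It is not the code in prune_arch. The function name, its arguments, and the backup directory layout are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: this is NOT the implementation used by prune_arch.

# prune_attachments LISTDIR CUTOFF [BACKUPDIR]
#   LISTDIR    the list's archive directory, e.g. archives/private/flohmarkt
#   CUTOFF     yyyymmdd; directories with an earlier date are pruned
#   BACKUPDIR  optional (hypothetical); if given, pruned directories are
#              moved to BACKUPDIR/attachments/ instead of being deleted,
#              so links in a backed-up mbox can still be resolved
prune_attachments() {
    listdir=$1
    cutoff=$2
    backupdir=${3:-}
    for dir in "$listdir"/attachments/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue
        # The directory name is the date, so a numeric comparison is enough.
        [ "${dir##*/}" -lt "$cutoff" ] || continue
        if [ -n "$backupdir" ]; then
            mkdir -p "$backupdir/attachments"
            mv "$dir" "$backupdir/attachments/"
        else
            rm -rf "$dir"
        fi
    done
}

# Example, matching "-d 30" (GNU date assumed):
#   prune_attachments archives/private/flohmarkt \
#       "$(date --date='30 days ago' +%Y%m%d)"
```

Moving instead of deleting keeps the scrubbed attachments available alongside a backed-up or preserved mbox, which is the information-loss concern raised above.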

On 12/22/20 8:35 AM, Mark Sapiro wrote:
> My current thinking is if the list's scrub_nondigest setting is Yes, remove the "pruned" attachments and if either -b/--backup or -p/--preserve is specified, backup/preserve them too.
> Other thoughts are welcome.
I've updated the script at <https://www.msapiro.net/scripts/prune_arch> and <https://fog.ccsf.edu/~msapiro/scripts/prune_arch>. It does essentially what I described above. In Ralf's "prune_arch -l flohmarkt -d 30" case, it will just remove all the archives/private/flohmarkt/attachments/yyyymmdd directories older than 30 days in addition to removing the older archived messages.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
participants (4)
- Juliano Alves Guidini
- Mark Sapiro
- Ralf Hildebrandt
- Stephen J. Turnbull