On one mailing list I'm seeing a lot of attachment files building up in the "archive" directory:
# pwd
/var/lib/mailman/archives/private
# ls listname*
listname:
attachments  index.html

listname.mbox:

Looking down a couple of directory levels in "attachments":

# pwd
/var/lib/mailman/archives/private/<listname>/attachments/20091116/4763b4e9
# file *
attachment.obj: gzip compressed data, was "ErrorReport.21234.txt", from Unix, last modified: Mon Nov 16 11:03:59 2009
[root@albers 4763b4e9]# ls
attachment.obj
Basically there are many directories in the "attachments" directory that go from "20070808" to "20091116".
I'm running Mailman 2.1.9. I just need to know how to clean these up and, if possible, turn this off.
Thanks! Troy
Troy Campbell wrote:
On one mailing list I'm seeing a lot of attachment files building up in the "archive" directory: [...]
Basically there are many directories in the "attachments" directory that go from "20070808" to "20091116".
I'm running 2.1.9. I just need to know how to clean up and turn off if possible.
It is Scrubber.py that saves these. Depending on settings, you may get one or two copies of each attachment which is either not text/plain or text/plain with an unknown character set.
If the list's Non-digest options -> scrub_nondigest is Yes, you will get one saved attachment, created when the attachment is removed from the message and replaced by a link to the saved attachment. Otherwise, you get two: one when the attachment is scrubbed for the archive and one when it is scrubbed from the plain format digest.
You can avoid almost all of this by removing all non-plain text with content filtering.
If you don't remove them with content filtering, you can avoid the 'digest copies' by setting Digest options -> digestable to No. You can avoid the 'archive copies' by turning off archiving for the list.
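As a rough model of the rules described above (not Mailman's actual code), the number of copies saved per scrubbed attachment can be sketched as follows. The function name and its boolean parameters are hypothetical; they simply mirror the scrub_nondigest, archive and digestable settings.

```python
def saved_copies(scrub_nondigest, archive, digestable):
    """Rough count of saved copies per scrubbed attachment.

    Hypothetical helper that only restates the rules described above;
    it is not taken from Mailman's source.
    """
    if scrub_nondigest:
        # Scrubbed once from the message itself and replaced by a link.
        return 1
    copies = 0
    if archive:
        copies += 1  # scrubbed for the archive
    if digestable:
        copies += 1  # scrubbed for the plain format digest
    return copies


if __name__ == "__main__":
    # scrub_nondigest = No, with archiving and digests both enabled.
    print(saved_copies(False, True, True))
```

With scrub_nondigest set to No, turning off either archiving or digests drops the count by one in this model.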
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
Thanks Mark,
The list's Non-digest options -> scrub_nondigest is No.
In the Content Filtering -> "Details for pass_mime_types" field I show the following:

multipart/mixed
multipart/alternative
text/plain

Would that be sufficient to do what you are suggesting if I turn "Edit filter_content" on?
Could I then remove the "attachments" subdirectories?
Regards, Troy
on 11/16/2009 07:23 PM Mark Sapiro said the following: [...]
Troy Campbell wrote:
In the Content Filtering -> "Details for pass_mime_types" field I show the following:

multipart/mixed
multipart/alternative
text/plain

Would that be sufficient to do what you are suggesting if I turn "Edit filter_content" on?
I suggest the following in pass_mime_types:

multipart
message/rfc822
text/html
text/plain

plus collapse_alternatives = Yes and convert_html_to_plaintext = Yes.
This will allow the sub-parts of any multipart message including multipart/related and multipart/signed to be examined. It will also allow plain text (and HTML) from attached messages and will ultimately discard all but the first alternative from multipart/alternative and convert any remaining HTML to plain text.
This will leave very little that will ultimately be scrubbed: only text/plain attachments with unspecified character sets.
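For anyone who prefers the command line to the web interface, the settings suggested above could also be written as an input file for Mailman 2.1's bin/config_list, which reads Python syntax. This is only a sketch: it assumes the usual `config_list -i <file> <listname>` form, the file name is up to you, and it is worth trying the check-only option (-c) before applying anything.

```python
# Sketch of an input file for Mailman 2.1's bin/config_list -i.
# 1 means Yes, 0 means No.
filter_content = 1

# One MIME type per line.
pass_mime_types = """multipart
message/rfc822
text/html
text/plain"""

collapse_alternatives = 1
convert_html_to_plaintext = 1
```

Settings not mentioned in the file should be left as they are on the list, so a short file like this is enough.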
Could I then remove the "attachments" subdirectories?
You can remove the attachments directories anyway. They will be recreated if needed. The problem with removal is that messages in the HTML archive contain links to scrubbed attachments, and if you remove the directories or files, you break those links. Whether or not this is important is up to you.
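If you do decide to remove them, a cautious way is to list the dated directories first and delete only after reviewing the list. The following is a minimal sketch, assuming the YYYYMMDD directory names shown earlier in the thread; the function names, the cutoff date and the path in the example are hypothetical, and it defaults to a dry run.

```python
import os
import re
import shutil

DATE_DIR = re.compile(r"^\d{8}$")  # directory names such as 20070808


def old_attachment_dirs(attachments_dir, before):
    """Return the YYYYMMDD subdirectories dated earlier than `before`."""
    found = []
    for name in sorted(os.listdir(attachments_dir)):
        path = os.path.join(attachments_dir, name)
        # Eight-digit names compare correctly as plain strings.
        if DATE_DIR.match(name) and os.path.isdir(path) and name < before:
            found.append(path)
    return found


def remove_dirs(paths, dry_run=True):
    """Print each directory; delete it only when dry_run is False."""
    for path in paths:
        print(("would remove " if dry_run else "removing ") + path)
        if not dry_run:
            shutil.rmtree(path)


if __name__ == "__main__":
    # Hypothetical path: substitute the real list name.
    root = "/var/lib/mailman/archives/private/listname/attachments"
    if os.path.isdir(root):
        remove_dirs(old_attachment_dirs(root, "20090101"), dry_run=True)
```

Keep in mind that any archived message linking into a removed directory will then have a dead link.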
participants (2): Mark Sapiro, Troy Campbell