[Mailman-Users] Archives disk space: are .txt files needed?

Mark Sapiro msapiro at value.net
Sat Feb 12 02:54:01 CET 2005


Mike Alberghini wrote:
>
>The archive directories contain each month's mail in three formats:
>
>1.  a plaintext file:  2004-November.txt
>2.  a gzipped file:    2004-November.txt.gz
>3.  a directory:       2004-November - contains individual HTML messages.
>
>The web archive uses the files in the directory, and links to the gzipped
>file.  Does anything use the plaintext file?  It seems like it's wasting a
>ton of diskspace having the same file gzipped and unzipped in the same space.

How the .txt file is used depends on the setting of
GZIP_ARCHIVE_TXT_FILES in mm_cfg.py. If this is set to Yes, the .txt
file only exists temporarily while the archiver unzips the .txt.gz,
appends the new message and zips the result into a new .txt.gz. With
this setting, there are no permanent .txt files, but this is a very
inefficient process (see comments in Defaults.py).
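
To see why that is expensive, here is a rough Python sketch of the
unzip/append/rezip cycle. It is only an illustration and not Mailman's
actual archiver code; the function name append_with_regzip is made up
for the example.

```python
import gzip
import os


def append_with_regzip(gz_path, message_text):
    """Append one message to a gzipped archive by unzipping and rezipping it all."""
    txt_path = gz_path[:-len(".gz")]

    # Step 1: unzip the whole period into a temporary .txt file.
    existing = b""
    if os.path.exists(gz_path):
        with gzip.open(gz_path, "rb") as src:
            existing = src.read()
    with open(txt_path, "wb") as txt:
        txt.write(existing)
        # Step 2: append the single new message.
        txt.write(message_text.encode("utf-8"))

    # Step 3: recompress the entire file, then drop the temporary .txt.
    with open(txt_path, "rb") as txt, gzip.open(gz_path, "wb") as dst:
        dst.write(txt.read())
    os.remove(txt_path)

    # Number of old bytes that had to be rewritten for this one message.
    return len(existing)
```

The work per message grows with the size of the month's archive so
far, which is the cost you accept with this setting.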

If GZIP_ARCHIVE_TXT_FILES is No, then the archive is accumulated in the
.txt file and is gzip'd by a nightly cron job. In this case, the .txt
files can be deleted for prior months if no new messages ever arrive
for that month. This can't always be guaranteed as a message could be
delayed in transit or have a bad date. In general though, old .txt
files can be deleted, and if a "late" message did arrive and cause
loss of the .txt.gz information, the archive could be rebuilt from the
<list>.mbox/<list>.mbox file with bin/arch.
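
If you want to script the cleanup, something like the following
Python sketch could list the candidates. It is not part of Mailman;
the function name deletable_txt_files is hypothetical, and it assumes
monthly archive volumes named like 2004-November.txt. It only reports
a .txt file from a period other than the current one when a .txt.gz
exists that is at least as new, i.e. the nightly gzip has already
picked up everything in it.

```python
import os
import re

# Matches monthly volume names such as "2004-November.txt" (assumed naming).
MONTHLY_TXT = re.compile(r"^\d{4}-[A-Z][a-z]+\.txt$")


def deletable_txt_files(archive_dir, current_period):
    """Return .txt files outside current_period whose .txt.gz is at least as new."""
    candidates = []
    for name in sorted(os.listdir(archive_dir)):
        if not MONTHLY_TXT.match(name):
            continue
        if name == current_period + ".txt":
            continue  # the current month is still accumulating messages
        txt_path = os.path.join(archive_dir, name)
        gz_path = txt_path + ".gz"
        if not os.path.isfile(gz_path):
            continue  # no compressed copy yet, so keep the .txt
        if os.path.getmtime(gz_path) >= os.path.getmtime(txt_path):
            candidates.append(txt_path)
    return candidates
```

You would call it with the list's archive directory and the current
period, e.g. time.strftime("%Y-%B") in an English locale, and review
the returned paths before removing anything.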

>So, first off, can I delete the year-month.txt files without causing harm?

Generally, yes, once the month is over.

>Second, once the current month is over, can I prevent the non-zipped files
>from ever existing?

You can set

GZIP_ARCHIVE_TXT_FILES = Yes

in mm_cfg.py if you're willing to live with the additional processing
to unzip/rezip the .txt.gz file for each message.

>Finally, is there a way to prevent the archiving of
>attachments?

If you don't want to use content filtering to keep them off the list
entirely, then I think it would require a somewhat tricky hack. You
could modify the code in Mailman/Handlers/Scrubber.py, but this would
also affect digests - that's where it gets tricky.

--
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



