[Mailman-Users] Question About Gzip'd Archives

Mark Sapiro mark at msapiro.net
Thu Sep 3 23:46:36 CEST 2009


Barry Finkel wrote:

>I have a question about zipped list archives; the question arose from
>a subscriber to one of our lists.  I am running Mailman 2.1.11 on
>Ubuntu from a package I built from the SourceForge source.
>
>mailman# pwd
>/var/lib/mailman/archives/private/LISTNAME
>mailman# ls -ald 2009-August*
>drwxrwsr-x 2 list list  4096 2009-08-31 11:34 2009-August
>-rw-rw-r-- 1 list list 91577 2009-08-31 11:34 2009-August.txt
>-rw-rw-r-- 1 list list 20708 2009-09-01 03:27 2009-August.txt.gz
>mailman#
>
>The .txt file looks fine, as does the .gz file.
>When I go to the list admin web interface and look at the archives,
>I see
>
>     August 2009: [Thread] [Subject] [Author] [Date] [GZip'd text 20KB]
>
>That value (20KB) seems to be correct.  When I click on the "[Gzip...]"
>link, Firefox/Solaris gives me a text file, not a .gz file.  Maybe
>Firefox knows how to unzip the file, as vim does.  When I click on
>the same link using IE8/XP, IE8 sees the .gz suffix and asks me what
>to do with the file.  I save it on my desktop, and when I look at the
>file, I see that it is a plain text file.  It is not a gzip'd file.
>Why?  Thanks.


Your web server is converting the gzipped file and serving it as plain
text, but MSIE sees the .gz extension and thinks it can't display the
content.
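
If you want to confirm what a browser actually saved, one rough check
is to look for the gzip magic number (the two bytes 0x1f 0x8b) at the
start of the file. The following is only a minimal sketch; the helper
name is_gzip is made up for illustration and is not part of Mailman.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper for illustration only; not a Mailman function.
    """
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

Run against the copy saved from the browser and against the
2009-August.txt.gz file on the server, this should show whether the
content was uncompressed somewhere between the disk and the desktop.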

However, I recommend you don't gzip the files at all. As you can see,
doing so doesn't save space; it requires more space because the .txt
files are kept even after gzipping. The old ones that will have no
more messages added can be removed, but you have to do that manually.

Keeping a gzipped file can save some bandwidth when accessing the file
on the web, but not if your web server converts and serves it as plain
text, which appears to be the case.

Also, unless you set GZIP_ARCHIVE_TXT_FILES = Yes in mm_cfg.py (don't
do it; see below), the current day's posts are not in the .txt.gz file
until cron runs Mailman's cron/nightly_gzip.

Thus, I recommend not gzipping the archive .txt files at all. I.e., do
not put GZIP_ARCHIVE_TXT_FILES = Yes in mm_cfg.py, and remove or
comment out the cron/nightly_gzip entry in Mailman's crontab.

This can be a bit tricky to do right because you have links on the
archive TOC page that point to the .txt.gz files, and if you just
comment out the cron/nightly_gzip entry, the current period's .txt.gz
file will quickly be out of date.

You can remove all the .txt.gz files, and the next archived post will
rebuild the TOC with links to the .txt files, but for the period
before the next archived post, the archive TOC will have links
pointing to the removed .txt.gz files.

One way around this is just to run bin/arch --wipe on a list or lists.
This will remove all the list's .txt.gz files and build an archive TOC
with correct links to the .txt files. The .txt.gz files will only be
regenerated if cron/nightly_gzip is run. The usual caveats about
running bin/arch --wipe, especially on older lists, apply. Namely,
it's a good idea to first check the
archives/private/LIST.mbox/LIST.mbox file with bin/cleanarch, and
there is a possibility that messages can get renumbered, which
would invalidate externally saved links to existing messages.

Another way around it is to remove the .txt.gz files manually and then
run 'bin/arch LISTNAME /dev/null' to rebuild the archive TOC. Note: no
--wipe option and no input redirection, just /dev/null as a filename
argument.
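
The manual route could be scripted roughly as below. This is only a
sketch under assumptions: the function names are invented for
illustration, and the Mailman install directory passed to
rebuild_toc_command is whatever prefix your bin/arch actually lives
under. Only bin/arch, LISTNAME and /dev/null come from the steps above.

```python
import glob
import os


def remove_gz_archives(archive_dir):
    """Delete every *.txt.gz file directly inside archive_dir.

    Returns the sorted list of paths that were removed. The .txt
    files and the per-period directories are left untouched.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(archive_dir, "*.txt.gz"))):
        os.remove(path)
        removed.append(path)
    return removed


def rebuild_toc_command(mailman_dir, listname):
    """Build the argv for 'bin/arch LISTNAME /dev/null'.

    No --wipe option and no stdin redirection: /dev/null is passed
    as the mbox filename argument. mailman_dir is an assumption
    (your install prefix).
    """
    return [os.path.join(mailman_dir, "bin", "arch"), listname, "/dev/null"]


# Example use (not executed here; run as the Mailman user):
#   import subprocess
#   remove_gz_archives("/var/lib/mailman/archives/private/LISTNAME")
#   subprocess.check_call(rebuild_toc_command(MAILMAN_DIR, "LISTNAME"))
# where MAILMAN_DIR is a placeholder for your install prefix.
```

Remember that the .txt.gz files will come back on the next run of
cron/nightly_gzip unless that crontab entry has been removed or
commented out first.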

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


