[Mailman-Users] distributing Mailman between 2 systems
Richard Barrett
r.barrett at openinfo.co.uk
Mon Jun 5 01:41:24 CEST 2006
On 4 Jun 2006, at 06:41, Jim Popovitch wrote:
> I would like to move the pipermail archives to a different host
> than the main Mailman system. Specifically for better archive
> searching performance with htdig. Is this possible?
>
> -Jim P.
How you approach this depends on what you perceive your problem to be
and what you mean by "better archive searching performance with htdig".
Like Google and other internet search engines, htdig splits the task
into two parts: index construction and index search.
Index construction does the heavy lifting of scanning the source
material and squirreling away in its indices a lot of detail of which
indexed source files contain what. This can be quite a slow process
especially when a large body of material has to be initially scanned
and indexed. It is probably best treated as a batch process run at
times when the system doing it is lightly loaded with other work. Depending
on the material concerned and how you configure htdig this indexing
may produce very large indices which can come close to being in the
same order of magnitude of storage size as the raw source material.
Many lists with large archives can generate demand for much CPU and
potentially much storage during indexing (and, in the case of
storage, afterwards).
On the other hand index searching to produce a list of source files
that match the search criteria induces a much lower load on the
system concerned; after all it is just looking up words in pre-built
search indices.
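The split between the two phases can be illustrated with a toy inverted index. This is only a minimal sketch of the general idea, not htdig's actual data structures or algorithms; the function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Index construction: scan every document once and record which
    documents contain each word. This is the expensive batch step."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Index search: look up each query word in the pre-built index and
    intersect the results. No source document is read at this stage."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


# Hypothetical archive content, for illustration only.
documents = {
    "000001.html": "Moving pipermail archives to another host",
    "000002.html": "htdig search performance on large archives",
}
index = build_index(documents)
result = search(index, "archives htdig")
```

All the scanning happens in build_index(); search() only touches the dictionary, which is why searching is so much cheaper than indexing.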
The problem with this approach is that search indices are never
completely up-to-the-minute; but consider how often Google's
crawler visits your web site. While updating search indices when new
documents are added to the archive material should induce less load
than the original construction of the indices, configuring
cron jobs so that htdig rebuilds its indices too frequently is not
advisable. The updating of indices can still involve a lot of IO as
htdig walks a lot of files to determine which of the existing
material has been changed as well as what has been added as new.
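Why an update run still costs IO can be seen from a simple change scan. The sketch below is an assumption-laden illustration, not how htdig itself detects changes: it uses file modification times, and the function name and arguments are hypothetical.

```python
import os


def files_changed_since(root, last_run):
    """Walk the whole tree under root and return the paths whose
    modification time is later than last_run (seconds since the epoch).
    Every file is examined, even when only a few have changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.getmtime(path) > last_run:
                changed.append(path)
    return sorted(changed)
```

Even if the result is a handful of files, the walk has to visit the entire archive tree, which is the cost to keep in mind when choosing how often cron runs the update.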
So first you should define what problem you are trying to solve with
regard to htdig before deciding what to do next.
You could plan on having your HTML mail archives integrated with
Mailman e.g. using pipermail or a pipermail/MHonArc synthesis for the
archive pages and having htdig integrated with that; I know you are
aware of the patches available to support this approach and that
there are some benefits as regards maintaining archive privacy
and such. I will deal with this integrated approach first. You could
deploy multiple processors to address the issues by using NFS to
share the mailman archive storage space between them.
Parenthetically, I successfully ran Mailman on x86 Linux boxes
entirely out of NFS mounted storage on enterprise level servers for a
number of years, primarily to provide for rapid-ish switchover to a
backup server in the case of primary Mailman server hardware failure,
which happened on several occasions. At the time I found that I had
to limit NFS read/write transfer sizes on the Linux boxes to avoid
problems in the Linux kernel locking associated with the NFS
implementation then available. Nowadays I am running Mailman on
Solaris 10, which has no such problems, but I guess the Linux NFS
implementation has also improved in the meantime.
The simplest split you could consider is moving the htdig
installation and workload to a separate machine. The Mailman/htdig
integration patches support this configuration in conjunction with
NFS sharing of the Mailman archive files; see the
documentation here:
http://www.openinfo.co.uk/mm/patches/444884/install.html#rconfig
This configuration leaves one machine running Mailman and being
responsible for providing access to archive material while a second
machine does htdig's index maintenance. Mailman also "subcontracts"
each index search requested by a user to the htdig machine but the
URLs returned in the search results mean that the Mailman machine
delivers the material from the archives, not the htdig machine.
The question you asked was how to move the pipermail archives to
another system. Using NFS again, it might be possible to run some of
Mailman's qrunners on one machine and others (for example, the
archive runner) on a second to partition things. I have never had
the time or energy to set up systems to explore the issues of such a
configuration, but somebody else may have pushed the envelope this
way. As an aside, I would avoid like the plague NFS cross-mounting of
volumes between machines in any configuration.
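An untested sketch of what such a partition might look like follows. It assumes a Mailman 2.1 installation in which the QRUNNERS setting from Defaults.py can be overridden in each machine's mm_cfg.py, and that both machines share the Mailman queue and archive directories over NFS. The runner names are those recalled from a 2.1-era Defaults.py and should be checked against your own copy; the helper function and the two variable names are hypothetical.

```python
# Runner names as recalled from QRUNNERS in a Mailman 2.1 Defaults.py;
# verify them against the Defaults.py of the installed version.
ALL_RUNNERS = [
    'ArchRunner',      # messages for the archiver
    'BounceRunner',
    'CommandRunner',
    'IncomingRunner',
    'NewsRunner',
    'OutgoingRunner',
    'VirginRunner',
    'RetryRunner',
]


def qrunners_for(names):
    """Return a QRUNNERS-style list of (runner name, process count) pairs."""
    return [(name, 1) for name in names]


# Hypothetical value for QRUNNERS in mm_cfg.py on the archive machine.
ARCHIVE_HOST_QRUNNERS = qrunners_for(['ArchRunner'])

# Hypothetical value for QRUNNERS in mm_cfg.py on the main Mailman machine.
MAIN_HOST_QRUNNERS = qrunners_for(
    [name for name in ALL_RUNNERS if name != 'ArchRunner'])
```

Each machine's mm_cfg.py would then assign its own list to QRUNNERS. Whether the runners behave correctly against NFS-shared queue directories is exactly the unexplored question, so treat this as a starting point for experiment only.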
If you decide none of the above is appropriate to what you want to
achieve and the way you want to achieve it then you may be asking the
wrong question in my view. Maybe you should be deploying a mailing list
archiving system independent of Mailman; you could do worse than
look at the model set by http://www.mail-archive.com as a starting
point.
-----------------------------------------------------------------------
Richard Barrett http://www.openinfo.co.uk