[Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

Aurelien Bompard aurelien at bompard.org
Wed Mar 25 09:18:49 CET 2015


Hey David, here are my thoughts on the challenges:

> 1) Determine which messages to include in the mbox.
>     An entire list archive is clearly one choice, but is there also
> interest in generating mbox files for specific threads, list archives
> between specific dates, etc.?

Hmm, depending on the architecture we choose, we may not have a lot of
options. I'd like to see at least "whole-list" and "last 30 days"
archives though. The latter is useful to those who want to "seed"
their mail client with the latest discussions so they can reply
in-thread.
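
As a rough illustration of those two selections, here is a minimal
sketch. It assumes the messages are already available as Python email
objects with a parseable Date header; in HyperKitty itself this would
more likely be a database query. The function name and its parameters
are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def select_messages(messages, days=None, now=None):
    """Return every message (days=None) or only those sent in the last `days` days.

    Hypothetical helper: assumes each message has a valid Date header.
    """
    if days is None:
        return list(messages)  # "whole-list" archive
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = []
    for msg in messages:
        sent = parsedate_to_datetime(str(msg["Date"]))
        if sent.tzinfo is None:
            # A "-0000" offset parses to a naive datetime; treat it as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if sent >= cutoff:
            recent.append(msg)
    return recent
```

Calling select_messages(msgs) would give the whole-list set, and
select_messages(msgs, days=30) the "last 30 days" set.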

> 2) For each message, append plaintext to mbox file.
>     Is this the part where we risk "blocking the UI"? Certainly for
> hundreds of thousands of messages, this will be a computationally intensive
> step, so will this have to be run in a separate thread?

Yeah, with a lot of messages, and with possible attachments, we may be
creating hundreds of megabytes or maybe gigabytes of data. This has to
be done outside of the webserver process, so we'll need something like
a task queue and a daemon process, or a cron job. Or we could build
the mbox files incrementally, appending to them as new messages
arrive, which would take up more disk space but would be more fluid
from a UI point of view. It would also probably be much more
resource-intensive than a cron job: the mbox files will be large and
should be gzipped, so it is better to append a batch of emails at once
than to open and close the file on each incoming email.
I'm leaning towards pre-rendering the mbox files in a regular cron job
and warning the user in the UI that the archive contains all email up
to the last hour, for example.
We can't use the prototype archiver because we need to filter the
message content and escape email addresses to protect against spam
harvesters, like MM2.1 currently does.
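
To make the batching and escaping ideas concrete, here is a minimal
sketch of what one cron pass could do. It is not HyperKitty or Mailman
code: the function names are hypothetical, the address pattern is
deliberately simplistic, and scrub_message() assumes single-part,
plain-text ASCII messages. The "user at example.com" form mirrors the
way addresses appear in the existing archives.

```python
import gzip
import re
from email.generator import BytesGenerator
from email.message import Message

# Simplified address pattern (assumption); not a full RFC 5322 parser.
ADDRESS_RE = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")


def scrub_addresses(text):
    """Rewrite user@example.com as 'user at example.com'."""
    return ADDRESS_RE.sub(r"\1 at \2", text)


def scrub_message(msg):
    """Return a scrubbed copy of a single-part, plain-text ASCII message."""
    clean = Message()
    for name in ("From", "Date", "Subject"):
        if msg[name] is not None:
            clean[name] = scrub_addresses(str(msg[name]))
    clean.set_payload(scrub_addresses(msg.get_payload()))
    return clean


def append_batch(path, messages):
    """Append a batch of messages to a gzipped mbox, opening the file once."""
    # Mode "ab" adds a new gzip member; readers see one continuous stream.
    with gzip.open(path, "ab") as fh:
        gen = BytesGenerator(fh, mangle_from_=True)  # escape body "From " lines
        for msg in messages:
            gen.flatten(msg, unixfrom=True)  # writes the "From " separator line
            fh.write(b"\n")  # blank line between messages
```

A cron job could then call append_batch(path, [scrub_message(m) for m
in new_messages]) once per run, rather than reopening the compressed
file for every incoming email.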

> 3) Present mbox file to user for download.
>     I'm hoping this is a trivial step, but I'm not sure about some of the
> specifics. For example, is Hyperkitty only able to run on apache, or is the
> choice of web server entirely up to the web admin? How we ultimately serve
> the file will depend on these details.

HyperKitty runs on Django, which can be served by whichever
WSGI-compliant server the admin chooses (Apache's mod_wsgi, uWSGI,
gunicorn, etc.). If we choose to pre-build the mbox files, we can't
simply have the webserver serve them as static files, because some
lists are private (only available to subscribers).
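
A sketch of the kind of check that would have to run before a
pre-built file is handed out. Everything here is an assumption for
illustration: the MailingList class, its fields and the file naming
are hypothetical stand-ins, not HyperKitty's models. In practice the
check would live in a Django view, with the files stored outside any
directory the webserver exposes directly.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MailingList:
    """Hypothetical stand-in for a list record."""
    name: str
    is_private: bool = False
    subscribers: set = field(default_factory=set)


def mbox_path_for(mlist, user_email, archive_root):
    """Return the pre-built mbox path if the user may download it.

    Raises PermissionError for non-subscribers of a private list.
    """
    if mlist.is_private and user_email not in mlist.subscribers:
        raise PermissionError(f"{mlist.name} archives are restricted to subscribers")
    return Path(archive_root) / f"{mlist.name}.mbox.gz"
```

The view would translate the PermissionError into a 403 response and
otherwise stream the file back to the user.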

I hope that clarifies things a bit.

Aurélien

