Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

Thanks for your feedback, Aurelien.
> we'll need something like a task queue and a daemon process or a cron job
In my proposal I suggested using one of several asynchronous job queue libraries, such as Celery or Huey. Both can use Redis as a back-end. Because I have no experience with asynchronous job queues, I'm not sure whether this is too much baggage for our purposes; maybe we just don't want the extra dependencies. Regarding cron jobs, there's also django-background-task <https://github.com/lilspikey/django-background-task>, a simple Django add-on that might do what we need. Again, if we don't want or need the extra dependency, rolling our own cron job should be fairly straightforward.
> If we choose to pre-build the mbox files, we can't simply have them served through the webserver, because some lists are private
Then there is also an authentication step? I noticed on the test server that I can't actually look at any of the mailing lists because they're all private. So we should be able to use pre-existing code for this step?
> with possible attachments, we may be creating hundred of megabytes or maybe gigabytes of data
When we create the mbox file, do we simply note that an attachment existed (e.g. "Attachment: myattachment.txt") or do we actually put the attachment in the mbox? AFAIK mbox is a plaintext format, so if the latter is the case then I'm not exactly sure how this would work...
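As a rough sketch of the second option: MIME encodes binary attachments as text (base64 by default), so a message with an attachment can be written to an mbox file as-is. The example below assumes Python's standard `mailbox` and `email` modules; the addresses, function names, and file contents are made up for illustration and are not taken from HyperKitty.

```python
import mailbox
from email.message import EmailMessage


def build_message_with_attachment():
    """Build a sample message (hypothetical addresses) with one attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "list@example.com"
    msg["Subject"] = "Patch attached"
    msg.set_content("See the attached file.\n")
    # Bytes payloads are base64-encoded, so the stored message stays text.
    msg.add_attachment(
        b"hello from the attachment\n",
        maintype="application",
        subtype="octet-stream",
        filename="myattachment.txt",
    )
    return msg


def store_and_reload(msg, mbox_path):
    """Append msg to the mbox at mbox_path and read it back."""
    box = mailbox.mbox(mbox_path)
    try:
        key = box.add(msg)
        box.flush()
        return box[key]
    finally:
        box.close()


def attachments(message):
    """Map attachment filename to decoded bytes for every named MIME part."""
    return {
        part.get_filename(): part.get_payload(decode=True)
        for part in message.walk()
        if part.get_filename()
    }
```

The cost of this approach is size: base64 inflates each attachment by roughly a third, which matters for the storage concern quoted above.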
Are there going to be any issues handling non-ASCII (Unicode) characters, or with file locks? Right now it looks like only one process should be handling each mbox, but is it possible that more than one could be spawned somehow?
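One way to probe both questions is sketched below, again assuming the standard `mailbox` module rather than anything HyperKitty-specific. `mailbox.mbox.lock()` takes an advisory lock and raises `mailbox.ExternalClashError` if another process already holds it, and `EmailMessage.set_content()` encodes non-ASCII text as UTF-8. The function names and the sender address are hypothetical.

```python
import mailbox
from email.message import EmailMessage


def append_locked(mbox_path, text):
    """Append a message with the given body while holding the mbox lock."""
    msg = EmailMessage()
    msg["From"] = "bob@example.com"  # hypothetical sender
    msg["Subject"] = "Encoding check"
    msg.set_content(text)  # non-ASCII text is stored as UTF-8
    box = mailbox.mbox(mbox_path)
    box.lock()  # raises mailbox.ExternalClashError if already locked elsewhere
    try:
        key = box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return key


def read_bodies(mbox_path):
    """Return the decoded text body of every message in the mbox."""
    box = mailbox.mbox(mbox_path)
    try:
        return [
            m.get_payload(decode=True).decode(m.get_content_charset() or "utf-8")
            for m in box
        ]
    finally:
        box.close()
```

The lock is only advisory, so it protects the file only if every writer goes through the same locking calls.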
Another possible "nice-to-have" feature I thought of yesterday is a download link that scripts can use to get archives (e.g. "/download?year=x&month=y"). On the other hand, maybe this is just a security risk that has no actual use case, but I'd still like to have a second opinion on this.
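If such a link were added, part of the security concern could be handled by validating the parameters strictly before touching any files. A minimal sketch, assuming the "/download?year=x&month=y" form above; the function name and the accepted ranges are illustrative choices, not an existing HyperKitty API:

```python
from urllib.parse import parse_qs, urlsplit


def parse_download_request(url):
    """Return (year, month) from a download URL, or raise ValueError."""
    query = parse_qs(urlsplit(url).query)
    try:
        year = int(query["year"][0])
        month = int(query["month"][0])
    except (KeyError, ValueError):
        raise ValueError("year and month must both be integers") from None
    # Range checks are an assumption; a real view might also cap the year.
    if year < 1 or not 1 <= month <= 12:
        raise ValueError("year or month out of range")
    return year, month
```

Rejecting anything that is not a plain integer pair means the values can never be used to build an unexpected file path.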
Additionally, here are some tentative weekly goals I have for the project. Feedback on the order/plausibility of these would be awesome!
Week 1) Given an email message, the message headers and body are extracted and stored in a local file in mbox format. All unit tests passing.
Week 2) Attachments are represented in the mbox file as well. Email addresses are escaped. There are no encoding errors (no boxes or ?s). All unit tests passing.
Week 3) Explore options for possible asynchronous queue managers.
Weeks 4-5) When a mailing archive is created, a background process (implemented using chosen backend) is attached to it for managing its mbox files. Existing processes are started when the server starts, and the server can efficiently manage all of these (possibly tens/hundreds?) of tasks. All unit tests passing.
Week 6) Clean code and tests before midterm review. All unit tests passing.
Weeks 7-8) Each background process unzips two mbox files, one for the entire list and one for the past month, adds any messages that have come in in the past hour (in mbox format) and rezips the archive. All unit tests passing.
Weeks 9-10) Mbox archives are served by Hyperkitty upon request. Hyperkitty does not at this point authenticate users. All unit tests passing.
Week 11) Hyperkitty authenticates the user before serving the mbox request. If the request is denied, the user is notified via the UI. All unit tests passing.
Week 12) Code review and cleaning, final check on unit tests (they should all be passing).
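To make the Weeks 7-8 step concrete, here is a minimal sketch of the unzip, append, rezip cycle. It assumes gzip compression (the plan above does not name a format) and the standard `mailbox` module; the function name is hypothetical.

```python
import gzip
import mailbox
import os
import shutil
import tempfile


def append_to_gzipped_mbox(gz_path, messages):
    """Append messages to a gzipped mbox and return the new message count."""
    with tempfile.TemporaryDirectory() as workdir:
        plain_path = os.path.join(workdir, "archive.mbox")
        # Unzip the existing archive, if there is one.
        if os.path.exists(gz_path):
            with gzip.open(gz_path, "rb") as src, open(plain_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
        # Append the new messages to the plain mbox file.
        box = mailbox.mbox(plain_path)
        try:
            for msg in messages:
                box.add(msg)
            box.flush()
            count = len(box)
        finally:
            box.close()
        # Rezip to a temporary name, then swap it in so that readers
        # never see a half-written archive.
        tmp_gz = gz_path + ".tmp"
        with open(plain_path, "rb") as src, gzip.open(tmp_gz, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.replace(tmp_gz, gz_path)
    return count
```

Each run rewrites the whole archive, so the cost grows with list size. That is probably acceptable for the monthly file, but it may be worth measuring for the full-list file.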
Thanks, David
On Wed, Mar 25, 2015 at 4:18 AM, Aurelien Bompard <aurelien@bompard.org> wrote: