Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty
In my proposal I suggested using one of several asynchronous job queue libraries, such as Celery or Huey. These all use Redis as a back-end. Because I have no experience with asynchronous job queues, I'm not sure whether this is too much baggage for our purposes. Maybe we just don't want the extra dependencies.
Yeah, we don't want to add another database or an AMQP server just for that. We must keep it simple for admins to deploy.
Regarding cron jobs, there's also django-background-task, which is a simple Django add-on that might do what we need. Again, if we don't want/need the extra dependency, rolling our own cron job should be fairly straightforward.
I'm already using the "jobs" infrastructure provided by the django-extensions package: http://django-extensions.readthedocs.org/en/latest/jobs_scheduling.html I did consider django-background-task, but django-extensions seemed like a better fit, because django-background-task seems to be written for delayed tasks, not periodic tasks (well, a task could call itself again when done, but that seems like a hack). I'm not opposed to switching to django-background-task if we use the "delayed job" feature or if we need the extra flexibility of choosing exactly how many seconds apart our tasks run.
If we choose to pre-build the mbox files, we can't simply have them served through the webserver, because some lists are private.
Then there is also an authentication step?
Yeah, we must use HyperKitty's authentication and check if the user is allowed to see the archive. So the files can't be served by the webserver like static files.
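To make the point concrete, here is a minimal, framework-free sketch of that kind of check. It is not HyperKitty's actual code: the `MailingList` class, the `archive_status` function, and the rule "private archives are visible to subscribers only" are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class MailingList:
    """Hypothetical stand-in for a mailing list record."""
    name: str
    is_private: bool
    subscribers: Set[str] = field(default_factory=set)


def archive_status(mlist: MailingList, user_email: Optional[str]) -> int:
    """Return the HTTP status a download view might answer with.

    Assumed rule (not taken from HyperKitty): public archives are open to
    everyone, private archives only to logged-in subscribers.
    """
    if not mlist.is_private:
        return 200
    if user_email is None:
        return 401  # not logged in
    if user_email in mlist.subscribers:
        return 200
    return 403  # logged in, but not allowed to see this archive
```

Because the decision depends on who is asking, it has to run inside the application for every request, which is why a pre-built mbox can't just sit in a directory of static files.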
I noticed on the test server that I can't actually look at any of the mailing lists because they're all private.
If you're looking at lists.stg.fedoraproject.org, it's currently very outdated (still running the Python 2-compatible branch of Mailman 3). I have another test server with more current info if you want, but I break it regularly. It's lists-dev.cloud.fedoraproject.org
When we create the mbox file, do we simply note that an attachment existed (e.g. "Attachment: myattachment.txt") or do we actually put the attachment in the mbox? AFAIK mbox is a plaintext format, so if the latter is the case then I'm not exactly sure how this would work...
We do put the attachment in the mbox, as a MIME component like in every email. If you choose "view source" when looking at an email with attachments, you'll see how it's done.
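As a rough illustration (not HyperKitty's export code), the sketch below uses only Python's standard `email` and `mailbox` modules to store a message with an attachment in an mbox and read it back. The addresses, the file contents, and the helper names are invented for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_message_with_attachment():
    """Build a sample email carrying a small attachment (made-up data)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "list@example.com"
    msg["Subject"] = "Message with an attachment"
    msg.set_content("See the attached file.")
    # add_attachment() converts the message to multipart/mixed and stores
    # the bytes as a separate, base64-encoded MIME part, so the result is
    # still plain text and fits in an mbox.
    msg.add_attachment(
        b"hello from the attachment\n",
        maintype="application",
        subtype="octet-stream",
        filename="myattachment.txt",
    )
    return msg


def write_to_mbox(path, msg):
    """Append one message to the mbox at `path` (created if missing)."""
    box = mailbox.mbox(path)
    try:
        box.add(msg)
        box.flush()
    finally:
        box.close()


def list_attachments(path):
    """Return (filename, decoded bytes) for every attachment in the mbox."""
    found = []
    box = mailbox.mbox(path)
    try:
        for message in box:
            for part in message.walk():
                if part.get_filename():
                    found.append((part.get_filename(),
                                  part.get_payload(decode=True)))
    finally:
        box.close()
    return found
```

Opening the resulting file in a text editor shows the same structure as "view source" on an email: a multipart body with boundary lines and a base64 block for the attachment.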
Are there going to be any issues handling Unicode (non-ASCII) characters or with file locks? Right now it looks like we should only have one process handling the mbox, but is it possible that more than one could be spawned somehow?
No, mbox files are not designed for concurrent writes, so it's better to have a single process write to them.
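For what it's worth, Python's standard `mailbox` module has advisory locking that a single writer can use as a safety net in case a second process does get spawned. This is only a sketch under that assumption; `append_locked` is a made-up helper, not something HyperKitty provides.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def append_locked(path, messages):
    """Append messages to an mbox while holding the mailbox lock."""
    box = mailbox.mbox(path)
    # lock() raises mailbox.ExternalClashError if another process already
    # holds the lock, so a second writer fails instead of corrupting the file.
    box.lock()
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()


def make_message(subject):
    """Build a trivial plain-text message (illustrative data only)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["Subject"] = subject
    msg.set_content("body of " + subject)
    return msg
```

The lock is advisory: it only protects against other writers that also ask for it, so keeping a single writer process is still the simpler design.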
Another possible "nice-to-have" feature I thought of yesterday is a download link that scripts can use to get archives (e.g. "/download?year=x&month=y"). On the other hand, maybe this is just a security risk that has no actual use case, but I'd still like to have a second opinion on this.
Well, there still is the authentication issue.
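Leaving authentication aside, the selection side of such a link is small. The sketch below parses the `year` and `month` parameters from the example URL above and tests whether a message's `Date` header falls in that month. The function names are invented, and treating dates without a timezone as UTC is an assumption.

```python
import datetime
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qs, urlsplit

UTC = datetime.timezone.utc


def month_range(url):
    """Return (start, end) datetimes covering the month named in the query.

    Raises KeyError if a parameter is missing and ValueError if it is not
    a valid year or month.
    """
    query = parse_qs(urlsplit(url).query)
    year = int(query["year"][0])
    month = int(query["month"][0])
    start = datetime.datetime(year, month, 1, tzinfo=UTC)
    # The end is the first day of the following month (exclusive bound).
    if month == 12:
        end = datetime.datetime(year + 1, 1, 1, tzinfo=UTC)
    else:
        end = datetime.datetime(year, month + 1, 1, tzinfo=UTC)
    return start, end


def in_range(date_header, start, end):
    """Check whether an RFC 2822 Date header value lies in [start, end)."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # Assumption: treat dates without timezone information as UTC.
        sent = sent.replace(tzinfo=UTC)
    return start <= sent < end
```

A real view would still have to run the access check before returning anything, so a script using the link would need to authenticate like any other client.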
Aurélien
Participants (1): Aurelien Bompard