Startled by the observation that Hyperkitty's unit tests fail because
an excessive number of file descriptors are opened, I began to dig a little.
The majority of these file descriptors are opened by "Django Q" -- a
Django library for asynchronous tasks.
I wondered what these asynchronous tasks might be in an e-mail archiver.
After all, it has to perform 2 very specific jobs:
1. Receive e-mails via HTTP requests, process them, and store them in a
   database.
2. Display the collected e-mails.
On top of that, it generates some statistics, includes a voting system,
and allows replying to e-mails in a forum-like fashion.
Job 1 really requires very little processing and is mostly database IO
as the message has to be sorted and its position in the thread has to be
determined. None of this can be efficiently performed in the
background/asynchronously because there is nothing to be parallelized.
Performing the *whole* receive-and-store operation non-blocking also is
not an option -- ultimately mailman needs to know if it succeeded.
Job 2 is more or less what every web application does (reading from a
database and rendering the result) and certainly does not require any
asynchronous processing.
So where is Django Q used in Hyperkitty? It all boils down to the file
hyperkitty/tasks.py where a small number of asynchronous tasks are
defined.
They can be grouped into 3 classes:
- query mailman core
- repair the data structure (empty threads, orphans, ...)
- rebuild query cache
I would like to argue that none of these 3 groups of tasks need to be
performed asynchronously:
- Mailman-core only needs to be queried when mailing lists are changed.
This is triggered by signals from postorius and in addition periodically
by a cron job.
- The data structure should not need to be repaired in the first place
but the appropriate on_delete/on_save triggers should take care of this
-- and I believe they do in recent versions. If for some reason the
database becomes corrupted one can always start a repair operation.
Nothing is gained, however, from running this asynchronously.
- Lastly the cache rebuild: Currently Hyperkitty rebuilds its cache
(which caches the db queries, not the frontend) whenever an e-mail is
received. Since it only involves *reading* from the db it actually is
something that *can* be done asynchronously to reduce the time it takes
to process an incoming e-mail.
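
To make the last point concrete, here is a minimal sketch of what "asynchronously" buys in this one case. It deliberately uses a thread pool from the Python standard library rather than Django Q, and the names receive_email, rebuild_cache, archive and cache are made up for illustration; they are not Hyperkitty's API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the database and the query cache.
archive = []
cache = {}
executor = ThreadPoolExecutor(max_workers=1)


def rebuild_cache():
    # Read-only pass over the stored messages.
    cache["count"] = len(archive)
    return cache["count"]


def receive_email(message):
    # Storing must finish before we can report success to the sender.
    archive.append(message)
    # The rebuild is only scheduled here; it does not delay the reply.
    return executor.submit(rebuild_cache)


future = receive_email("hello")
future.result()  # wait for the rebuild, only for demonstration purposes
executor.shutdown()
```

The store step stays synchronous, so the caller still learns whether it succeeded; only the read-only rebuild is moved off the request path.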
But is it really worth the tremendous additional complexity that is
introduced by Django Q?
- requires a "qcluster" to run in the background (see shipped unit file)
- loss of determinism / debugging becomes much harder
- an enormous number of file descriptors are opened in testing
- additional dependency
A similar result can be obtained by simply scaling the wsgi application
accordingly (if needed) and/or optimizing the db queries.
Alternatively, one could simply invalidate the affected caches instead of
rebuilding them every time an e-mail is received, or not trigger cache
rebuilds on received e-mails at all...
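
A rough sketch of the "invalidate instead of rebuild" idea, assuming a toy in-memory cache. ThreadStatsCache, count_emails and receive_email are hypothetical names for illustration and do not correspond to anything in Hyperkitty.

```python
class ThreadStatsCache:
    """Toy cache keyed by thread id (hypothetical, not Hyperkitty's API)."""

    def __init__(self, compute):
        self._compute = compute  # expensive function, e.g. a db query
        self._values = {}

    def invalidate(self, thread_id):
        # Cheap: just forget the stale value when an e-mail arrives.
        self._values.pop(thread_id, None)

    def get(self, thread_id):
        # Recompute lazily, only when a page actually asks for the value.
        if thread_id not in self._values:
            self._values[thread_id] = self._compute(thread_id)
        return self._values[thread_id]


emails = {"thread-1": ["first message"]}
calls = []  # records every expensive computation


def count_emails(thread_id):
    calls.append(thread_id)
    return len(emails[thread_id])


cache = ThreadStatsCache(count_emails)


def receive_email(thread_id, body):
    emails[thread_id].append(body)
    cache.invalidate(thread_id)  # no rebuild on the receive path


cache.get("thread-1")                # first read computes the value
receive_email("thread-1", "reply")
receive_email("thread-1", "another reply")
cache.get("thread-1")                # recomputed once, on the next read
```

With this approach two incoming e-mails cost one recomputation on the next read instead of two rebuilds on the receive path, and no background worker is involved.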
But maybe I overlooked something. I argue that we do not *need* Django
Q. The question is: do we want it?
What are your thoughts on this?
I am trying to run
tox -e py35-django111
on HyperKitty. I run into an issue in that the tests run up to a point
after which all but the last 2 fail with a "too many open files"
exception. These files are logs created in directories
/tmp/hyperkitty-testing-*, and over 100 /tmp/hyperkitty-testing-*
directories are left behind. This is with an open file limit of 1024.
If I raise the limit to 4096, everything runs and no
/tmp/hyperkitty-testing-* directories are left behind, but it seems
perhaps hyperkitty/tests/utils.TestCase._post_teardown() should have
something added to undo what hyperkitty/tests/utils.setup_logging() sets up.
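
A minimal sketch of the kind of cleanup I mean, assuming setup_logging() attaches a file handler that writes into a temporary directory. The setup_logging() below is a hypothetical stand-in, not the actual HyperKitty code, and it uses plain unittest rather than Django's TestCase.

```python
import logging
import os
import shutil
import tempfile
import unittest


def setup_logging(logger, tmpdir):
    """Hypothetical stand-in for hyperkitty/tests/utils.setup_logging()."""
    handler = logging.FileHandler(os.path.join(tmpdir, "hyperkitty.log"))
    logger.addHandler(handler)
    return handler


class LoggingCleanupTestCase(unittest.TestCase):

    def setUp(self):
        self.tmpdir = tempfile.mkdtemp(prefix="hyperkitty-testing-")
        self.logger = logging.getLogger("hyperkitty-sketch")
        self.handler = setup_logging(self.logger, self.tmpdir)

    def tearDown(self):
        # Detach and close the handler so its file descriptor is released,
        # then remove the temporary directory.
        self.logger.removeHandler(self.handler)
        self.handler.close()
        shutil.rmtree(self.tmpdir, ignore_errors=True)

    def test_logs_to_file(self):
        self.logger.warning("hello")
        log_path = os.path.join(self.tmpdir, "hyperkitty.log")
        self.assertTrue(os.path.exists(log_path))
```

Without the removeHandler()/close() pair, every test would leave one more open log file behind, which would be consistent with running into the 1024 limit partway through the run.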
Mark Sapiro <mark(a)msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
I have followed the instructions at https://github.com/maxking/docker-mailman and copied the config files for exim4 from the core/assets/exim directory to /etc/exim4/conf.d/main. However, when I run update-exim4.conf I get the following error:
2018-03-08 14:28:59 Exim configuration error in line 238 of /var/lib/exim4/config.autogenerated.tmp:
main option "mailman3_router" unknown
Invalid new configfile /var/lib/exim4/config.autogenerated.tmp, not installing
/var/lib/exim4/config.autogenerated.tmp to /var/lib/exim4/config.autogenerated
After a bit of investigation I noticed that the mailman3 router and transport configurations appear before the "begin routers" and "begin transports" sections in the autogenerated file. Is this expected? If not, how can I fix this?