I experienced huge Mailman queue processes, with RAM usage of 700 MB and more for each of the six queue workers. I have several big mailing lists: one with about 180,000 subscribers, and another with about 70,000.
Debugging the issue, I found the following:
Each queue worker that touches a message loads the entire mailing list database into RAM. So the RAM used by each worker is the sum of all mailing lists in the system (if all of them have traffic). This is a big issue if you have huge mailing lists.
The list data is kept in memory using a cache managed via weak references. But the cache was never being evicted, so a hard reference had to exist out there, somewhere.
I found a reference cycle between a mailing list and its OldStyleMemberships component, linked via "self._memberadaptor". This cycle keeps the mailing list alive, so the cache never evicts its data.
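To see why the cycle defeats the weak-reference cache, here is a minimal, self-contained sketch. The class names mimic Mailman's, but this is stand-in code, not Mailman itself:

```python
import gc
import weakref

class OldStyleMemberships:
    def __init__(self, mlist):
        self.__mlist = mlist          # hard back-reference: creates the cycle

class MailList:
    def __init__(self, name):
        self.internal_name = name
        self._memberadaptor = OldStyleMemberships(self)

# A weak-reference cache, like the one each queue worker keeps.
cache = weakref.WeakValueDictionary()

mlist = MailList('big-list')
cache['big-list'] = mlist
del mlist                             # the worker is done with the list

# The cycle keeps refcounts above zero, so the weak cache entry
# survives until the cyclic garbage collector happens to run.
print('big-list' in cache)            # True
gc.collect()
print('big-list' in cache)            # False
```

The list is only reclaimed when the cycle collector runs, so in practice the worker keeps it (and its full membership) in RAM indefinitely.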
I changed the OldStyleMemberships constructor to:
""" class OldStyleMemberships(MemberAdaptor.MemberAdaptor): def __init__(self, mlist): import weakref self.__mlist = weakref.proxy(mlist) """
to keep only a weak reference to the mailing list, breaking the cycle.
Now, when a worker is done with a mailing list, the cache is correctly evicted.
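With the weak back-reference, eviction no longer has to wait for the cycle collector. A sketch of the difference, again with stand-in classes rather than real Mailman code:

```python
import weakref

class FixedMemberships:
    def __init__(self, mlist):
        # weakref.proxy behaves like mlist but does not own it,
        # so there is no MailList <-> adaptor reference cycle.
        self.__mlist = weakref.proxy(mlist)

class MailList:
    def __init__(self, name):
        self.internal_name = name
        self._memberadaptor = FixedMemberships(self)

cache = weakref.WeakValueDictionary()

mlist = MailList('big-list')
cache['big-list'] = mlist
del mlist                     # refcount drops to zero immediately

print('big-list' in cache)    # False: evicted without waiting for gc
```

The proxy is safe here because the adaptor never outlives its list: the list owns the adaptor, not the other way around.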
Since Python doesn't give memory back to the system, the consequences of this change are:
Now, memory used by each worker is proportional to the size of the biggest mailing list, instead of the sum of all mailing list sizes. Not perfect, but a huge improvement if you have some big lists.
Now, since the cache is evicted frequently, mailing list data must be reloaded every time. This is a performance hit, but my mailing lists are huge with little traffic (maybe a couple of mails per week), so this is a non-issue for me.
I would suggest separating the subscriber info from the rest of the mailing list metadata, since most workers don't need the subscriber data in RAM to do their work. So, instead of 6 processes eating RAM, only one of them (the outgoing worker) would use significant memory. In fact, mailing list subscribers could be split across several files, to avoid loading the entire membership at once. Say, use 256 files and put each subscriber in a file according to the least significant byte of the MD5 hash of its address, for instance.
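The sharding idea could look like the sketch below. The function names and the on-disk layout are hypothetical, not anything Mailman provides:

```python
import hashlib

NUM_SHARDS = 256

def member_shard(address):
    # Pick one of 256 buckets from the least significant byte of the
    # MD5 hash of the (lowercased) subscriber address.
    digest = hashlib.md5(address.lower().encode('utf-8')).digest()
    return digest[-1]                 # an int in 0..255

def shard_filename(address):
    # Hypothetical layout: one members file per bucket.
    return 'members.%02x' % member_shard(address)

print(shard_filename('someone@example.com'))
```

A worker handling a single subscription or bounce would then touch only one of the 256 files instead of the whole membership.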
Studying the code, it seems easy to migrate membership to a separate persistence system (say, ZODB or Durus) or to use a backend like SQLite. Any plans for that? Any interest in patches?
Jesus Cea Avion
jcea@jcea.es - http://www.jcea.es/
jabber / xmpp:jcea@jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is to place your happiness in the happiness of another" - Leibniz