I am seeing huge Mailman queue processes, with RAM usage of 700 MB or more for each of the six queue workers. I have several big mailing lists: one with about 180,000 subscribers and another with about 70,000.
Debugging the issue I found:
Each queue worker touching a message loads the entire mailing list database into RAM. So the RAM used by each worker is the sum of the sizes of all mailing lists in the system (if all of them have traffic). This is a big issue if you have huge mailing lists.
The list data is kept in memory in a cache managed via weak references. But the cache is never evicted, so there must be a hard reference somewhere.
I found a reference cycle between a mailing list and its OldStyleMemberships component, linked via "self._memberadaptor". This cycle keeps the mailing list alive, so the cache never evicts the data.
I changed the OldStyleMemberships constructor to:
"""
class OldStyleMemberships(MemberAdaptor.MemberAdaptor):
    def __init__(self, mlist):
        import weakref
        self.__mlist = weakref.proxy(mlist)
"""
to keep only a weak reference to the mailing list, breaking the cycle.
Now, when a worker is done with a mailing list, the cache is correctly evicted.
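The effect of the cycle on a weak-reference cache can be reproduced in isolation. The following is only a minimal sketch, not Mailman code: `MailingList` and `Memberships` are hypothetical stand-ins, and the cyclic garbage collector is switched off on purpose so that only reference counting is observed (it says nothing about when, or whether, the collector would reclaim the real objects).

```python
import gc
import weakref


class Memberships:
    """Hypothetical stand-in for OldStyleMemberships."""

    def __init__(self, mlist, weak):
        # A strong back-reference creates a cycle; a proxy does not.
        self._mlist = weakref.proxy(mlist) if weak else mlist


class MailingList:
    """Hypothetical stand-in for a mailing list object."""

    def __init__(self, name, weak):
        self.name = name
        self._memberadaptor = Memberships(self, weak)


def survives_without_gc(weak):
    """Return True if the list is still in the weak cache after the
    last external reference to it has been dropped."""
    cache = weakref.WeakValueDictionary()
    gc.disable()  # observe reference counting alone
    try:
        mlist = MailingList("big-list", weak)
        cache["big-list"] = mlist
        del mlist
        return "big-list" in cache
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left by the strong variant
```

With the strong back-reference the entry stays in the cache after `del mlist`; with the proxy it disappears as soon as the last external reference is gone.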
Since Python doesn't give memory back to the system, the consequences of this change are:
Now the memory used by each worker is proportional to the size of the biggest mailing list, instead of the sum of all mailing list sizes. Not perfect, but a huge improvement if you have some big lists.
Now, since the cache is evicted frequently, mailing list data must be reloaded every time. This is a performance hit, but my mailing lists are huge and have little traffic (maybe a couple of mails per week), so this is a non-issue for me.
I would suggest separating the subscriber info from the rest of the mailing list metadata, since most workers don't need the subscriber data in RAM to do their work. So, instead of six processes eating RAM, only one of them (the outgoing worker) would use significant memory. In fact, the subscribers could be split across several files, to avoid loading the entire membership at once. For instance, use 256 files and put each subscriber in a file chosen by the last byte of the MD5 hash of its address.
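The 256-file split could look roughly like this. It is a sketch of the idea only: the function names, the file naming scheme, and the choice to normalise addresses to lower case before hashing are all assumptions, not anything Mailman does.

```python
import hashlib

NUM_SHARDS = 256  # one shard per possible value of a single byte


def shard_index(address):
    """Map a subscriber address to a shard number in 0..255.

    Assumption: addresses are stripped and lower-cased before
    hashing, so the same subscriber always lands in the same shard.
    """
    digest = hashlib.md5(address.strip().lower().encode("utf-8")).digest()
    return digest[-1]  # last byte of the 16-byte MD5 digest


def shard_filename(listname, address):
    """Hypothetical file name for the shard holding this subscriber."""
    return "%s-members-%02x.pck" % (listname, shard_index(address))
```

Looking up or updating one subscriber then needs only the single shard file named by `shard_filename`, rather than the whole membership.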
Studying the code, it seems easy to migrate membership to a separate persistence system (say, ZODB or Durus) or to use a backend like SQLite. Is there any plan for that? Any interest in patches?
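As an illustration of the SQLite option, here is a small sketch using Python's standard `sqlite3` module. The table layout and function names are invented for the example and do not correspond to any Mailman interface; the point is only that a worker can answer a membership question without holding the full subscriber set in RAM.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (and if needed create) a hypothetical membership store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS member ("
        " listname TEXT NOT NULL,"
        " address TEXT NOT NULL,"
        " digest INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (listname, address))"
    )
    return conn


def add_member(conn, listname, address, digest=False):
    """Insert or update one subscriber of one list."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO member VALUES (?, ?, ?)",
            (listname, address.lower(), int(digest)),
        )


def is_member(conn, listname, address):
    """Check a single address; touches only one row."""
    row = conn.execute(
        "SELECT 1 FROM member WHERE listname = ? AND address = ?",
        (listname, address.lower()),
    ).fetchone()
    return row is not None


def iter_regular_members(conn, listname):
    """Yield non-digest addresses one at a time, in sorted order."""
    cur = conn.execute(
        "SELECT address FROM member"
        " WHERE listname = ? AND digest = 0 ORDER BY address",
        (listname,),
    )
    for (address,) in cur:
        yield address
```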
Jesus Cea Avion
jcea@jcea.es - http://www.jcea.es/
jabber / xmpp:jcea@jabber.org
On Dec 12, 2008, at 1:01 AM, Jesus Cea wrote:
Studying the code, it seems easy to migrate membership to a separate persistence system (say, ZODB or Durus) or to use a backend like SQLite. Is there any plan for that? Any interest in patches?
Yes, but not in Mailman 2. It's in Mailman 3 by default and any code
that helps that branch get further along will be greatly appreciated.
FWIW, I am planning on another alpha release before the end of the
year. My intent is to have the system working and usable without a
web ui, but possibly with the administrative REST interface we'd been
talking about.
Your analysis is essentially correct. For Mailman 2.2, I think
adding the weakref would be fine in principle, but the more invasive
data store changes would not be. Much better to get Mailman 3 out the
door with its real database backend.
-Barry
Jesus Cea wrote:
- I found a reference cycle between a mailing list and its OldStyleMemberships component, linked via "self._memberadaptor". This cycle keeps the mailing list alive, so the cache never evicts the data.
I changed the OldStyleMemberships constructor to:
"""
class OldStyleMemberships(MemberAdaptor.MemberAdaptor):
    def __init__(self, mlist):
        import weakref
        self.__mlist = weakref.proxy(mlist)
"""
to keep only a weak reference to the mailing list, breaking the cycle.
Thanks very much for your efforts in debugging this.
- Now, since the cache is evicted frequently, mailing list data must be reloaded every time. This is a performance hit, but my mailing lists are huge and have little traffic (maybe a couple of mails per week), so this is a non-issue for me.
The use of the cache has been changed for 2.2. See the full thread at <http://mail.python.org/pipermail/mailman-developers/2008-August/020329.html> for more information. In 2.2, the cache will be less effective anyway, and the impact doesn't seem too severe.
I am going to implement your change to OldStyleMemberships for 2.2. I'm almost inclined to drop the cache altogether, as I think that with the 2.2 logic, hits may be rare. In theory, the logic can avoid a second read of the pickle if the runner first instantiates the list unlocked and subsequently locks it, but I suspect this normally happens in the same clock second, so the second read wouldn't be avoided anyway.
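The same-second point can be illustrated with a generic cache that decides whether to reread a pickle from the file's modification time at whole-second resolution. This is not the 2.2 code; the class, its attributes, and the exact comparison are assumptions chosen only to show why a load that follows another within the same clock second cannot safely be skipped.

```python
import os
import pickle


class CachedPickle:
    """Hypothetical cache that rereads a pickle only when it may have changed."""

    def __init__(self, path):
        self.path = path
        self.loaded_at = None  # whole-second time of the last read
        self.data = None
        self.reads = 0  # number of times the file was actually read

    def load(self, now):
        """Return the data, rereading unless the file is provably older.

        `now` is passed in (rather than read from the clock) so the
        behaviour can be exercised deterministically.
        """
        mtime = int(os.path.getmtime(self.path))
        # With one-second resolution, a file written in the same second
        # as the last read cannot be told apart from an unchanged one,
        # so only a strictly older file lets us skip the read.
        if self.loaded_at is not None and mtime < self.loaded_at:
            return self.data
        with open(self.path, "rb") as fp:
            self.data = pickle.load(fp)
        self.reads += 1
        self.loaded_at = int(now)
        return self.data
```

If the file was last written in second 1000 and is loaded twice in second 1000, both calls read the pickle; if the first load happens in second 1002, the next one is served from the cache.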
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
Mark Sapiro wrote:
Jesus Cea wrote:
I changed the OldStyleMemberships constructor to:
"""
class OldStyleMemberships(MemberAdaptor.MemberAdaptor):
    def __init__(self, mlist):
        import weakref
        self.__mlist = weakref.proxy(mlist)
"""
to keep only a weak reference to the mailing list, breaking the cycle.
There is a problem with the suggested change. If we are running under Python 2.6, the creation of the proxy
self.__mlist = weakref.proxy(mlist)
produces two of the following messages:
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <type 'exceptions.AttributeError'> ignored
These do not occur with Python 2.5.1. Presumably they are due to something within the structure of the list object itself, and they possibly render some parts of the list object unavailable to OldStyleMemberships via the proxy object. So far, I haven't identified an operational problem due to this, but because of this and the other considerations mentioned above, I'm now inclined to just abandon the list cache in Runner.py.
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
participants (3)
- Barry Warsaw
- Jesus Cea
- Mark Sapiro