a patch to scale Cgi/admin.py

I'm knee-deep in Mailman/Cgi/admin.py and it really doesn't scale. With a test list of 300k addresses, it takes a bit more than 5 minutes to answer (if the connection holds that long, of course). This is particularly true with the MySQLMemberAdaptor, where many things are not read from memory but recomputed with MySQL queries.

For instance, the part that checks whether a member is regular or digest fetches all the data again for each subscriber (put plainly, it's O(N^2)).

Another bottleneck is the list of chunks that is computed, displayed and sent to the client: this list takes a long time to compute, and for a user it's not that useful in general.

Last but not least, the search facility calls the MySQL database for each member in order to extract her name and match it against the regexp, and that takes very long. I wasn't able to find a way to speed this up, so I just disabled it on my system (but not in the patch provided below).

So here are a few small changes that make a radical improvement (down to 45 seconds from 4 minutes):

--- /home/fil/src_mailman/mailman/Mailman/Cgi/admin.py  2005-02-12 21:22:55.000000000 +0100
+++ Mailman/Cgi/admin.py  2005-10-29 16:43:56.116988176 +0200
@@ -876,6 +876,7 @@ def membership_options(mlist, subcat, cg
             doc.addError(_('Bad regular expression: ') + regexp)
         else:
             # BAW: There's got to be a more efficient way of doing this!
+            # yes please... this doesn't scale at all
             names = [mlist.getMemberName(s) or '' for s in all]
             all = [a for n, a in zip(names, all)
                    if cre.search(n) or cre.search(a)]
@@ -978,6 +979,8 @@ def membership_options(mlist, subcat, cg
                   MemberAdaptor.BYADMIN : _('A'),
                   MemberAdaptor.BYBOUNCE: _('B'),
                   }
+    # memorize the regular-or-digest list
+    regular_or_digest = mlist.getRegularMemberKeys()
     # Now populate the rows
     for addr in members:
         link = Link(mlist.GetOptionsURL(addr, obscure=1),
@@ -1021,8 +1024,8 @@ def membership_options(mlist, subcat, cg
         # This code is less efficient than the original which did a has_key on
         # the underlying dictionary attribute.  This version is slower and
         # less memory efficient.  It points to a new MemberAdaptor interface
-        # method.
-        if addr in mlist.getRegularMemberKeys():
+        # method. (Modified by Fil to "cache" the result - useful for MySQLMemberAdaptor)
+        if addr in regular_or_digest:
             cells.append(Center(CheckBox(addr + '_digest', 'off', 0).Format()))
         else:
             cells.append(Center(CheckBox(addr + '_digest', 'on', 1).Format()))
@@ -1113,7 +1116,7 @@ def membership_options(mlist, subcat, cg
                 range listed below:</em>''')
         chunkmembers = buckets[bucket]
         last = len(chunkmembers)
-        for i in range(numchunks):
+        for i in range(min(10,numchunks)):
             if i == chunkindex:
                 continue
             start = chunkmembers[i*chunksz]

-- Fil
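To make the cost of the regular/digest check concrete, here is a minimal sketch in modern Python (the patch itself targets the Python 2 code of Mailman 2.1). CountingAdaptor and both helper functions are hypothetical stand-ins, not Mailman code; only the method name getRegularMemberKeys comes from the patch. Converting the result to a set is an extra step that the patch does not take.

```python
class CountingAdaptor:
    """Hypothetical stand-in for a database-backed member adaptor."""

    def __init__(self, regular):
        self._regular = list(regular)
        self.queries = 0

    def getRegularMemberKeys(self):
        # Each call stands for one full round trip to the database.
        self.queries += 1
        return list(self._regular)


def digest_flags_slow(mlist, members):
    # One query per displayed row, as in the unpatched loop.
    return [addr not in mlist.getRegularMemberKeys() for addr in members]


def digest_flags_fast(mlist, members):
    # One query in total; a set makes each membership test O(1) on average.
    regular = set(mlist.getRegularMemberKeys())
    return [addr not in regular for addr in members]


members = ["a@example.com", "b@example.com", "c@example.com"]
slow = CountingAdaptor(["a@example.com"])
fast = CountingAdaptor(["a@example.com"])
assert digest_flags_slow(slow, members) == digest_flags_fast(fast, members)
print(slow.queries, fast.queries)  # 3 queries versus 1
```

With N rows the slow variant issues N queries, each returning up to N keys, which is where the quadratic behaviour comes from.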

What it sounds like you really want in order to minimize database I/O is to implement an in-memory caching system on top of the various methods of the MemberAdaptor.

So you'd have per MySQLMemberAdaptor object a dictionary keyed the same as the database table with dictionaries for the various fields per subscriber. If there is a KeyError when trying to access the dictionary, hit the database. If the database returns no rows, then you raise NotAMemberError or return whatever may be appropriate.

True, this would only be effective per connection or per post, but it seems to be the most efficient means of maximizing scalability. YMMV.

-jag

On Sat, 2005-10-29 at 16:49 +0200, Fil wrote:

--
Joshua Ginsberg <jag@fsf.org>
Free Software Foundation - Senior Systems Administrator
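The caching scheme described in this reply could look roughly like the sketch below. It is written in modern Python and is not the MySQLMemberAdaptor code: CachingMemberStore, fetch_row and invalidate are hypothetical names, and the local NotAMemberError class only stands in for the exception mentioned above.

```python
class NotAMemberError(Exception):
    """Stand-in for the NotAMemberError mentioned in the message."""


class CachingMemberStore:
    """Hypothetical per-object cache in front of a row-fetching callable."""

    def __init__(self, fetch_row):
        # fetch_row(address) returns a dict of fields, or None if no row matches.
        self._fetch_row = fetch_row
        self._cache = {}

    def _row(self, address):
        key = address.lower()
        try:
            return self._cache[key]
        except KeyError:
            row = self._fetch_row(key)  # cache miss: hit the database
            if row is None:
                raise NotAMemberError(address) from None
            self._cache[key] = row
            return row

    def getMemberName(self, address):
        return self._row(address).get("name", "")

    def invalidate(self, address):
        # Call after any write so that stale rows are not served.
        self._cache.pop(address.lower(), None)


table = {"jane@doe.com": {"name": "Jane Doe", "digest": False}}
hits = []


def fake_fetch(address):
    hits.append(address)
    return table.get(address)


store = CachingMemberStore(fake_fetch)
store.getMemberName("jane@doe.com")
store.getMemberName("Jane@Doe.com")
print(len(hits))  # 1: the second lookup is served from the cache
```

Because rows are cached only as they are requested, memory use grows with the number of addresses actually touched rather than with the size of the list.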

I'd love to see it happen, but you also have to be careful not to hit a memory limit (speaking of scaling in general, not of my own uses, which involve lists of fewer than 300k addresses). If your lists have 10 million subscribers (say), you don't want to load the whole list into memory just to retrieve one address. This is the DB's job (i.e. MySQL itself, or OldStyleMemberAdaptor.py when using the usual db), not Mailman's job per se.
In parallel I have another idea that could be somewhat faster for the members page: instead of splitting the list into "buckets", and restricting the display to a chunk of a bucket, just get rid of buckets, and chunk anywhere in the list. And, in order to get the functionality of "buckets" back, just add the initial letters to the list of links to the different "chunks".
The links to a specific chunk would be styled as /members/?start=jane@doe.com and the display test would be if ( addr >= start ) { prepare the display }
Not sure if my English makes sense, I'll just post the code when I'm done pythonizing the idea. I don't even know how to compare two strings in python, so it might take a little while :-D
Note that this would lose no functionality: it may even be a bit more useful as a UI for medium-sized lists of ~100 subscribers -- currently if your list holds two addresses starting with "a", two with "b" and so on, you have to check 26 pages of subscribers, whereas you really need just two (the first 40, and the last 12).
-- Fil
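The bucket-free chunking idea can be sketched as follows, in modern Python. The function chunk_page and its parameters are hypothetical; only the ?start=... convention and the addr >= start test come from the message. Python compares strings directly with >=, and bisect_left finds the first address that satisfies the test in a sorted list.

```python
from bisect import bisect_left


def chunk_page(members, start="", chunksz=40):
    """Return (rows, links) for one page of the membership list.

    rows  -- up to chunksz addresses, beginning at the first addr >= start
    links -- the first address of every chunk, usable as ?start=... targets
    """
    ordered = sorted(members)
    first = bisect_left(ordered, start)  # index of the first addr >= start
    rows = ordered[first:first + chunksz]
    links = ordered[::chunksz]
    return rows, links


# Two addresses per initial letter, 52 in total.
letters = "abcdefghijklmnopqrstuvwxyz"
members = [f"{c}{n}@example.com" for c in letters for n in (1, 2)]

page1, links = chunk_page(members)
page2, _ = chunk_page(members, start=links[1])
print(len(page1), len(page2), links)
```

For this 52-address list the sketch yields two pages (40 and 12 rows) instead of one page per initial letter.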

Okay, now it's done -- it's just a functionality rewrite, nothing is lost except a few lines of code :)
Enclosed is the patch + the patched file. (I'm not fully in sync with the CVS as I got an error upgrading from 2.1.6b to 2.1-Maint)
If you want to try it, it's simple and can't do much harm, as it only affects the Web GUI - you don't have to restart Mailman, just save a copy of Mailman/Cgi/admin.py (just in case), and replace it with this one.
Note that I also removed the annoying "language" menu when there's only one language available.
(BTW Something I'd like to add is a 'title="jane@doe.com"' attribute in the <a href> element, but I couldn't find how to do it.)
* * *
We'll still need to solve the "search" issue, but that will require much more work, I think, as the best way to do it will be to implement a new method in the memberadaptor; and that will need discussion, as there are two options:
- add a "getMembersMatching(regexp)" method (best, I think, as it can leverage foreign search methods, e.g. MySQL's "SELECT WHERE name LIKE %s")
- add a "getMembersWithNames()" method (not so good, but for the sake of the discussion I include the idea here)
Please tell me which route to take, or I'll take Route 1.
-- Fil
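As a rough illustration of Route 1, the sketch below pushes the search into the database with a parameterized LIKE query. It uses the standard-library sqlite3 module as a stand-in for MySQL, and the table name, column names and function name are assumptions made for the example, not the MySQLMemberAdaptor schema. Note that LIKE does substring matching rather than regexp matching, and that "%" or "_" in user input act as wildcards.

```python
import sqlite3


def get_members_matching(conn, pattern):
    """Return addresses whose name or address contains pattern."""
    like = "%" + pattern + "%"
    cur = conn.execute(
        "SELECT address FROM members "
        "WHERE name LIKE ? OR address LIKE ? ORDER BY address",
        (like, like),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (address TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [
        ("jane@doe.com", "Jane Doe"),
        ("bob@example.org", "Bob Smith"),
        ("doe.john@example.net", "Johnny"),
    ],
)
print(get_members_matching(conn, "doe"))
```

A single query replaces one name lookup per member, which is the point of letting the adaptor answer the search itself.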

I have implemented this for the MySQLMemberAdaptor, and it's a fabulous speed improvement. My question to Barry would be now: do I need to make this a compulsory method for the MemberAdaptor class (and declare this function in MemberAdaptor.py), like:

    def getMembersMatching(self, regexp):
        """Get all the members who match regexp"""
        raise NotImplementedError

or is it enough to just "fall back" to the previous algorithm in case this method doesn't exist (and then I need to patch only admin.py and MySQLMemberAdaptor, which I have done)?
-- Fil
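Both options raised in this message can be covered by one caller-side helper, sketched below in modern Python. search_members, PlainList and FastList are hypothetical names, and getMembers is assumed to be the adaptor method that returns all addresses; the fallback branch mirrors the list comprehension shown in the patch.

```python
import re


def search_members(mlist, regexp):
    """Return the addresses whose name or address matches regexp."""
    matcher = getattr(mlist, "getMembersMatching", None)
    if matcher is not None:
        try:
            return matcher(regexp)
        except NotImplementedError:
            pass  # declared on the base class but not overridden
    # Fallback: the previous per-member algorithm.
    cre = re.compile(regexp, re.IGNORECASE)
    members = mlist.getMembers()
    names = [mlist.getMemberName(m) or "" for m in members]
    return [m for n, m in zip(names, members) if cre.search(n) or cre.search(m)]


class PlainList:
    """Hypothetical adaptor without the new method."""

    def __init__(self, names):
        self._names = names

    def getMembers(self):
        return sorted(self._names)

    def getMemberName(self, member):
        return self._names.get(member)


class FastList(PlainList):
    """Hypothetical adaptor that answers the search itself."""

    def __init__(self, names):
        super().__init__(names)
        self.fast_calls = 0

    def getMembersMatching(self, regexp):
        self.fast_calls += 1
        cre = re.compile(regexp, re.IGNORECASE)
        return [m for m in self.getMembers()
                if cre.search(m) or cre.search(self._names[m] or "")]


names = {"jane@doe.com": "Jane Doe", "bob@example.org": None}
plain, fast = PlainList(names), FastList(names)
assert search_members(plain, "^jane") == search_members(fast, "^jane")
print(search_members(fast, "^jane"), fast.fast_calls)
```

The getattr check handles adaptors that lack the method entirely, while the except clause handles a base class that declares it and raises NotImplementedError, so the caller works whichever answer is chosen.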

Participants (2): Fil, Joshua Ginsberg