So, in 1.0rc2, displaying the list of lists for 529 lists requires 529**2 = 279841 system stat calls and takes over one and a half minutes on our Ultra-2 2x296 processor system! Is this because of Python, Mailman, or both? Has this been "fixed" in 2.0? You really should only need to make one stat call per list.
-- Roberto Ullfig : rullfig@uchicago.edu Systems Administrator Networking Services and Information Technologies University of Chicago
Whatever the answer is, we'd like to be able to generate the list of lists once a day; I tried running:

chroot /opt/http /opt/bin/python /opt/pkgs/mailman/scripts/driver listinfo

but get this traceback:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[----- Mailman Version: <undetermined> -----]
[----- Traceback ------]
Traceback (innermost last):
  File "/opt/pkgs/mailman/scripts/driver", line 135, in print_traceback
    from Mailman.mm_cfg import VERSION
ImportError: No module named Mailman.mm_cfg
Content-type: text/html

<head><title>Bug in Mailman version <undetermined></title></head>
<body><h2>Bug in Mailman version <undetermined></h2>
<p><h3>We're sorry, we hit a bug!</h3>

Looks like I'm missing something fundamental here. I assume that the output from this command would be HTML that could be redirected to a file. Thank you!
-- Roberto Ullfig : rullfig@uchicago.edu Systems Administrator Networking Services and Information Technologies University of Chicago
Roberto Ullfig wrote:
ImportError: No module named Mailman.mm_cfg Content-type: text/html
<head><title>Bug in Mailman version <undetermined></title></head> <body><h2>Bug in Mailman version <undetermined></h2> <p><h3>We're sorry, we hit a bug!</h3>
Looks like I'm missing something fundamental here.
Python's library-search path has to be set to include /home/mailman/Mailman. I recently discovered that bin/withlist doesn't do this; I'm guessing the driver script doesn't do it either. For your purposes, I would think that augmenting PYTHONPATH in the environment should suffice.
Adding "PYTHONPATH=$PYTHONPATH:/home/mailman/Mailman" makes it work for me. BTW, the chroot seems unnecessary (maybe you knew that, and were just trying for security or something).
BTW, "python -i /home/mailman/Mailman/Cgi/listinfo.py" gives the same output, although again it needs the PYTHONPATH setting.
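A minimal sketch of the PYTHONPATH approach, written for a current Python 3 interpreter rather than the 1.5.2 used in this thread. It starts a child interpreter with one extra directory appended to PYTHONPATH and checks that the child sees it on sys.path. The temporary directory only stands in for the Mailman install path, and run_with_extra_path is a hypothetical helper name, not part of Mailman.

```python
import os
import subprocess
import sys
import tempfile


def run_with_extra_path(extra_dir, code):
    """Run `code` in a child Python with extra_dir appended to PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Roughly what PYTHONPATH=$PYTHONPATH:extra_dir does in a shell.
    if existing:
        env["PYTHONPATH"] = os.pathsep.join([existing, extra_dir])
    else:
        env["PYTHONPATH"] = extra_dir
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Stand-in for the directory holding the modules the script imports.
demo_dir = tempfile.mkdtemp()
child_path = run_with_extra_path(
    demo_dir, "import sys; print('\\n'.join(sys.path))"
).splitlines()

# Compare resolved paths so symlinked temp directories still match.
found = os.path.realpath(demo_dir) in [os.path.realpath(p) for p in child_path]
print("extra directory visible to child:", found)
```

Setting the variable only for the child process keeps the change out of the calling environment, which is convenient for a cron job.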
"DM" == Dan Mick dan.mick@West.Sun.COM writes:
DM> Python's library-search path has to be set to include
DM> /home/mailman/Mailman. I just discovered that bin/withlist
DM> doesn't do this recently; I'm guessing the driver script
DM> doesn't do it either. For your purposes, I would think that
DM> augmenting PYTHONPATH in the environment should suffice.
Right, because driver is only supposed to be run from the C wrapper binary, which /does/ set PYTHONPATH. I didn't expect withlist to be run without explicitly invoking python. The -r switch could be used this way so I think I'll add a #! line at the top. Note that it does not help to add -i in the #! line.
-Barry
"RU" == Roberto Ullfig
writes:
RU> Whatever the answer is, we'd like to be able to generate the
RU> lists of lists once a day; I tried running:
RU> chroot /opt/http /opt/bin/python /opt/pkgs/mailman/scripts/driver listinfo
RU> but get this traceback:
RU> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
RU> [----- Mailman Version: <undetermined> -----]
RU> [----- Traceback ------]
RU> Traceback (innermost last):
RU>   File "/opt/pkgs/mailman/scripts/driver", line 135, in print_traceback
RU>     from Mailman.mm_cfg import VERSION
RU> ImportError: No module named Mailman.mm_cfg
RU> Content-type: text/html

Looks like you're trying to run this out of the source directory not the installed directory.

I don't think it makes much sense trying to run the CGI's via the driver outside a web server environment -- there are all sorts of environment things missing. Much better to write a script to do what you want. See my recently posted list_lists for an example (or anything inside bin/).

-Barry
"RU" == Roberto Ullfig rullfig@uchicago.edu writes:
RU> So, in 1.0rc2, displaying the list of lists for 529 lists
RU> requires 529**2 = 279841 system stat calls and takes over one
RU> and a half minutes on our Ultra-2 2x296 processor system! Is
RU> this because of Python, Mailman, or both? Has this been
RU> "fixed" in 2.0? You really should only need to make one stat
RU> call per list.
Uh, it's because of Mailman :)
I implemented a list_lists script that does on the command line what listinfo.py does in HTML (see attached). Here's what truss -c gives me:
-------------------- snip snip --------------------
Portal - [no description available]
Postal - [no description available]
 Stage - Staging new Mailman releases
  Test - [no description available]

syscall      seconds   calls errors
_exit            .00       1
read             .00     102
write            .00       8
open             .11     607    474
close            .01     143
time             .00       3
brk              .03     227
stat             .03     201    157
getpid           .00      10
fstat            .00      66
ioctl            .02      63     61
execve           .00      10      8
umask            .00       2
fcntl            .00       7
readlink         .00       2      2
sigprocmask      .00       2
sigaction        .00      50
sigpending       .00       1
mmap             .00      42
mprotect         .00      10
munmap           .00      11
uname            .00       4
sysconfig        .00       1
lwp_create       .00       6
lwp_continue     .00       2
lwp_self         .00       3
llseek           .00     114
door             .00       5
lwp_schedctl     .01       5
getdents64       .01      15
fstat64          .00      67
open64           .00       7
                ----     ---    ---
sys totals:      .22    1797    702
usr time:        .51
elapsed:        1.19
-------------------- snip snip --------------------
Getting the list of list names requires at least a listdir() and an exists() for every directory found there.
Nothing about this will change for 2.0.
-Barry
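The listdir()-plus-exists() pattern described above can be sketched as follows. This is an illustration in current Python 3, not Mailman's actual Utils.list_names(); the marker file name config.db and the directory layout are assumptions made for the example.

```python
import os
import tempfile


def list_names(list_dir, marker="config.db"):
    """Return the subdirectories of list_dir that contain `marker`."""
    names = []
    for entry in os.listdir(list_dir):  # one scan of the lists directory
        # One existence check (a stat call) per entry found.
        if os.path.exists(os.path.join(list_dir, entry, marker)):
            names.append(entry)
    return names


# Throwaway layout: three real lists and one stray directory.
root = tempfile.mkdtemp()
for name in ("portal", "postal", "stage"):
    os.makedirs(os.path.join(root, name))
    open(os.path.join(root, name, "config.db"), "w").close()
os.makedirs(os.path.join(root, "not-a-list"))

print(sorted(list_names(root)))
```

Called once, this costs work proportional to the number of directories, which is the linear cost the message above treats as unavoidable.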
-------------------- snip snip --------------------
#! /usr/bin/env python
#
# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

"""List all mailing lists.
Usage: %(program)s [options]
Where:
--advertised
-a
List only those mailing lists that are publically advertised
--virtual-host-overview=domain
-V domain
List only those mailing lists that are homed to the given virtual
domain. This only works if the VIRTUAL_HOST_OVERVIEW variable is
set.
--help
-h
Print this text and exit.
"""

import sys
import string
import getopt
import paths

from Mailman import mm_cfg
from Mailman import MailList
from Mailman import Utils
from Mailman import Errors

program = sys.argv[0]


def usage(status, msg=''):
    # Print the module docstring, plus an optional message, then exit.
    print __doc__ % globals()
    if msg:
        print msg
    sys.exit(status)


def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'aV:h',
                                   ['advertised',
                                    'virtual-host-overview=',
                                    'help'])
    except getopt.error, msg:
        usage(1, msg)

    advertised = 0
    vhost = None
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--advertised'):
            advertised = 1
        elif opt in ('-V', '--virtual-host-overview'):
            vhost = arg

    names = Utils.list_names()
    names.sort()

    # Open each list without locking and keep the ones that match.
    mlists = []
    longest = 0
    for n in names:
        mlist = MailList.MailList(n, lock=0)
        if advertised and not mlist.advertised:
            continue
        if vhost and mm_cfg.VIRTUAL_HOST_OVERVIEW and \
           string.find(vhost, mlist.web_page_url) == -1 and \
           string.find(mlist.web_page_url, vhost) == -1:
            continue
        mlists.append(mlist)
        longest = max(len(mlist.real_name), longest)

    if not mlists:
        print 'No matching mailing lists found'
        return

    # Right-justify names to the longest one; truncate long descriptions.
    format = '%%%ds - %%.%ds' % (longest, 77 - longest)
    for mlist in mlists:
        description = mlist.description or '[no description available]'
        print format % (mlist.real_name, description)


if __name__ == '__main__':
    main()
"Barry A. Warsaw" wrote:
"RU" == Roberto Ullfig rullfig@uchicago.edu writes:
RU> So, in 1.0rc2, displaying the list of lists for 529 lists RU> requires 529**2 = 279841 system stat calls and takes over one RU> and a half minutes on our Ultra-2 2x296 processor system! Is RU> this because of Python, Mailman, or both? Has this been RU> "fixed" in 2.0? You really should only need to make one stat RU> call per list.
Getting the list of list names, requires at least a listdir() and an exists() for every directory found there.
Nothing about this will change for 2.0.
-Barry
Thanks for the script.
Now this is the truss output for the listinfo that is called by driver:
syscall      seconds   calls errors
_exit            .00       1
read             .21    1979
write            .15    1638
open             .12    1233    579
close            .06    1189
time             .00       1
brk              .43    5026
stat           25.58  285877    174
fstat            .00      63
ioctl            .01     591    589
execve           .00       1
umask            .00       2
fcntl            .02     535
readlink         .00       3      2
sigaction        .00      48
mmap             .00      32
munmap           .00       8
llseek           .05     643
getdents64       .85   10165
fstat64          .01    1123
open64           .03     535
                ----     ---    ---
sys totals:    27.52  310693   1344
usr time:      54.01
elapsed:      173.76
I can understand a listdir and an exists for each directory; that should come out to 2 * n stat calls, right (~1000 for us)? What I'm saying is that we are seeing n ** 2 stat calls (that's squared), or 285877 of them. The above truss is from running the driver manually after setting PYTHONPATH as suggested by Dan (thanks Dan):
setenv PYTHONPATH /opt/http/opt/pkgs/mailman/Mailman
python /opt/http//opt/pkgs/mailman/scripts/driver listinfo
I've also trussed the running process and gotten similar results; I see it stat'ing every directory once for every directory stat'ed or n-squared stats.
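The arithmetic behind that comparison, as a small Python check. The figures are the ones reported in this thread; the variable names are only for illustration.

```python
n = 529            # number of lists reported in this thread
observed = 285877  # stat calls in the truss output for driver listinfo

linear = 2 * n      # a listdir plus an exists per list
quadratic = n ** 2  # every list checked once for every list opened

print("2 * n             =", linear)
print("n ** 2            =", quadratic)
print("observed          =", observed)
print("observed - n ** 2 =", observed - quadratic)
```

With n = 529 the linear estimate is 1058 and the square is 279841, so the observed count sits about 6000 calls above n ** 2 and far from the linear estimate.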
Note that this is with 1.0rc2; still waiting for 2.0.
-- Roberto Ullfig : rullfig@uchicago.edu Systems Administrator Networking Services and Information Technologies University of Chicago
And this is the truss from using the script you sent:
syscall      seconds   calls errors
_exit            .00       1
read             .29    1963
write            .03     534
open             .13    1132    485
close            .00    1182
time             .00       1
brk              .41    4944
stat           28.32  285849    148
fstat            .00      61
ioctl            .02     586    584
execve           .01       3      1
fcntl            .00     535
readlink         .00       3      2
sigaction        .01      48
mmap             .00      40
munmap           .01      10
llseek           .05     632
getdents64       .92   10165
fstat64          .02    1118
open64           .06     535
                ----     ---    ---
sys totals:    30.28  309342   1220
usr time:      56.08
elapsed:      182.84
All those stat calls just don't seem right to me. Using python 1.5.2 if that matters.
-- Roberto Ullfig : rullfig@uchicago.edu Systems Administrator Networking Services and Information Technologies University of Chicago
In message 38E9F8D2.10F251D5@uchicago.edu, Roberto Ullfig writes:
Now this is the truss output for the listinfo that is called by driver:
<Snip more truss output>
Here's what is happening. When listinfo runs and has to get the list of advertised addresses, it starts by getting a list of the mailing lists on the server using Utils.list_names(). Utils.list_names() requires two stat calls for each list on the machine every time it is called. This is understandable and isn't going to change. It then opens every one of those lists to check the advertised flag. Again, no problem. The problem is that when Mailman opens a list in the MailList constructor, __init__, it checks that the list exists by running Utils.list_names() and seeing whether the requested list name is there. So every request to open a list requires two stat calls on every list on the system, and when listinfo.py sequentially opens every list on the system, we get a squaring effect on stat calls in the list directory.
Solutions are reasonably easy to code. The first that comes to mind is an optional argument to the constructor indicating that the name has already been checked and that checking it again is not necessary. Another is caching the list of lists on the server, but this means there is a delay between when a list is created and when it becomes accessible.
I can code something up for you if necessary, but it seems like a reasonably simple patch. Do either you or Barry need a patch? Let me know if you do.
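A toy model of the squaring effect and of the optional-argument fix, in current Python 3. FakeListStore, FakeMailList, open_all and the check_exists parameter are all hypothetical names invented for this sketch; none of them is Mailman code. For simplicity the model charges one simulated stat call per list on each scan, where the description above counts two.

```python
class FakeListStore:
    """Stand-in for the lists directory; counts simulated stat calls."""

    def __init__(self, names):
        self._names = list(names)
        self.stat_calls = 0

    def list_names(self):
        # Model one existence check per list on every call.
        self.stat_calls += len(self._names)
        return list(self._names)


class FakeMailList:
    """Stand-in for a list object whose constructor validates the name."""

    def __init__(self, store, name, check_exists=True):
        # check_exists is the hypothetical optional argument: a caller that
        # just got `name` from list_names() can skip the second scan.
        if check_exists and name not in store.list_names():
            raise ValueError("unknown list: %s" % name)
        self.name = name


def open_all(store, check_exists):
    """Open every list, as an overview page would; return the call count."""
    store.stat_calls = 0
    for name in store.list_names():
        FakeMailList(store, name, check_exists=check_exists)
    return store.stat_calls


store = FakeListStore("list%d" % i for i in range(529))
print("with re-check:   ", open_all(store, check_exists=True))
print("without re-check:", open_all(store, check_exists=False))
```

In this model the re-checking path costs n + n ** 2 simulated calls and the skipping path costs n, which is the shape of the difference seen in the truss output.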
-- Ted Cabeen http://www.pobox.com/~secabeen secabeen@pobox.com Check Website or finger for PGP Public Key secabeen@midway.uchicago.edu "I have taken all knowledge to be my province." -F. Bacon cococabeen@aol.com "Human kind cannot bear very much reality."-T.S.Eliot 73126.626@compuserve.com
"TC" == Ted Cabeen secabeen@pobox.com writes:
TC> Here's what is happening.
Doh! Thanks for finding this Ted. To be honest, I've looked at that constructor hundreds of times and my eyes just parsed right over it.
I think the right thing to do may be to just let the MMBadListError percolate up through the Load() call and just zap the test for MMUnknownListError. I'll have to grep through the code though to see if that would cause other problems. I suspect not, because I remember basically having to catch both errors all the time anyway.
Again, thanks for the sleuthing! -Barry
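One way to read the idea of letting the load error percolate, sketched in current Python 3: skip the up-front scan and let the attempt to load the one requested list fail on its own. BadListError, SketchList and the config.db file name are stand-ins chosen for this example; they are assumptions and not the actual Mailman classes or the fix that was committed.

```python
import os
import tempfile


class BadListError(Exception):
    """Hypothetical stand-in for the error raised when a list cannot load."""


class SketchList:
    def __init__(self, list_dir, name):
        self.name = name
        self.load(os.path.join(list_dir, name, "config.db"))

    def load(self, path):
        # No scan of every list: open this one list's file and let a
        # failure surface as the "bad list" error.
        try:
            with open(path, "rb") as fp:
                self.raw = fp.read()
        except OSError as exc:
            raise BadListError(self.name) from exc


root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "stage"))
with open(os.path.join(root, "stage", "config.db"), "wb") as fp:
    fp.write(b"placeholder")

print(SketchList(root, "stage").name)
try:
    SketchList(root, "no-such-list")
except BadListError as exc:
    print("could not load:", exc)
```

Callers then need to handle only one error type, and opening a list touches only that list's own directory.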
Ted's analysis was right on target; however, the right fix inspired me to do more hacking than expected. The good news is that for the 64 lists on python.org, the number of stat calls bin/list_lists makes has just dropped from 4453 to 357 (with the same number of errors in both: 214). This should be a big win for your huge lists.
I don't have a patch handy, but you can either check out the CVS snapshot or wait for beta2. No ETA there, but RRRSN :)
-Barry
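A quick check of those figures in Python. The numbers come from the message above; the observation that the drop equals n ** 2 is simple arithmetic and is consistent with the squaring analysis, though the thread does not spell it out.

```python
n = 64         # lists on python.org, per the message above
before = 4453  # stat calls before the fix
after = 357    # stat calls after the fix

print("calls removed:", before - after)
print("n ** 2:       ", n ** 2)
```

Both values come to 4096.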
"Barry A. Warsaw" wrote:
Ted's analysis was right on-target, however the right fix inspired me to do more hacking than expected. The good news is that for the 64 lists on python.org, the number of stat calls bin/list_lists does has just dropped from 4453 to 357 (with the same number of errors in both: 214). This should be a big win for your huge lists.
I don't have a patch handy, but you can either check out the CVS snapshot or wait for beta2. No ETA there, but RRRSN :)
Thanks, will probably wait until after we install 2 to patch anything (if it isn't in 2). Note that this affects the Admin list page too.
We've got an hourly cronjob creating the file; the only problem is getting the listinfo URL to display that html instead of running listinfo. For now we just have an alternate URL for the pre-generated list of lists.
-- Roberto Ullfig : rullfig@uchicago.edu Systems Administrator Networking Services and Information Technologies University of Chicago
"RU" == Roberto Ullfig rullfig@uchicago.edu writes:
RU> All those stat calls just don't seem right to me. Using python
RU> 1.5.2 if that matters.
Shouldn't. I'm seeing approximately n**2 stat calls too. It'll take some investigating to figure out what's going on, and I'm not sure it'll happen in time for 2.0.
-Barry
participants (4)
- Barry A. Warsaw
- Dan Mick
- Roberto Ullfig
- Ted Cabeen