Re: [Mailman-Developers] [Mailman-Users] Subscribers suddenly "disappear"
I tried the "sleep" approach as well. But then I thought that it was not viable. I am setting a relatively high number of mailing list that will receive asynchronous "subscribe" requests via web and/or shell API.
It would be simply not possible to prevent bug this from happening as the chance of two "subscribe" being processsed almost simultaneously are not that small...
I also vote for some kind of timing/lock issue.
Max
Barry Finkel wrote:
Max Lanfranconi <Max.Lanfranconi@Sun.COM> wrote:
Mailman 2.1.11 Python 2.4.4 OS Solaris 2.11
Hi,
I have been able to reproduce this bug consistently by running the replicate_bug script:
replicate _bug is the following:
#!/bin/sh /usr/local/mailman/bin/rmlist testlist1 /usr/local/mailman/bin/rmlist testlist2 /usr/local/mailman/bin/rmlist testlist3 /usr/local/mailman/bin/rmlist testlist4 /usr/local/mailman/bin/rmlist testlist5 /usr/local/mailman/bin/rmlist testlist6 /usr/local/mailman/bin/newlist -q -e url.domain.com testlist1 list-admin@domain.com testpwd /usr/local/mailman/bin/newlist -q -e url.domain.com testlist2 list-admin@domain.com testpwd /usr/local/mailman/bin/newlist -q -e url.domain.com testlist3 list-admin@domain.com testpwd /usr/local/mailman/bin/newlist -q -e url.domain.com testlist4 list-admin@domain.com testpwd /usr/local/mailman/bin/newlist -q -e url.domain.com testlist5 list-admin@domain.com testpwd /usr/local/mailman/bin/newlist -q -e url.domain.com testlist6 list-admin@domain.com testpwd
echo "foo@bar.com" | /usr/local/mailman/bin/add_members -r - testlist1 echo "foo@bar.com" | /usr/local/mailman/bin/add_members -r - testlist2 echo "foo@bar.com" | /usr/local/mailman/bin/add_members -r - testlist3 echo "foo@bar.com" | /usr/local/mailman/bin/add_members -r - testlist4 echo "foo@bar.com" | /usr/local/mailman/bin/add_members -r - testlist5 echo "foo@bar.com" | /usr/local/mailman/bin/add_members -r - testlist6 echo "boo@foo.com" | /usr/local/mailman/bin/add_members -r - testlist1 echo "boo@foo.com" | /usr/local/mailman/bin/add_members -r - testlist2 echo "boo@foo.com" | /usr/local/mailman/bin/add_members -r - testlist3 echo "boo@foo.com" | /usr/local/mailman/bin/add_members -r - testlist4 echo "boo@foo.com" | /usr/local/mailman/bin/add_members -r - testlist5 echo "boo@foo.com" | /usr/local/mailman/bin/add_members -r - testlist6
After a short wait the following output is received:
Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com
foo@bar.com receives 6 confirmation emails, as boo@foo.com does. S o far so good.
At this point testlist1-6 each should contain 2 subscribers: foo@bar.com and boo@foo.com
BUT
/usr/local/mailman/bin/list_members testlist1 /usr/local/mailman/bin/list_members testlist2 /usr/local/mailman/bin/list_members testlist3 /usr/local/mailman/bin/list_members testlist4 /usr/local/mailman/bin/list_members testlist5 /usr/local/mailman/bin/list_members testlist6
invariably produce some random combination in which one or more of the subscribers are missing: for example: boo@foo.com foo@bar.com foo@bar.com boo@foo.com foo@bar.com foo@bar.com foo@bar.com boo@foo.com foo@bar.com
in which three instances of boo@foo.com are missing...
No Errors in any Mailman log.
Thanks in advance for your help. Please let me know if you need additional details. Regards, Max
I ran the script (after some minor modifications) on
Ubuntu Dapper Mailman 2.1.11 (self-built package) Python 2.4.3 (#2, Oct 6 2006, 07:49:22)
and I get similar results:
Script started on Tue 05 Aug 2008 09:45:31 AM CDT # set prompt="mailman11-test# " mailman11-test# ./replicate_bug.exec Remove the components of a mailing list with impunity - beware! This removes (almost) all traces of a mailing list. By default, the lists archives are not removed, which is very handy for retiring old lists. Usage:
rmlist [-a] [-h] listname
Where:
--archives -a Remove the list's archives too, or if the list has already been deleted, remove any residual archives.
--help -h Print this help message and exit.
No such list (or list already deleted): testlist1 Remove the components of a mailing list with impunity - beware! This removes (almost) all traces of a mailing list. By default, the lists archives are not removed, which is very handy for retiring old lists. Usage:
rmlist [-a] [-h] listname
Where:
--archives -a Remove the list's archives too, or if the list has already been deleted, remove any residual archives.
--help -h Print this help message and exit.
No such list (or list already deleted): testlist2 Remove the components of a mailing list with impunity - beware! This removes (almost) all traces of a mailing list. By default, the lists archives are not removed, which is very handy for retiring old lists. Usage:
rmlist [-a] [-h] listname
Where:
--archives -a Remove the list's archives too, or if the list has already been deleted, remove any residual archives.
--help -h Print this help message and exit.
No such list (or list already deleted): testlist3 Remove the components of a mailing list with impunity - beware! This removes (almost) all traces of a mailing list. By default, the lists archives are not removed, which is very handy for retiring old lists. Usage:
rmlist [-a] [-h] listname
Where:
--archives -a Remove the list's archives too, or if the list has already been deleted, remove any residual archives.
--help -h Print this help message and exit.
No such list (or list already deleted): testlist4 Remove the components of a mailing list with impunity - beware! This removes (almost) all traces of a mailing list. By default, the lists archives are not removed, which is very handy for retiring old lists. Usage:
rmlist [-a] [-h] listname
Where:
--archives -a Remove the list's archives too, or if the list has already been deleted, remove any residual archives.
--help -h Print this help message and exit.
No such list (or list already deleted): testlist5 Remove the components of a mailing list with impunity - beware! This removes (almost) all traces of a mailing list. By default, the lists archives are not removed, which is very handy for retiring old lists. Usage:
rmlist [-a] [-h] listname
Where:
--archives -a Remove the list's archives too, or if the list has already been deleted, remove any residual archives.
--help -h Print this help message and exit.
No such list (or list already deleted): testlist6 Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com mailman11-test# foreach list (1 2 3 4 5 6) ? echo $list ? list_members testlist$list ? end 1 boo@foo.com foo@bar.com 2 boo@foo.com foo@bar.com 3 boo@foo.com foo@bar.com 4 boo@foo.com foo@bar.com 5 foo@bar.com 6 foo@bar.com mailman11-test# ======================================================= mailman11-test# ./replicate_bug.exec Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com mailman11-test# foreach list (1 2 3 4 5 6) ? echo $list ? list_members testlist$list ? end 1 boo@foo.com foo@bar.com 2 boo@foo.com foo@bar.com 3 foo@bar.com 4 foo@bar.com 5 foo@bar.com 6 foo@bar.com mailman11-test# ======================================================= mailman11-test# ./replicate_bug.exec Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Not removing archives. Reinvoke with -a to remove them. Removing list info Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: foo@bar.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com Subscribed: boo@foo.com mailman11-test# foreach list (1 2 3 4 5 6) ? echo $list ? list_members testlist$list ? end 1 foo@bar.com 2 boo@foo.com foo@bar.com 3 foo@bar.com 4 foo@bar.com 5 foo@bar.com 6 foo@bar.com mailman11-test# exit Script done on Tue 05 Aug 2008 09:50:48 AM CDT
I then added
sleep 5
after each "add_members" line, and the output looked fine. I changed the sleep interval from 5 down to 1 in successive runs, and each output looks fine; each list has the proper two subscribers. Is there a timing issue here?
Barry S. Finkel Computing and Information Systems Division Argonne National Laboratory Phone: +1 (630) 252-7277 9700 South Cass Avenue Facsimile:+1 (630) 252-4601 Building 222, Room D209 Internet: BSFinkel@anl.gov Argonne, IL 60439-4828 IBMMAIL: I1004994
Mailman-Users mailing list Mailman-Users@python.org http://mail.python.org/mailman/listinfo/mailman-users Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-users/max.lanfranconi%40sun.c...
Security Policy: http://wiki.list.org/x/QIA9
Max Lanfranconi wrote:
I tried the "sleep" approach as well. But then I thought that it was not viable. I am setting a relatively high number of mailing list that will receive asynchronous "subscribe" requests via web and/or shell API.
It would be simply not possible to prevent bug this from happening as the chance of two "subscribe" being processsed almost simultaneously are not that small...
I also vote for some kind of timing/lock issue.
Based on my running of the test script (see <http://mail.python.org/pipermail/mailman-users/2008-August/062885.html>, with
LIST_LOCK_DEBUGGING = True
I think I see the problem. It is related to qrunner list caching and the fact the there is insufficient precision in the list instance's __timestamp
The scenario is the following
add_member saves the list with the first member.
VirginRunner gets there first, instantiates and caches the list. It then locks the list, processes the welcome and saves and unlocks the list.
add_member gets the lock, adds the second member and saves the list.
Virgin runner gets the second welcome. The list is cached, so it uses the cached instance. It then locks the list which ultimately calls MailList.__load() to refresh the list data, but __load() does
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
The problem is os.path.getmtime() returns a time in seconds so if we are still in the same second as step 2), we don't refresh the list.
Thank you very much Max for providing a script to reproduce the problem.
I'm not sure of the best solution. I'm leaning towards dropping the __timestamp test or perhaps replacing it with a file size test, but that too may be unreliable.
Other thoughts?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mark,
I am more than willing to try any workaround as this bug is currently a showstopper for my project. I am not very versed in mailman internals, so I need some guidance here.
Do you recommend I try getting rid of the __timestamp test and give it a try ?
If that is you suggestion, may I ask what was the original purpose of that test ?
Thanks again for your help in solving this.
Regards,
Max
Mark Sapiro wrote:
Max Lanfranconi wrote:
I tried the "sleep" approach as well. But then I thought that it was not viable. I am setting a relatively high number of mailing list that will receive asynchronous "subscribe" requests via web and/or shell API.
It would be simply not possible to prevent bug this from happening as the chance of two "subscribe" being processsed almost simultaneously are not that small...
I also vote for some kind of timing/lock issue.
Based on my running of the test script (see <http://mail.python.org/pipermail/mailman-users/2008-August/062885.html>, with
LIST_LOCK_DEBUGGING = True
I think I see the problem. It is related to qrunner list caching and the fact the there is insufficient precision in the list instance's __timestamp
The scenario is the following
add_member saves the list with the first member.
VirginRunner gets there first, instantiates and caches the list. It then locks the list, processes the welcome and saves and unlocks the list.
add_member gets the lock, adds the second member and saves the list.
Virgin runner gets the second welcome. The list is cached, so it uses the cached instance. It then locks the list which ultimately calls MailList.__load() to refresh the list data, but __load() does
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
The problem is os.path.getmtime() returns a time in seconds so if we are still in the same second as step 2), we don't refresh the list.
Thank you very much Max for providing a script to reproduce the problem.
I'm not sure of the best solution. I'm leaning towards dropping the __timestamp test or perhaps replacing it with a file size test, but that too may be unreliable.
Other thoughts?
--
<http://jcp.org> * Max Lanfranconi * JCP Program Management Office Marketing Manager
Phone +1 408 404 6893 Mobile +1 408 505 1020 Email max@jcp.org
Massimo Lanfranconi wrote:
I am more than willing to try any workaround as this bug is currently a showstopper for my project. I am not very versed in mailman internals, so I need some guidance here.
Do you recommend I try getting rid of the __timestamp test and give it a try ?
Yes.
If that is you suggestion, may I ask what was the original purpose of that test ?
Lots of Mailman processes will instantiate a list unlocked for some preliminary checks and then later, lock the list to do some update. The purpose of the timestamp test was to avoid rereading the list data from the config.pck file if the already loaded data is still current.
What I suggest for the moment is to just delete lines 594-603 of Mailman/MailList.py which includes the following 4 lines and the preceding comment.
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
This will result in some additional I/O in some cases, but it will avoid this problem of sometimes not loading the list data when it has actually changed.
Please do that and restart Mailman and I'm sure the bug will be fixed (but don't take my word for it).
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Mark Sapiro wrote: | What I suggest for the moment is to just delete lines 594-603 of | Mailman/MailList.py which includes the following 4 lines and the | preceding comment. | |>> mtime = os.path.getmtime(dbfile) |>> if mtime <= self.__timestamp: |>> # File is not newer |>> return None, None Ooops!. That won't quite do it. you can't just remove the ~ mtime = os.path.getmtime(dbfile) line as mtime is referenced later and will be undefined. A patch is attached which has been applied and tested. I ran the test script twice with this patch installed with no lost subscribers. It isn't the final patch which should remove __timestamp completely and won't remove all the comment, but it will do as a workaround. - -- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) iD8DBQFImSKDVVuXXpU7hpMRArrKAJ9K2fA8MzI7Iarp0WfnNClwtwVbdACfaxCO Xsx7ru3fK2/v5ZRw0YHYU9s= =BdZw -----END PGP SIGNATURE----- --- Mailman/MailList.py 2008-06-30 09:29:46.000000000 -0700 +++ Mailman/MailList.py 2008-08-05 20:43:19.000000000 -0700 @@ -591,16 +591,7 @@ else: assert 0, 'Bad database file name' try: - # Check the mod time of the file first. If it matches our - # timestamp, then the state hasn't change since the last time we - # loaded it. Otherwise open the file for loading, below. If the - # file doesn't exist, we'll get an EnvironmentError with errno set - # to ENOENT (EnvironmentError is the base class of IOError and - # OSError). mtime = os.path.getmtime(dbfile) - if mtime <= self.__timestamp: - # File is not newer - return None, None fp = open(dbfile) except EnvironmentError, e: if e.errno <> errno.ENOENT: raise
Woohooo!
Applied your patch and the bug is gone !
So far, so good.
Thanks Mark !
Much, much appreciated.
Max
Mark Sapiro wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Mark Sapiro wrote:
| What I suggest for the moment is to just delete lines 594-603 of | Mailman/MailList.py which includes the following 4 lines and the | preceding comment. | |>> mtime = os.path.getmtime(dbfile) |>> if mtime <= self.__timestamp: |>> # File is not newer |>> return None, None
Ooops!. That won't quite do it. you can't just remove the
~ mtime = os.path.getmtime(dbfile)
line as mtime is referenced later and will be undefined. A patch is attached which has been applied and tested. I ran the test script twice with this patch installed with no lost subscribers. It isn't the final patch which should remove __timestamp completely and won't remove all the comment, but it will do as a workaround.
Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32)
iD8DBQFImSKDVVuXXpU7hpMRArrKAJ9K2fA8MzI7Iarp0WfnNClwtwVbdACfaxCO Xsx7ru3fK2/v5ZRw0YHYU9s= =BdZw -----END PGP SIGNATURE-----
Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/max.lanfranconi%40...
Security Policy: http://wiki.list.org/x/QIA9
Mark Sapiro writes:
add_member saves the list with the first member.
VirginRunner gets there first, instantiates and caches the list. It then locks the list, processes the welcome and saves and unlocks the list.
add_member gets the lock, adds the second member and saves the list.
Virgin runner gets the second welcome. The list is cached, so it uses the cached instance. It then locks the list which ultimately calls MailList.__load() to refresh the list data, but __load() does
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
Shouldn't "mtime < self.__timestamp" do the right thing (much more often)? You're still vulnerable to "date -s", adjtime, and friends, though, and of course you'll have some undesirable cache misses at times when it would be nice if you didn't.
A better way would be to add a serial number.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Aug 5, 2008, at 9:08 PM, Stephen J. Turnbull wrote:
Mark Sapiro writes:
add_member saves the list with the first member.
VirginRunner gets there first, instantiates and caches the list. It then locks the list, processes the welcome and saves and unlocks the list.
add_member gets the lock, adds the second member and saves the
list.Virgin runner gets the second welcome. The list is cached, so it uses the cached instance. It then locks the list which ultimately calls MailList.__load() to refresh the list data, but __load() does
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
Shouldn't "mtime < self.__timestamp" do the right thing (much more often)? You're still vulnerable to "date -s", adjtime, and friends, though, and of course you'll have some undesirable cache misses at times when it would be nice if you didn't.
Probably so, but is the optimization still worth it?
A better way would be to add a serial number.
You'd probably want to store the serial number in the file name so you
wouldn't have to load the pickle just to get the current serial. It's
similar to what happens with the queue files. But again, I'm not sure
it's worth it.
- -Barry
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin)
iEYEARECAAYFAkiY+ygACgkQ2YZpQepbvXHdBwCfZRO73w4KiA0FMg6eU3yDU8YN Y7AAoLE5760wZWw536oMj1zyHMNi8h4a =w1PV -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Stephen J. Turnbull wrote:
Mark Sapiro writes:
add_member saves the list with the first member.
VirginRunner gets there first, instantiates and caches the list. It then locks the list, processes the welcome and saves and unlocks the list.
add_member gets the lock, adds the second member and saves the list.
Virgin runner gets the second welcome. The list is cached, so it uses the cached instance. It then locks the list which ultimately calls MailList.__load() to refresh the list data, but __load() does
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
Shouldn't "mtime < self.__timestamp" do the right thing (much more often)?
I didn't reply sooner because when I first saw this, I didn't read it carefully, and I didn't "get" what Stephen was saying. Then I had to think about it.
I have concluded that barring resetting of clocks backward, "mtime < self.__timestamp" is equivalent to "False". Whenever a config.pck is written or read in a process, __timestamp is set to the current mod time of the config.pck. Thus, the value of self.__timestamp for a cached list object is always <= mtime.
You're still vulnerable to "date -s", adjtime, and friends, though, and of course you'll have some undesirable cache misses at times when it would be nice if you didn't.
Always, I think.
A better way would be to add a serial number.
As has already been mentioned, it does no good to put a serial number in the list object since you'd have to read the config.pck to get the serial number. It would have to be encoded in the file name or stored in a separate database. Storing it separately adds the complication of an additional file plus making sure the additional file is sync'd with config.pck. Encoding it in the file name seems complex as well.
I lean towards the simpler approach of just reading the config.pck every time.
Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32)
iD8DBQFIrgfnVVuXXpU7hpMRApeVAKDAv5aw4Mc8x6P9KevoWIBYIjaqFgCcDlcq FpC83W+m1C8Vjlfql27QrtA= =/joM -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Mark Sapiro wrote:
Stephen J. Turnbull wrote:
Mark Sapiro writes:
1) add_member saves the list with the first member. 2) VirginRunner gets there first, instantiates and caches the list. It then locks the list, processes the welcome and saves and unlocks the list. 3) add_member gets the lock, adds the second member and saves the list. 4) Virgin runner gets the second welcome. The list is cached, so it uses the cached instance. It then locks the list which ultimately calls MailList.__load() to refresh the list data, but __load() does
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
Shouldn't "mtime < self.__timestamp" do the right thing (much more often)?
I didn't reply sooner because when I first saw this, I didn't read it carefully, and I didn't "get" what Stephen was saying. Then I had to think about it.
I have concluded that barring resetting of clocks backward, "mtime < self.__timestamp" is equivalent to "False". Whenever a config.pck is written or read in a process, __timestamp is set to the current mod time of the config.pck. Thus, the value of self.__timestamp for a cached list object is always <= mtime.
I have thought about this some more and have come up with the attached patch which I have tested with Max's script and propose for Mailman 2.2. The patch changes the '<=' test to '<' as Stephen suggests, but then so it won't effectively disable ever not reloading the config.pck, it changes the setting of __timestamp following a successful read from the mod time of the config.pck to the current time truncated to an int. Thus, once we've read a config.pck that's more than a second old, a subsequent load of the same object will skip rereading the config.pck if it wasn't updated in the interim. - -- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) iD8DBQFIryLmVVuXXpU7hpMRApoCAJ9hPl/yxi8n9ZAT04s2+TCG6/rvbwCg1NJ6 HlgyY/PB3s6l7B4diGO0zb8= =5V+H -----END PGP SIGNATURE----- === modified file 'Mailman/MailList.py' --- Mailman/MailList.py 2008-08-21 21:35:20 +0000 +++ Mailman/MailList.py 2008-08-22 18:18:38 +0000 @@ -597,8 +597,11 @@ # file doesn't exist, we'll get an EnvironmentError with errno set # to ENOENT (EnvironmentError is the base class of IOError and # OSError). + # We test strictly less than here because the resolution is whole + # seconds and we have seen cases of the file being updated by + # another process in the same second. mtime = os.path.getmtime(dbfile) - if mtime <= self.__timestamp: + if mtime < self.__timestamp: # File is not newer return None, None fp = open(dbfile) @@ -616,8 +619,9 @@ return None, e finally: fp.close() - # Update timestamp - self.__timestamp = mtime + # Update the timestamp. We use current time here rather than mtime + # so the test above might succeed the next time. + self.__timestamp = int(time.time()) return dict, None def Load(self, check_version=True):
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Aug 21, 2008, at 8:27 PM, Mark Sapiro wrote:
As has already been mentioned, it does no good to put a serial
number in the list object since you'd have to read the config.pck to get the serial number. It would have to be encoded in the file name or
stored in a separate database. Storing it separately adds the complication of an additional file plus making sure the additional file is sync'd with config.pck. Encoding it in the file name seems complex as well.I lean towards the simpler approach of just reading the config.pck
every time.
+1
- -Barry
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin)
iEYEARECAAYFAkizw64ACgkQ2YZpQepbvXFMuwCfRKzhFhEnTIaQLl1YgqeuTp7w O3QAn3fRWR8OytLtI3F4JXjLnmhmgLsY =XpWr -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Aug 5, 2008, at 7:34 PM, Mark Sapiro wrote:
I think I see the problem. It is related to qrunner list caching and the fact the there is insufficient precision in the list instance's __timestamp
The scenario is the following
add_member saves the list with the first member.
VirginRunner gets there first, instantiates and caches the list. It then locks the list, processes the welcome and saves and unlocks the list.
add_member gets the lock, adds the second member and saves the
list.Virgin runner gets the second welcome. The list is cached, so it uses the cached instance. It then locks the list which ultimately calls MailList.__load() to refresh the list data, but __load() does
mtime = os.path.getmtime(dbfile) if mtime <= self.__timestamp: # File is not newer return None, None
The problem is os.path.getmtime() returns a time in seconds so if we are still in the same second as step 2), we don't refresh the list.
Thank you very much Max for providing a script to reproduce the
problem.I'm not sure of the best solution. I'm leaning towards dropping the __timestamp test or perhaps replacing it with a file size test, but that too may be unreliable.
I wonder if the list cache is still worth it? I've run into trouble
with it in the recent past and I suspect that whatever benefits we got
from it in ancient times, may not be so relevant today. My first
inclination would be to ditch the cache, but that may not completely
solve this issue (ah, for the backing of a real database!).
Ultimately the timestamp check probably has to go. If we try to load
the list, we should just load it and not try to be so fancy to avoid
reading from disk. Yes, it means more I/O but reliability is more
important, IMO. I'm not convinced a size check is worth it.
- -Barry
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin)
iEYEARECAAYFAkiY+qMACgkQ2YZpQepbvXHUbgCfeWIwkAjv/9wYpJa9vQ16dwCP jYIAn0wVNl3JXm+9R2LJ5AUa46FwhNAN =P5Yo -----END PGP SIGNATURE-----
On Tue, Aug 05, 2008 at 09:13:07PM -0400, Barry Warsaw wrote:
I wonder if the list cache is still worth it? I've run into trouble
with it in the recent past and I suspect that whatever benefits we got
from it in ancient times, may not be so relevant today. My first
I expect cPickle or even pickle are pretty fast, and the config.pck is a fairly straightforward data structure, isn't it? Not deeply recursive or a complicated graph. One experiment would be to create a list with, say, 100,000 random foo@example.com addresses and benchmark how much time it takes to unpickle it. I'll try to do that tomorrow on a real computer (not this laptop).
--amk
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Aug 5, 2008, at 9:50 PM, A.M. Kuchling wrote:
On Tue, Aug 05, 2008 at 09:13:07PM -0400, Barry Warsaw wrote:
I wonder if the list cache is still worth it? I've run into trouble with it in the recent past and I suspect that whatever benefits we
got from it in ancient times, may not be so relevant today. My firstI expect cPickle or even pickle are pretty fast, and the config.pck is a fairly straightforward data structure, isn't it? Not deeply recursive or a complicated graph. One experiment would be to create a list with, say, 100,000 random foo@example.com addresses and benchmark how much time it takes to unpickle it. I'll try to do that tomorrow on a real computer (not this laptop).
Hi Andrew, any results?
- -Barry
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin)
iEYEARECAAYFAkics18ACgkQ2YZpQepbvXFX2gCdHOghfr9eb6xRZ1/delxPnw+2 t3wAnjgqAQhtiUUGYQSqQF0VT+ATyt+V =tyGZ -----END PGP SIGNATURE-----
On Fri, Aug 08, 2008 at 04:58:07PM -0400, Barry Warsaw wrote:
recursive or a complicated graph. One experiment would be to create a list with, say, 100,000 random foo@example.com addresses and benchmark how much time it takes to unpickle it. I'll try to do that tomorrow on a real computer (not this laptop).
Hi Andrew, any results?
The simple test program I used is below. For a list with 89531 addresses, the config.pck file is 9248317 bytes = 8.9M. Doing ten loads and then ten saves in a row, the average load time is 1.36sec and the average save time is 4.5sec.
This is on a development server here at Matrix, which has two 1.1GHz Intel CPUs and 2Gb of RAM; a respectable machine, but not what you'd currently use for a server. So I think pickle really is pretty fast. Of course, if you had your Mailman installation on a busy mail or database server, all that I/O might kill you, but I think giving up on the mtime caching is not completely unreasonable.
--amk
import time
from Mailman import MailList
L = [] for i in range(10): s = time.time() ml = MailList.MailList('amk-speed-test', lock=1) e = time.time() ml.Unlock() L.append(e-s) print e-s
print 'Average loading time=', sum(L)/len(L)
L = [] ml.Lock() for i in range(10): s = time.time() ml.Save() e = time.time() L.append(e-s) print e-s
ml.Unlock() #print L print 'Average save time=', sum(L)/len(L)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Aug 8, 2008, at 5:41 PM, A.M. Kuchling wrote:
On Fri, Aug 08, 2008 at 04:58:07PM -0400, Barry Warsaw wrote:
recursive or a complicated graph. One experiment would be to
create a list with, say, 100,000 random foo@example.com addresses and
benchmark how much time it takes to unpickle it. I'll try to do that tomorrow on a real computer (not this laptop).Hi Andrew, any results?
The simple test program I used is below. For a list with 89531 addresses, the config.pck file is 9248317 bytes = 8.9M. Doing ten loads and then ten saves in a row, the average load time is 1.36sec and the average save time is 4.5sec.
This is on a development server here at Matrix, which has two 1.1GHz Intel CPUs and 2Gb of RAM; a respectable machine, but not what you'd currently use for a server. So I think pickle really is pretty fast. Of course, if you had your Mailman installation on a busy mail or database server, all that I/O might kill you, but I think giving up on the mtime caching is not completely unreasonable.
Thanks for the feedback Andrew.
I don't know if it's worth changing for 2.1; I think it's a rare
problem and the workarounds are now all in the archive. It's probably
worth changing for 2.2 (and is of course moot for 3.0), but it's still
probably not worth making it configurable. For 2.2, let's do the
right thing and if we can make it fast in the meantime, great!
- -Barry
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin)
iQCVAwUBSJy+inEjvBPtnXfVAQJiYgP+LPZABA78c604XzB39lBpkF/mfu/EEPE8 7aojEnqdaiorAA4PLQppAPNuF8oz9rQU87q/WVSJLdUo87HieTqddOy/CFwPJfYS ixdJmc/k2TdnoLSFykcHhriJUQtfB1TNCodPG2BbFer7b4tEjyIRK11W6bC4gIM5 HfHYl1GLuck= =ARJn -----END PGP SIGNATURE-----
participants (7)
-
A.M. Kuchling
-
Barry Warsaw
-
Barry Warsaw
-
Mark Sapiro
-
Massimo Lanfranconi
-
Max Lanfranconi
-
Stephen J. Turnbull