Access to a Mailman 2.0 list times out without the browser displaying
anything. This list has more than 450 held messages amounting to 8MB
in mailman/data. How should this be repaired without disrupting this
or other lists? It looks like another list on the same host with
about 50 held messages amounting to about 700KB can be accessed OK
while another with 290 messages amounting to 3MB cannot.
Longer term, would it be worthwhile to let the web pages be served
up in parts rather than only in their entirety, if indeed that is
what causes the failures described above?
jam
On Mon, 11 Dec 2000 18:26:26 -0800
Chuq Von Rospach <chuqui(a)plaidworks.com> wrote:
> At 5:17 PM -0800 12/11/00, J C Lawrence wrote:
>> 1) Is there a GPL distributed queue processing system ala IBM's
>> MQ about? I've not been able to find one.
> <http://sourceforge.net/projects/queue/>
> When I evaluated it a while back, it wasn't stable on Solaris, but
> it had the functionality I wanted.
Any experience with GNQS?
http://www.gnqs.org/
It looks a little weak for what we need (mostly at the
parallelisation points), but interesting otherwise.
--
J C Lawrence claw(a)kanga.nu
---------(*) : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
On Fri, 8 Dec 2000 10:36:25 -0800
Chuq Von Rospach <chuqui(a)plaidworks.com> wrote:
>> Background for those who don't know: zodb is the Zope Object
>> Database, ZEO is Zope Enterprise Objects.
> My only worry about this is adding enough complexity and overhead
> that mailman loses its attractiveness to the small site.
I argue similarly. To echo you, Chuq, a primary goal should be the
ability to integrate. That being a given, I'm increasingly coming to
question the use of Python pickles in the first place, let alone the
use of custom database implementations. I don't see that the value
is there for the increased complexity and isolation of the system.
We already have enough problems given the fact that the membership
base is kept in a pickle that I don't see much reason to go further
down that rat hole. Yes, pickles are nice -- for private data that
will never be seen or accessed by an external system.
>> We'd have to handle collisions for multiple qrunner processes,
>> potentially on separate machines. One way that doesn't involve
>> locking shenanigans is to divide the hash space up and assign a
>> segment to each out-qrunner process.
> here's another way that should work: each record has a locking
> field in it. When qrunner wants to execute that item, it reads the
> field. If the field is NULL, it writes its ID (whatever it is,
> guaranteed unique) into that locking field. It then waits a beat,
> and reads it back. If it reads back its own ID, it knows it owns
> the record and can execute it. If it reads back someone else's ID,
> it lost the lock, but someone else owns the record so it can skip
> it and move on.
You missed a few race conditions in there. As I wrote earlier, the
only NFS operation which is guaranteed to be unique across
implementations is creat(). Given that, lock files based on the hash
of the message would seem to be the answer. If you can create a
lock file with a filename based on the hash of the message, you
have rights to deliver it. If not, well, someone else obviously has
those rights.
The problem that remains is lockfile aging.
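Something like this untested sketch (modern Python rather than the
1.5-era code in Mailman today; the lock directory, timeout, and
helper names are all made up for illustration), with aging handled
by breaking locks older than a timeout:

import hashlib
import os
import time

LOCK_DIR = "/var/local/mailman/locks"   # assumed location, not Mailman's layout
STALE_AFTER = 15 * 60                   # assumed aging policy: 15 minutes

def try_claim(message_bytes):
    """Return the lock path if we won the right to deliver this
    message, or None if another process already holds it."""
    digest = hashlib.sha1(message_bytes).hexdigest()
    path = os.path.join(LOCK_DIR, digest + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists; whoever
        # creates the lock file owns delivery of this message.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Lockfile aging: if the existing lock is old enough, assume
        # its owner died, break it, and let the next pass retry.
        try:
            if time.time() - os.stat(path).st_mtime > STALE_AFTER:
                os.unlink(path)
        except FileNotFoundError:
            pass
        return None
    os.write(fd, b"%d\n" % os.getpid())
    os.close(fd)
    return path

def release(path):
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass

A delivery process would call try_claim() on the raw message,
deliver if it gets a path back, and release() afterwards.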
> you can simulate atomic locks with a little thought and
> cooperative processes, by everyone writing to the store and then
> seeing who won. A LOT easier from and administrative view than
> partitioning hashes and the like, IMHO.
This assumes that writes are atomic. The problem is that they
occasionally aren't.
--
J C Lawrence claw(a)kanga.nu
---------(*) : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
On Mon, 11 Dec 2000 23:43:07 -0800
Chuq Von Rospach <chuqui(a)plaidworks.com> wrote:
> At 11:15 PM -0800 12/11/00, J C Lawrence wrote:
>> My intent so far is just "deliver no more than N messages per
>> minute" per outbound queue runner. It knocks the peaks off the
>> problem, and the base structure is easy to extend from there (and
>> I don't want to think about that now).
> and leaves it up to the admin to tune. That's probably fine for
> 3.0. Full queue watching and self-throttling can wait. It's nice
> to have, but we probably shouldn't try to do everything at
> once. Just leave the hooks for later...
Precisely.
>> I should note that my base design is very heavy in terms of
>> process forks (which happen to be quite light weight under Linux,
>> but that's another matter).
> There are definitely places for threads, but to be honest, I see
> some tendency of people to go thread-happy. it's the "new puppy",
> so everything needs to be designed around threads... Given the
> amount of I/O we have going on, the fork overhead is going to get
> lost in the noise in most cases.
That's my hope.
>> There's a directory full of scripts/programs.
>>
>> Run them all, in directory sort order, on this message to
>> determine if we should do XXX with it.
> and who does this? The missing core policeman process, of course
> (grin).
Nope. The individual process which somehow got nominated for
picking up a message sitting in a list pending queue. So, it picks
up the message, asks for its distribution list, gets it, and shoves
them both over into the outbound queue. Later some arbitrary
outbound queue processor wins/gets control of that message, opens an
SMTP session, and shovels the message down to the list of RCPT TOs.
Nobody is responsible for more than their tiny area of the field.
There is a pseudo orchestra leader, but all he really does is fork
processes that go see if there is anything in the queues to process,
and if so, start on them.
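A rough, untested sketch of the scripts-in-a-directory idea (the
directory layout, the exit-code convention, and the names below are
invented for illustration, not anything specced):

import os
import subprocess

def should_process(handlers_dir, message_bytes):
    """Run every handler script in directory sort order, feeding the
    message on stdin.  The first non-zero exit vetoes the action."""
    for name in sorted(os.listdir(handlers_dir)):
        script = os.path.join(handlers_dir, name)
        if not os.access(script, os.X_OK):
            continue                     # skip non-executables
        result = subprocess.run([script], input=message_bytes)
        if result.returncode != 0:
            return False
    return True

The per-list ordering would then just be the file names in
handlers_dir -- say 10-antispam, 20-moderation, 30-membership --
with the sort order doing the sequencing.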
> but -- I'd suggest against this approach. There are problems. To
> start, the approach is pretty darn I/O heavy. You'd be better off
> loading all of this stuff into an internal database, and making it
> a memory-resident table, not a disk-based one.
Kinda tough for LDAP or SQL where the list of members is dynamic
and depends on the message itself (non-traditional lists).
But yes, it hurts. The default case will be some sort of
local/cheap DB with a single process. The idea is that the above
architecture is there should it be needed.
> Administratively, it has some issues as well, since you're more or
> less requiring that someone with a CLI deal with a lot of the
> configuration -- or opening you up to all sorts of web-based
> attacks.
Semi. The idea is that the CLI guy installs the base set of scripts
that are potentially available to a given list. The list owner
then picks from that library for his list, and assembles and orders
them (building a symlink table on disk) via his web interface (drop
and combo boxes).
> Instead, you store scripts, and the CLI admin manages that
> process, but configuration is within Mailman, and web based.
Precisely.
> i've been working on a new API for the
> moderator/autobounce/admin/anti-spam stuff. I'll post what I have
> in a day or so, because I think the way I'm putting it together is
> relevant to how I think the overall control system could be done.
I haven't really thought about bounce processing at all yet.
> You want to embed nothing (IMHO), because it reduces the
> complexity of all of the pieces and forces you to keep the
> interfaces clean and rigorous.
Yeah.
>> I don't see the different queues needing markedly different
>> designs, but needing to be able to have their process supports
>> cleanly divisible. The base structures end up markedly similar
>> after that.
> Other than, say, imagining a system where archives are on a
> different machine (or two), and the search engine on a third (or
> fourth), so you want to be able to distribute the processing
> cleanly.... And the realization that archives and digest stuff can
> be held in a low-priority queue and turned into idle-time
> processing tasks. A big plus if you've got a busy system a little
> closer to the edge than you like.
I haven't thought about system load sensitivities yet, but I don't
see any innate reason they couldn't be another variable thrown into
the, "What am I currently allowed to process" equation.
>> Process fork overhead is a problem I've not confronted yet.
> And I wouldn't worry about it much. I don't think it's going to be
> a problem, other than in the MLM->MTA interface where you might be
> doing a lot of spawning and forking to parallelize, VERP, or
> whatever.
My idea for VERP is trivially simple:
The member script which generates the list of RCPT TOs which are
attached to a pending message will periodically add a second token
(a hash value) after the email address, separated by whitespace.
Note: instead of text a DBM would work just as well, perhaps
better.
The process that then picks up a message from outbound notices the
hash token and constructs a special envelope for that address
only, using the hash string as a +suffix to the envelope return
address.
Want VERP all the time? The member script always generates hash
values. Or just a percentage of the time, or as a function of how
long it has been since we last caught a bounce from that address, or
as a function of how much we like that domain.
The idea is that VERPed messages are built at the instant of handing
them off to an MTA.
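A minimal, untested sketch of that hand-off step (the
address-plus-hash line format is as described above; the
listname-bounces+hash@domain envelope shape and the function name
are just illustrative assumptions):

def envelope_sender(list_name, list_domain, recipient_line):
    """Build the SMTP MAIL FROM for one recipient line from the
    outbound queue.  Lines look like 'user@example.com' or
    'user@example.com 3f2a9c...'."""
    parts = recipient_line.split()
    address = parts[0]
    if len(parts) > 1:
        # VERP this one: fold the hash token into a +suffix on the
        # return path, so a bounce identifies exactly this delivery.
        token = parts[1]
        return address, "%s-bounces+%s@%s" % (list_name, token, list_domain)
    # Non-VERPed recipients share the ordinary list bounce address.
    return address, "%s-bounces@%s" % (list_name, list_domain)

The outbound processor would call this per RCPT TO as it opens the
SMTP session, so the VERPed envelope exists only at that instant.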
> And that can be minimized and avoided with some careful design. In
> the rest of the system, don't bother. When I'm talking about
> lightweight, I mean code complexity and feature creep. You want to
> stuff as much as possible into external code pieces that are
> brought in via queueing and messaging, and keep it out of the
> control piece.
Bingo.
>> BTW I'd like to have the MLM archive messages such that a member
>> can request, "SEND ME POST XXX" and have the MLM send it to him.
>> Ditto for digests. This is in addition to any web archiving.
> and another flavor of digest, what I call the HTML-TOC. Simply a
> message full of digest info (poster, subject, maybe the first
> couple of lines), and a URL to pull it out of archives. Some folks
> want a digest to skim, some folks only want header data -- so why
> send all those bytes that won't be read?
Ahh, excellent point. Digests really should be an OOB process
handled by their own queue. Yup. Absolutely.
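For what it's worth, the HTML-TOC flavour is mostly formatting. An
untested sketch, assuming we already have (poster, subject, archive
URL) tuples for the digest period:

def toc_digest(list_name, posts):
    """posts: iterable of (poster, subject, archive_url) tuples.
    Returns the body of a header-only digest message."""
    lines = ["%s digest -- table of contents" % list_name, ""]
    for poster, subject, archive_url in posts:
        lines.append("  %s: %s" % (poster, subject))
        lines.append("    %s" % archive_url)
    return "\n".join(lines) + "\n"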
--
J C Lawrence claw(a)kanga.nu
---------(*) : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
Got this message when trying to click on the admin page for one of my
lists.
--
Phillip P. Porch <root(a)sco.theporch.com> NIC:PP1573 finger for
http://www.theporch.com UTM - 16 514546E 3994565N GnuPG key
---------- Forwarded message ----------
Date: Mon, 11 Dec 2000 18:53:12 -0600
From: Phillip Porch <root(a)theporch.com>
To: Phillip Porch <root(a)theporch.com>
Subject: Bug in Mailman version 2.1a1
On Mon, 11 Dec 2000 18:26:26 -0800
Chuq Von Rospach <chuqui(a)plaidworks.com> wrote:
> At 5:17 PM -0800 12/11/00, J C Lawrence wrote:
>> 1) Is there a GPL distributed queue processing system ala IBM's
>> MQ about? I've not been able to find one.
> <http://sourceforge.net/projects/queue/>
> When I evaluated it a while back, it wasn't stable on Solaris, but
> it had the functionality I wanted.
Yeah, I just spent some time playing around there. It's not
encouraging right now.
>> 1) Insert MessageID headers with created values in messages that
>> don't contain any MessageID.
> that's no problem, although in theory, the MTA should do it for
> you. The only way I can think of this (if everyone acts properly)
> happening is someone somehow delivering a message to Mailman that
> never touches an MTA. I'm not sure that's possible.
Not exactly. My architecture has the ability to create messages
internally that are then passed back through the processing system.
I'm not interested in passing back out to the MTA (wasted cycles and
the need to know which machine has a valid MTA on it), or in
generating IDs at the point of message generation (which is a
template), so I'd rather punt and just build IDs when I need them.
>> 2) Detect collisions within its rather small/arbitrary window,
>> and auto-discard/reject subsequent messages with a duplicate
>> MessageID. This would not be a rigorous dupe check, but would
>> only check for dupes against the messages already in the Mailman
>> queue (ie received and not yet sent back out).
> It's not that expensive to keep a hash of message IDs, where the
> key is the Message-ID, the value is a timestamp. And, say, once a
> day, you delete records where the timestamp is older than
> (configurable) days. If you're going to dupe check at all, why not
> do it for real?
I could. At this point the ONLY reason I'm interested in message
IDs is for the moderation interface, which needs to be assured that
no two messages in the moderation queue for a given list have the
same ID. I guess a little DBM file wouldn't hurt, but I don't think
I'll spec it.
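If it were specced, it would be about this much code -- an untested
sketch with modern Python's dbm module standing in for a 1.5-era DBM
file, and the path and retention window made up:

import dbm
import time

DUPE_DB = "/var/local/mailman/data/msgids.db"   # assumed path
KEEP_DAYS = 7                                   # assumed, per-site configurable

def seen_before(message_id):
    """Record the Message-ID; return True if it was already known."""
    with dbm.open(DUPE_DB, "c") as db:
        key = message_id.encode("utf-8")
        if key in db:
            return True
        db[key] = str(time.time()).encode("utf-8")
        return False

def expire_old_entries():
    """Daily sweep: drop records older than KEEP_DAYS."""
    cutoff = time.time() - KEEP_DAYS * 24 * 3600
    with dbm.open(DUPE_DB, "w") as db:
        for key in list(db.keys()):
            if float(db[key].decode()) < cutoff:
                del db[key]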
>> (MUA emitting non-unique or no IDs, mail dupes, etc).
> it's not the MUA that's responsible for Message-IDs, it's the
> MTA.
Oops, you're right. I forgot that.
>> 4) While it seems a subtle small point, it's bugging me. Given
>> user account support, and messages to a given user bouncing,
>> should that user be unsubscribed from only that list, or from all
>> lists at that site?
> I unsubscribe from the site. I'm sure at some point, an email sent
> from A might bounce and still be valid if sent from B, but that
> case is so rare I wouldn't think of wasting time on it, because
> the only way I can see that happening (minus broken systems, of
> course) is someone who decides to try to unsubscribe by blocking a
> list, instead of following the directions. And I don't see we need
> to write code into mailman to help users not follow the
> instructions.... (grin)
I kinda like the way you think.
>> Where this is actually bugging me most is for virtual domains and
>> whether or not lists in a virtual domain should be transparent
>> or opaque to a bounce on a list in a different virtual domain.
> since we've talked about a single data store for subscriber data,
> I think you do it globally. If they really want opaqueness across
> virtual domains, run multiple copies of Mailman. That'll still
> be an option, after all.
<nod> Neater.
>> For those interested the basic model is built upon arbitrary
>> process queues and pipes.
> which is a nice system -- it's how I finally did my big muther
> list server, but instead of gnu queue, I'm using QPS.
I'm ending up with a sort of pseudo-queue model. Still a "list-mom"
cron job, but it works by orchestrating a series of arbitrary
process pipes as orphaned children. I'd like to go for a full queue
implementation, but I think the culture shock and overhead for the
small case might be a bit much.
--
J C Lawrence claw(a)kanga.nu
---------(*) : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
On Mon, 11 Dec 2000 20:28:26 -0500
Ken Kyler <ken(a)kyler.com> wrote:
>> 4) While it seems a subtle, small point, it's bugging me. Given
>> user account support, and messages to a given user bouncing,
>> should that user be unsubscribed from only that list, or from all
>> lists at that site? Where this is actually bugging me most is
>> for virtual domains and whether or not lists in a virtual domain
>> should be transparent or opaque to a bounce on a list in a
>> different virtual domain.
>>
>> The admin in me says, "Hell yes!" The commercial reality nut in
>> me demurs (think about list hosting for small companies and
>> their PR image given transparent virtual hosting).
> I'd like a per-list option to do just that. Some virtual hosts
> might like that option and others won't.
Awwww crap. I was kinda hoping to cut out another level of
abstraction. (Must remember to add a subject line next time.)
--
J C Lawrence claw(a)kanga.nu
---------(*) : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
Summary, since the mail is long:
I'm trying to find out why qrunner needs to lock a list before doing
delivery of a message. I'm getting corruption over NFS under high,
concurrent load (I'm setting up and testing the new SourceForge list
servers).
The basic question of this Email is: can I have qrunner ship Emails
without modifying the lists' config.db?
Longer version with explanations and details:
So, I set up that NFS-shared mailman tree I was talking about a
little while ago. As a reminder:
/var/local/mailman is NFS exported
/var/local/mailman/qfiles is symlinked to ../mailman.local.
I then applied the two following patches to mailman:
--- mailman-2.0.orig/cron/qrunner Mon Sep 18 14:28:42 2000
+++ mailman-2.0/cron/qrunner Wed Dec 6 14:02:28 2000
@@ -96,7 +96,7 @@
import signal
signal.signal(signal.SIGCHLD, signal.SIG_DFL)
-QRUNNER_LOCK_FILE = os.path.join(mm_cfg.LOCK_DIR, 'qrunner.lock')
+QRUNNER_LOCK_FILE = os.path.join(mm_cfg.QUEUE_DIR, 'qrunner.lock')
LogStdErr('error', 'qrunner', manual_reprime=0, tee_to_stdout=0)
--- mailman-2.0.orig/Mailman/Logging/StampedLogger.py Mon Mar 20 22:25:58 2000
+++ mailman-2.0/Mailman/Logging/StampedLogger.py Wed Dec 6 16:20:03 2000
@@ -16,6 +16,7 @@
import os
import time
+import socket
from Logger import Logger
class StampedLogger(Logger):
@@ -66,7 +67,9 @@
label = "(%d)" % os.getpid()
else:
label = "%s(%d):" % (self.__label, os.getpid())
- prefix = stamp + label
+ hostname = socket.gethostname() + " "
+ prefix = stamp + hostname + label
Logger.write(self, "%s %s" % (prefix, msg))
if msg and msg[-1] == '\n':
self.__bol = 1
The plan here is to have two mailing list servers sharing the same
list configs, but running two different queues to avoid the
mailman -> exim bottleneck. (I have 2 machines with 2 CPUs each, and
I don't want 3 CPUs idle because a single qrunner is holding a
global lock. Sure, it is rather fast with exim, but since I have two
machines (required for failover), I don't really want one sitting
idle, and I want to do load balancing too.)
To stress test everything, I sent 1000 local messages on each machine at the
same time, and while they were able to get their own qrunner locks, they had
to fight for the list lock (all messages were to the same list, which only
had one user).
Well, it was fast:
15:12:10 Start of injection of 1000 Emails on usw-sf-list1 and usw-sf-list2
15:14:06 1000 Emails accepted and queued on usw-sf-list1
15:14:11 1000 Emails accepted and queued on usw-sf-list2
15:19:07 usw-sf-list1's mailman shipped all the mails
15:19:37 usw-sf-list2's mailman shipped all the mails
Note that usw-sf-list2 was doing this over NFS, and was only marginally
slower considering this.
As expected, there was some corruption in the shared log files (both logs/smtp
and logs/post are missing about 10% of the lines they should have).
Now, what's less fun is this in my error logs:
Dec 07 15:14:39 2000 usw-sf-list2 (17221) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:14:40 2000 usw-sf-list2 (17285) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:14:40 2000 usw-sf-list2 (17282) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:14:40 2000 usw-sf-list2 post(17285): Traceback (innermost last):
usw-sf-list2 post(17285): File "/var/local/mailman/scripts/post", line 94, in ?
usw-sf-list2 post(17285): main()
usw-sf-list2 post(17285): File "/var/local/mailman/scripts/post", line 73, in main
usw-sf-list2 post(17285): mlist = MailList.MailList(listname, lock=0)
usw-sf-list2 post(17285): File "/var/local/mailman/Mailman/MailList.py", line 79, in __init__
Dec 07 15:14:40 2000 usw-sf-list2 (17291) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
usw-sf-list2 post(17285): self.Load()
usw-sf-list2 post(17285): File "/var/local/mailman/Mailman/MailList.py", line 908, in Load
usw-sf-list2 post(17285): shutil.copy(lastfile, dbfile)
usw-sf-list2 post(17285): File "/usr/lib/python1.5/shutil.py", line 52, in copy
usw-sf-list2 post(17285): copyfile(src, dst)
usw-sf-list2 post(17285): File "/usr/lib/python1.5/shutil.py", line 18, in copyfile
usw-sf-list2 post(17285): fdst = open(dst, 'wb')
usw-sf-list2 post(17285): IOError : [Errno 116] Stale NFS file handle: '/var/local/mailman/lists/test/config.db'
Dec 07 15:14:41 2000 usw-sf-list2 (17349) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:14:50 2000 usw-sf-list2 (17905) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:14:50 2000 usw-sf-list2 (17913) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:14:50 2000 usw-sf-list2 post(17905): Traceback (innermost last):
usw-sf-list2 post(17905): File "/var/local/mailman/scripts/post", line 94, in ?
usw-sf-list2 post(17905): main()
usw-sf-list2 post(17905): File "/var/local/mailman/scripts/post", line 73, in main
usw-sf-list2 post(17905): mlist = MailList.MailList(listname, lock=0)
usw-sf-list2 post(17905): File "/var/local/mailman/Mailman/MailList.py", line 79, in __init__
usw-sf-list2 post(17905): self.Load()
usw-sf-list2 post(17905): File "/var/local/mailman/Mailman/MailList.py", line 908, in Load
usw-sf-list2 post(17905): shutil.copy(lastfile, dbfile)
usw-sf-list2 post(17905): File "/usr/lib/python1.5/shutil.py", line 52, in copy
usw-sf-list2 post(17905): copyfile(src, dst)
usw-sf-list2 post(17905): File "/usr/lib/python1.5/shutil.py", line 18, in copyfile
usw-sf-list2 post(17905): fdst = open(dst, 'wb')
usw-sf-list2 post(17905): IOError : [Errno 116] Stale NFS file handle: '/var/local/mailman/lists/test/config.db'
Dec 07 15:14:50 2000 usw-sf-list2 (17922) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:15:44 2000 usw-sf-list2 (21410) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
Dec 07 15:15:44 2000 usw-sf-list2 (21419) test db file was corrupt, using fallback: /var/local/mailman/lists/test/config.db.last
and the fact that 3 mails (out of 2000) didn't make it to my mailbox.
I think I can live with the occasional log corruption (I can also
lock the log files before writing to them), but of course, mail loss
is not as good.
I've looked at the qrunner code a bit, and I'm trying to understand
why it needs a lock on the list's config.db.
I suppose NFS is to blame for this, and somehow, even though both
machines lock the test list, the locking is somehow not NFS safe (I
thought it would be, though).
But then comes the question: why does qrunner have to modify the
list's config.db when it ships a message?
I suppose the relevant piece of code in qrunner is:
try:
    keepqueued = dispose_message(mlist, msg, msgdata)
    # Did the delivery generate child processes?  Don't store them in
    # the message data files.
    kids = msgdata.get('_kids')
    if kids:
        allkids.update(kids)
        del msgdata['_kids']
    if not keepqueued:
        # We're done with this message
        dequeue(root)
but I have to admit to not understanding what it does.
Is there any way to have qrunner send messages without modifying the list
config and thus without having to lock the list either?
Thanks
Marc
--
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger marc_f(a)merlins.org for PGP key
1. The scripts should probably check if you have sufficient privileges
before launching...
/etc/mailman > newlist rt-general
Traceback (innermost last):
File "/usr/sbin/newlist", line 227, in ?
main()
File "/usr/sbin/newlist", line 122, in main
os.setgid(MAILMAN_GID)
OSError: [Errno 1] Operation not permitted
/etc/mailman > sudo newlist rt-general
Enter the email of the person running the list:
2. General list information can be found at
_the_mailing_list_overview_page_.
When clicking on that link (in Netscape, at least...) and not having
changed the name in /etc/mailman/mm_cfg.py, it tries to contact
http://www.localhost.com/cgi-bin/mailman/listinfo
I don't know where the 'www.' and '.com' are coming from, but they
shouldn't be there. After changing localhost to the hostname it worked
fine.
Thanks,
Nils.