On Mon, Mar 04, 2002 at 05:16:21PM -0500, Barry A. Warsaw wrote:
Okay, I've looked over all the code. Except for some stylistic issues, which I'll just correct as I go, my biggest concern is the database used in AvoidDuplicates.py.
Yeah, I looked at that too, but being tired, I didn't get as critical as you did. I figured it somehow worked ok for Ben for his lists (bad Marc, no cookie) Now that I think of it, Ben wrote initially wrote this for qrunner from cron, and didn't think about this issue when he ported it to mailman-cvs-sept-2001
It looks like you're keeping an in-memory dictionary mapping addresses to a set of Message-ID:'s. You use this to decide if the recipient address has already received a message of the given Message-ID:
That was my understanding of the code too.
Let's ignore the duplicate or missing Message-ID: issue for now. The biggest problem I see is that 1) you lose all the mappings if you restart your IncomingRunner, and
That's probably not a problem because
- it would only affect a message being processed at the time you kill and restart IncomingRunner, not very likely, and worst case, you do get a second copy.
- You don't restart IncomingRunner often if at all
- When you do restart qrunner, there can be other qwirks, like a message being delivered twice (I've seen this with VERP enabled, I probably killed it while it was delivering a batch to exim, so it didn't complete and did it all over again after the restart)
- your process will grow without bounds until you do restart your IncomingRunner.
I think you're right. You'd have to have a lot of traffic before it catches up with you, but it will eventually if you never restart qrunner.
I'm not sure about the best thing to do. Sticking this data structure in the list, or otherwise making it persistent, could take too much resources for not much gain. The second issue is more important, especially given that all our runners are now long running processes, and I think most of the unbounded memory growth issues are taken care of. Probably the best thing to do is to evict any entry in the dictionary that's older than a day or two.
That sounds like a reasonable plan.
Then again, this whole data structure seems intended to avoid duplicates when lists are crossposted. It shouldn't be necessary if we just want to filter out duplicates to explicitly named recipients. Maybe we don't need both features, as the former seems to be much less requested than the latter?
That's true. The later is nice for instance when you have threads Cced accross mailman-devel and mailman-users, but having the former by itself would be good already. If this is a time issue wrt fixing the code, the duplicate message-ID code could be left behind a global option that is disabled by default and a comment saying that you should be ready to restart your qrunner weekly or daily if you enable it That said, adding a timestamp to them, and deleting everything that's more than 1H old is a better solution.
I think what I'll do for now is code up and test the original approach. I'm on irc now so please join me if you want to talk about it.
now-if-i-can-just-get-OPN-to-stop-kicking-me-off!-ly y'rs,
<-- barry has quit (Read error: 110 (Connection timed out))
:-)
I was saying there: about OPN, I don't know the details (and there were also politics), but the sf.net guys moved their channels to slashnet.org. Seems to work better there
Marc
Microsoft is to operating systems & security .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger marc_f@merlins.org for PGP key