[Mailman-Users] suppress duplicate when posting addressed to list and its alias name

Wed Nov 7 02:02:17 CET 2012

On Tue, 2012-11-06 at 11:26:40 -0800, Mark Sapiro wrote:

> Actually, AvoidDuplicates.py could serve as a good example, but it is
> currently not actually used. It is experimental and is not included in
> the default GLOBAL_PIPELINE.

As you noted in your follow-up, the docstring does not describe at all
what that handler actually does.  I learned this when stepping through
the code. :)

> The major problem with keeping these data in memory, other than purging
> "old" entries so that the dictionary doesn't grow too large, is that
> in-memory data aren't shared between runners so if the incoming queue
> is sliced, the multiple copies of IncomingRunner do not have access to
> each other's data.
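To make the in-memory approach concrete, here is a minimal sketch of a Message-ID cache that purges "old" entries. The class name, the max_age default and the idea of passing `now` explicitly are all hypothetical, not anything in AvoidDuplicates.py. Note that each runner process would hold its own instance, which is exactly the sharing problem described above.

```python
import time


class MessageIdCache:
    """Hypothetical in-memory cache mapping Message-ID -> first-seen time.

    Each runner process gets its own instance, so a sliced
    IncomingRunner would NOT share these entries across slices.
    """

    def __init__(self, max_age=3600):
        self.max_age = max_age  # seconds to remember a Message-ID (assumed value)
        self._seen = {}

    def purge(self, now=None):
        # Drop entries older than max_age so the dict does not grow without bound.
        now = time.time() if now is None else now
        cutoff = now - self.max_age
        for msgid in [m for m, ts in self._seen.items() if ts < cutoff]:
            del self._seen[msgid]

    def is_duplicate(self, msgid, now=None):
        # Return True if msgid was already seen; otherwise record it and return False.
        now = time.time() if now is None else now
        self.purge(now)
        if msgid in self._seen:
            return True
        self._seen[msgid] = now
        return False
```

Two separate instances (standing in for two slices) never see each other's entries, so a copy routed to the other slice would not be caught.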
> 
> In your case, the input to the hash on which runners are sliced
> includes all the message headers and the listname so it is likely that
> the "equivalent but different" listname messages will be in different
> slices of the hash space.
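A simplified illustration of why the two copies may be split up: if a slice is chosen by hashing the message text together with the list name and mapping the digest onto equal ranges, then changing either input yields an effectively independent slice. This is a hypothetical sketch of the general technique, not Mailman's actual Switchboard code; slice_for and its arguments are made up for illustration.

```python
import hashlib


def slice_for(raw_message, listname, numslices):
    """Hypothetical slice assignment (NOT Mailman's exact code).

    Hashes the raw message bytes plus the list name and maps the
    160-bit SHA-1 digest onto one of `numslices` equal ranges.
    """
    digest = hashlib.sha1(raw_message + listname.encode('utf-8')).hexdigest()
    return int(digest, 16) * numslices // 2**160


msg = b"Message-ID: <abc@example.com>\n\nhello\n"
# The two copies differ in their hash input, so they can land in different slices.
print(slice_for(msg, 'mylist', 4), slice_for(msg, 'mylist-alias', 4))
```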
> 
> This is not a concern if IncomingRunner is not sliced. It is also not a
> concern with a disk based cache as long as buffers are flushed after
> writing because IncomingRunner locks the list whose message is being
> processed which should prevent race conditions between different
> slices of IncomingRunner.

Then, would it make sense (or be overkill) to have the handler populate
a dict mapping each Message-ID to a timestamp, and store that dict in a
pickle whose filename is derived from mlist.internal_name()?
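Roughly, the per-list pickle idea might look like the sketch below. The cache directory, the `msgids-<name>.pck` naming scheme, max_age and the function name are all hypothetical; internal_name stands for the value of mlist.internal_name(). It assumes the caller already holds the list lock, flushes and fsyncs before an atomic rename as Mark suggests, and uses Python 3 calls (FileNotFoundError, os.replace), so a Python 2 Mailman would need IOError and os.rename instead.

```python
import os
import pickle
import tempfile
import time


def _cache_path(cache_dir, internal_name):
    # Hypothetical naming scheme: one pickle per list, keyed by mlist.internal_name().
    return os.path.join(cache_dir, 'msgids-%s.pck' % internal_name)


def seen_before(cache_dir, internal_name, msgid, max_age=86400, now=None):
    """Return True if msgid is already recorded for this list; else record it.

    Assumes the caller holds the list lock, so no other process is
    rewriting the same pickle concurrently.
    """
    now = time.time() if now is None else now
    path = _cache_path(cache_dir, internal_name)
    try:
        with open(path, 'rb') as fp:
            seen = pickle.load(fp)
    except FileNotFoundError:
        seen = {}
    # Cleansing: forget Message-IDs older than max_age seconds.
    cutoff = now - max_age
    seen = {m: ts for m, ts in seen.items() if ts >= cutoff}
    duplicate = msgid in seen
    if not duplicate:
        seen[msgid] = now
    # Write a temp file, flush and fsync it, then atomically replace the old pickle.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, 'wb') as fp:
        pickle.dump(seen, fp, pickle.HIGHEST_PROTOCOL)
        fp.flush()
        os.fsync(fp.fileno())
    os.replace(tmp, path)
    return duplicate
```

Since there is one small file per list and it is trimmed on every call, each message costs one read and one rewrite of a dict holding only recent Message-IDs.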

Obviously, this would result in one pickle per list that is constantly
opened, edited (and periodically cleansed), and closed.  Is the
performance cost prohibitive relative to the benefit?  I would also be
relying on the fact that a handler is never called concurrently for the
same list (is that understanding accurate?), which avoids the scenario in
which two processes try to manipulate the same pickle simultaneously.
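If relying on the list lock ever felt too implicit, one hypothetical belt-and-braces option is an advisory per-list lock around the pickle access. This is a Unix-only sketch using fcntl.flock on a sidecar file; the lock-file name and helper are made up, and it would be redundant wherever the Mailman list lock is already held.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def cache_lock(cache_dir, internal_name):
    """Hypothetical per-list advisory lock around the Message-ID pickle (Unix only)."""
    path = os.path.join(cache_dir, 'msgids-%s.lock' % internal_name)
    with open(path, 'a') as fp:
        fcntl.flock(fp.fileno(), fcntl.LOCK_EX)  # blocks until we hold the lock exclusively
        try:
            yield
        finally:
            fcntl.flock(fp.fileno(), fcntl.LOCK_UN)
```

Usage would be to wrap the read-modify-write of the pickle in `with cache_lock(cache_dir, mlist.internal_name()):`.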

-- 
Sahil Tandon