Storing extra data during pipeline processing

Hello,
I am adapting the GPG patch for mailman 2.1.x for our project and want to gather some related data in Approve and Scrubber handlers and use them in the archiver. Can you please advise me how to attach some metadata to a message to use it in later stages?
I also noticed that Scrubber is called multiple times per message (from the normal pipeline, from the digester, and from the archiver). I want to verify GPG signatures of the attachments in Scrubber, and redoing that multiple times wastes system resources, so I also want to attach some metadata after the first check to prevent it.
I have already tried using the msgdata parameter and adding headers to the message itself, but without success so far. I was thinking about adding an external database and storing the data there keyed by message ID, but surely there must be a better way? I hope to publish the code some day, too.
Juraj

On May 07, 2015, at 10:51 PM, Juraj Variny wrote:
I have already tried using the msgdata parameter and adding headers to the message itself, but without success so far. I was thinking about adding an external database and storing the data there keyed by message ID, but surely there must be a better way? I hope to publish the code some day, too.
Can you give some details on what didn't work about using the msgdata parameter? This always flows with the message through the pipeline and is preserved in the pickle files as the message moves from runner to runner. It's the way handlers are supposed to record information on the message as it's being processed.
This is even more important in Mailman 3 where we've split the pipeline into rules and handlers. Rules are run to determine moderation behavior and rules are never supposed to modify the message. They communicate state to (possible) later handlers via the msgdata dictionary.
Cheers, -Barry
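
To illustrate the point about msgdata travelling with the message, here is a minimal sketch. It assumes the Mailman 2.1 handler signature `process(mlist, msg, msgdata)`; the key name `gpg_status`, the function names, and the `requeue` helper (a stand-in for a runner handoff, not Mailman's actual queue code) are hypothetical.

```python
import pickle
from email.message import Message


def verify_handler(mlist, msg, msgdata):
    """Early handler: record a result in msgdata instead of altering msg."""
    # 'gpg_status' is a hypothetical key chosen for this sketch.
    msgdata['gpg_status'] = 'unverified'


def later_handler(mlist, msg, msgdata):
    """Later handler: read what the earlier handler recorded."""
    return msgdata.get('gpg_status', 'unknown')


def requeue(msg, msgdata):
    """Stand-in for a runner handoff: pickle and unpickle both objects."""
    return pickle.loads(pickle.dumps((msg, msgdata)))


if __name__ == '__main__':
    msg = Message()
    msg['Subject'] = 'test'
    msgdata = {}
    verify_handler(None, msg, msgdata)
    msg, msgdata = requeue(msg, msgdata)
    print(later_handler(None, msg, msgdata))
```

The dictionary survives the simulated handoff because it is serialized alongside the message; a stage that receives only the message has nothing to read.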

On 05/07/2015 01:59 PM, Barry Warsaw wrote:
On May 07, 2015, at 10:51 PM, Juraj Variny wrote:
I have already tried using the msgdata parameter and adding headers to the message itself, but without success so far. I was thinking about adding an external database and storing the data there keyed by message ID, but surely there must be a better way? I hope to publish the code some day, too.
Can you give some details on what didn't work about using the msgdata parameter? This always flows with the message through the pipeline and is preserved in the pickle files as the message moves from runner to runner. It's the way handlers are supposed to record information on the message as it's being processed.
See my reply in this thread. I think I understand.
This is even more important in Mailman 3 where we've split the pipeline into rules and handlers. Rules are run to determine moderation behavior and rules are never supposed to modify the message. They communicate state to (possible) later handlers via the msgdata dictionary.
In MM 3 one can have a rule which checks and verifies signatures and stores the results in the msgdata. Then a handler can use that data to add a message header, both for recipient info and to inform other bits that might run after there's no longer any msgdata dictionary traveling with the message.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
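
The rule-then-handler split described above might look roughly like the following. This is a sketch only: it mirrors the `check(mlist, msg, msgdata)` and `process(mlist, msg, msgdata)` method shapes of Mailman 3 rules and handlers but does not import or register with Mailman, and the class names, the msgdata key, the header name, and the `check_signature` placeholder are all hypothetical.

```python
from email.message import Message

STATUS_KEY = 'gpg_signature_status'       # hypothetical msgdata key
STATUS_HEADER = 'X-GPG-Signature-Status'  # hypothetical header name


def check_signature(msg):
    """Placeholder for real verification; always reports 'unsigned' here."""
    return 'unsigned'


class SignatureRule:
    """Rule: inspects the message and records state, never modifies it."""
    name = 'gpg-signature'

    def check(self, mlist, msg, msgdata):
        msgdata[STATUS_KEY] = check_signature(msg)
        return False  # never "hits"; it only records state for later


class SignatureHeaderHandler:
    """Handler: copies the recorded state into a message header."""
    name = 'gpg-signature-header'

    def process(self, mlist, msg, msgdata):
        status = msgdata.get(STATUS_KEY)
        if status is not None:
            del msg[STATUS_HEADER]  # no error if the header is absent
            msg[STATUS_HEADER] = status
```

Once the header is set, anything that later sees only the message (with no msgdata) can still read the result.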

Thanks, that helped me clear up the confusion.
I have decided to modify Scrubber so that its work is split: when called from the normal pipeline it only saves the attachments to disk and does the GPG work, but leaves the message unchanged (only headers are added). The message is then processed when Scrubber is called from the archiver.
The change turned out to be quite trivial, and I wonder why it was not done that way in the first place.
Juraj
On Thursday 07 May 2015 14:40:53 Mark Sapiro wrote:
On 05/07/2015 01:59 PM, Barry Warsaw wrote:
On May 07, 2015, at 10:51 PM, Juraj Variny wrote:
I have already tried using the msgdata parameter and adding headers to the message itself, but without success so far. I was thinking about adding an external database and storing the data there keyed by message ID, but surely there must be a better way? I hope to publish the code some day, too.
Can you give some details on what didn't work about using the msgdata parameter? This always flows with the message through the pipeline and is preserved in the pickle files as the message moves from runner to runner. It's the way handlers are supposed to record information on the message as it's being processed.
See my reply in this thread. I think I understand.
This is even more important in Mailman 3 where we've split the pipeline into rules and handlers. Rules are run to determine moderation behavior and rules are never supposed to modify the message. They communicate state to (possible) later handlers via the msgdata dictionary.
In MM 3 one can have a rule which checks and verifies signatures and stores the results in the msgdata. Then a handler can use that data to add a message header, both for recipient info and to inform other bits that might run after there's no longer any msgdata dictionary traveling with the message.

On 05/11/2015 12:48 PM, Juraj Variny wrote:
I have decided to modify Scrubber so that its work is split: when called from the normal pipeline it only saves the attachments to disk and does the GPG work, but leaves the message unchanged (only headers are added). The message is then processed when Scrubber is called from the archiver.
The change turned out to be quite trivial, and I wonder why it was not done that way in the first place.
The scrubber's only job is to flatten a message to plain text for the archive and the plain format digest. Adding it to the pipeline to OPTIONALLY process all messages is a relatively recent change.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan


On 05/07/2015 01:51 PM, Juraj Variny wrote:
I am adapting the GPG patch for mailman 2.1.x for our project and want to gather some related data in Approve and Scrubber handlers and use them in the archiver. Can you please advise me how to attach some metadata to a message to use it in later stages?
Store it in the msgdata which is intended exactly for this purpose and is passed as a separate object in the queue entry when the message is queued for downstream runners. Except this may not work - see below.
I also noticed that Scrubber is called multiple times per message (from the normal pipeline, from the digester, and from the archiver). I want to verify GPG signatures of the attachments in Scrubber, and redoing that multiple times wastes system resources, so I also want to attach some metadata after the first check to prevent it.
Yes. The scrubber can actually process the same message more than once, but never more than twice. The purpose of the scrubber is to flatten the message to plain text and store aside any message parts that can't be converted to plain text. This must be done for both the pipermail archive and for the plain format digest. Since archiving and digesting are separate asynchronous processes, scrubbing is normally done twice; once in each process. Also, the two processes are independent and asynchronous so either one may process a given message before the other.
You can set scrub_nondigest to Yes, in which case, scrubbing is done in the incoming pipeline and has nothing to do when called during digesting or archiving. This may or may not be desirable depending on the list because even message and MIME digest subscribers receive a scrubbed message.
I have already tried using the msgdata parameter and adding headers to the message itself, but without success so far. I was thinking about adding an external database and storing the data there keyed by message ID, but surely there must be a better way? I hope to publish the code some day, too.
The msgdata metadata doesn't work for passing message data from the incoming pipeline to the digest process, because at the time the digester is processing and maybe scrubbing messages for the digest, it is reading the messages from digest.mbox and there is no metadata. Adding headers to the message in Handlers before ToDigest should work.
ArchRunner does have the metadata when processing a message for the archive, but it doesn't pass it to the archiver.
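
The difference between headers and msgdata here can be shown with a small sketch using only the standard library `mailbox` module. It assumes nothing about Mailman's digest code beyond the fact stated above that the digester reads messages back from an mbox file; the header name and the helper function are hypothetical.

```python
import mailbox
import os
import tempfile
from email.message import Message

STATUS_HEADER = "X-GPG-Signature-Status"  # hypothetical header name


def roundtrip_through_mbox(msg, msgdata):
    """Store msg in a temporary mbox and read it back.

    msgdata is accepted only to make the point: the mbox format has no
    slot for it, so an empty dictionary is all that can come back.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        box = mailbox.mbox(os.path.join(tmpdir, "digest.mbox"))
        try:
            key = box.add(msg)
            box.flush()
            restored = box.get_message(key)
        finally:
            box.close()
    return restored, {}


if __name__ == "__main__":
    msg = Message()
    msg["From"] = "sender@example.com"
    msg["Subject"] = "test"
    msg[STATUS_HEADER] = "good"
    msg.set_payload("body\n")
    restored, data = roundtrip_through_mbox(msg, {"gpg_status": "good"})
    print(restored[STATUS_HEADER], data)
```

Whatever was written into a header before the message reached the mbox is still there afterwards, while the metadata dictionary has no way to follow it.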
But, if you are using Scrubber.process to do the GPG stuff, it probably won't do anything if scrub_nondigest is No, and then because archiving and digesting are working with different copies of the message, they can't communicate via message headers either.
I suggest you look at verifying signatures in Scrubber.process prior to the point at which it returns if scrub_nondigest is No, or better still, just add a custom handler between MimeDel and Scrubber in the pipeline to verify sigs and set the result in a message header that can be used by later processes. (See <http://wiki.list.org/x/4030615>).
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
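
A custom handler along the lines suggested above could be sketched as follows. Everything here is illustrative: `PIPELINE` is an abbreviated stand-in for the real handler list (in an actual installation the list would be adjusted in the site configuration instead), and the handler name `VerifySig`, the header name, and `verify_signatures` are hypothetical. The sketch assumes the Mailman 2.1 handler signature `process(mlist, msg, msgdata)`.

```python
from email.message import Message

STATUS_HEADER = 'X-GPG-Signature-Status'  # hypothetical header name

# Abbreviated stand-in for the handler pipeline, not the full default list.
PIPELINE = ['SpamDetect', 'Approve', 'MimeDel', 'Scrubber',
            'ToDigest', 'ToArchive']
# Place the hypothetical handler directly before Scrubber (after MimeDel).
PIPELINE.insert(PIPELINE.index('Scrubber'), 'VerifySig')


def verify_signatures(msg):
    """Hypothetical placeholder; a real version would invoke GPG."""
    return 'unsigned'


def process(mlist, msg, msgdata):
    """Verify once in the incoming pipeline and record the result."""
    # Drop any copy of the header supplied by the sender so the value
    # seen downstream always comes from this handler.
    del msg[STATUS_HEADER]
    msg[STATUS_HEADER] = verify_signatures(msg)


def recorded_status(msg):
    """For later stages (archiver, digester): read, do not re-verify."""
    return msg.get(STATUS_HEADER)
```

Because the result is stored in a header before ToDigest and ToArchive run, both copies of the message carry it, and the later stages only need `recorded_status` to avoid repeating the verification.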
Participants: Barry Warsaw, Juraj Variny, Mark Sapiro