Alessandro Vesely writes:
First, what Mailman are you talking about? Only Mailman 3 is likely to get these improvements, as Mailman 2 is end-of-life. However, Mailman 2 installations are likely to be around in large numbers for several years, and if Mailman 2 is any evidence, likely few Mailman 3 installations would use these features unless forced to by a disaster like the Yahoo/AOL sudden switch to DMARC p=reject.
Yet, it is possible to undo the transformation that Mailman put in place, thereby validating the original DKIM signature.
It would always be possible to undo all transformations by supplying the original email as part of a multipart/alternative, or perhaps a new multipart subtype, maybe with some kind of device to make reading the message/rfc822 original difficult in standard MUAs. Then the problem is much simpler: the validator need only validate the whole and the original, and the receiver would have all information necessary to decide whether the alleged Mailman version is based on normal processing or is a malicious fake. This would require some changes to Mailman to implement, and to Postorius for a configuration UI. (In the case of Microsoft MUAs, if Mailman is configured to strip HTML, the result might be less than 10% bigger than the original! ;-)
It is sometimes possible to reverse transformations with only the information in the post after Mailman processing. However, some very desirable changes are destructive (eg, anonymized lists, conversion of HTML to plain text, removal of prohibited attachments). Some non-destructive changes (headers and footers) are highly customizable. So the question is which transformations users want to reverse, and whether reversing them is really possible.
This kind of transformation reversal probably requires no changes to Mailman, just an addition of a Handler which could be written independently and "dropped in" (with a configuration change to the default pipeline). The necessary information about transformations that are configured would be available from Mailman in the usual way (existing Handlers need that information).
Mailman carries out some irreversible changes, such as rewriting To: or Cc: changing the order of the mailboxes,
Does this happen outside of DMARC mitigation? Can you show examples?
or rewriting Content-Transfer-Encoding: irrespective of quotation marks and case (for example "7bit" even if the original, signed field was spelled as "7Bit").
I'm not aware of such behavior *unless other modifications were done*. In that case, Mailman is specifying the C-T-E it uses, it is not rewriting the original C-T-E.
I guess this behavior is coded deeply in Python libraries,
I don't think so. As far as I know, the email module in Python 3 provides some support for parsing header fields, but I don't know why this would change the order or spelling of field contents. I would guess that to the extent that it happens it has to do with Mailman-level processing (for example, collecting addresses from the same domain so they can be presented as multiple RCPT TO with a single DATA). I can say for sure that some care was taken to ensure that the order of header fields, including multiple instances of the same field, is preserved.
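As a quick check of the point about the email module, here is a minimal sketch using only the standard library with its default (compat32) policy. The addresses and field values are made up for illustration, and Mailman's own message class and pipeline may behave differently, so this only shows what the bare parser does.

```python
from email import message_from_string

# Made-up message: mailboxes in non-alphabetical order, a repeated field,
# and an unusual spelling of the C-T-E value.
raw = (
    "From: author@example.org\n"
    "To: b@example.com, a@example.com\n"
    "Cc: z@example.net\n"
    "Received: from one.example\n"
    "Received: from two.example\n"
    "Content-Transfer-Encoding: 7Bit\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = message_from_string(raw)  # default compat32 policy

# Field names come back in the order they appeared, duplicates included.
field_names = [name for name, _ in msg.items()]

# Field contents are returned as written: mailbox order and case are kept.
to_value = msg["To"]
cte_value = msg["Content-Transfer-Encoding"]
received = msg.get_all("Received")

# Serializing again keeps the original spelling of the C-T-E value.
flattened = msg.as_string()
```

If reordering or respelling does show up in a delivered post, this suggests looking at the processing done after parsing rather than at the parser itself.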
but would like to know developers' opinions. Is that something that could be fixed?
First, the issues with headers could be improved, though not entirely fixed, in DKIM itself by further canonicalizing structured headers before signing or verifying.
I'm not saying that this is the right way forward, but it should be considered.
The second question is about producing a hint to the verifier telling which transformation(s) have been applied to the message. That would come as an additional header field, for example:
This could be done easily, but it would be at best a hint. Among other things, it might be desirable to identify the agent that performs the transformation, as well as the algorithm and perhaps the host and/or the list. Mailman adds footers in different ways, specifically appending text and adding a MIME part. Third party patches are available that dig into HTML structure (at least for Mailman 2). There are lists that feed into lists, and apply their own transformations.
or as an extra tag in a DKIM signature, for example:
DKIM-Signature: v=1; (...) tf=footer; (...)
Not possible without a lot of effort and specific cooperation from MTAs. Mailman doesn't DKIM sign messages, really doesn't want to (there are Python modules for this, but use and configuration would be our responsibility so we'd like to have specialists do it), and probably shouldn't (we're not specialists) -- that should be left to the border MTA of the administrative domain.
That hint could spare the verifier one pass over the message. Is it something that could be implemented? If not, I'd try guessing, according to this scheme:
You're going to have to guess a lot for a long time anyway, because very few installations will implement this header. It's not obvious to me that guessing won't be nearly as accurate as the header might be.
outermost Content-Type: | first entity Content-Type: | transformation
------------------------+-----------------------------+----------------
text/plain              | any                         | footer
multipart/mixed         | multipart/mixed             | add-part
multipart/mixed         | any other                   | mime-wrap
any other               | any                         | non-reversible
Does that look correct?
Not 100%. I'm not sure what you mean by "mime-wrap", but if it's Mailman's "Wrap Message" DMARC mitigation, as far as I know nobody uses it. I suspect that pretty much any multipart/mixed may have an added part containing a footer, but it might not.
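For concreteness, the guessing scheme quoted above could be sketched as follows with the standard email package. The function name and the sample messages are hypothetical, and the third row is read here as "multipart/mixed whose first entity is anything other than multipart/mixed". As noted, the result is only a guess: a multipart/mixed post may or may not carry an added footer part.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def guess_transformation(msg):
    """Guess the list transformation from the MIME structure alone."""
    outer = msg.get_content_type()
    if outer == "text/plain":
        return "footer"
    if outer == "multipart/mixed" and msg.is_multipart():
        parts = msg.get_payload()  # a list of sub-messages for multiparts
        first = parts[0].get_content_type() if parts else None
        if first == "multipart/mixed":
            return "add-part"
        return "mime-wrap"
    return "non-reversible"


# Hypothetical samples, one per row of the table.
plain = MIMEText("original body\n-- \nlist footer\n")

wrapped = MIMEMultipart("mixed")
wrapped.attach(MIMEText("original body"))
wrapped.attach(MIMEText("list footer"))

inner = MIMEMultipart("mixed")
inner.attach(MIMEText("original body"))
nested = MIMEMultipart("mixed")
nested.attach(inner)
nested.attach(MIMEText("list footer"))

html = MIMEText("<p>original body</p>", "html")

samples = {"plain": plain, "wrapped": wrapped, "nested": nested, "html": html}
results = {name: guess_transformation(m) for name, m in samples.items()}
```

A verifier would still have to try the reversal and check the signature, since the structure alone cannot distinguish an author's own multipart/mixed post from one with a part added by the list.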
Currently, there are mailing lists which don't do any change, not even subject tags, in order to avoid breaking DKIM signatures. A somewhat Procrustean solution.
It's the ONLY guaranteed solution, though, because avoiding rewriting is only possible if you *know* that you're distributing modified posts only to sites participating in your reversible modifications protocol (or ignore DMARC p=reject).
I don't think From: rewriting is going to be disabled any time soon.
You're right. You need universal deployment of reverse transformation to make disabling rewriting palatable.
Reply-To: usually comes after From:, thereby requiring to go back to change already parsed fields.
That's not a problem, since DKIM requires reordering fields anyway. The expensive part is not fiddling with the header, it's multiple passes of the signature algorithm.
As an alternative, I'd provide for yet another field to be put near the top of the header.
It's not an alternative. The changes to Reply-To or Cc are *necessary* (in the opinion of the list admin, not Mailman) to preserve the ability of the recipient MTA to respond to author.
Original-From:, say. This may seem redundant, however it serves a different goal. In addition, if the Original-From: is put in place by the original signer, it ratifies its knowledge that From: will be rewritten and its willingness to recover it afterwards.
Could work, but addition of Original-From should be done by DMARC originators, not by Mailman. The name should probably be DMARC-Original-From, as well.
Is this endeavor completely useless, given that the current settings work well enough? Or could it help keeping a consistent DMARC semantics among participants yearning to do so? I'd be glad to hear your opinions...
I don't think it's useless, but I don't see any reason for Mailman to participate until there is (1) a specification of the transformations that people want to be reversible, (2) a list of specific defects that, if fixed, would enable reversibility, or (3) a list of features that, if added, would do so.
For (1), we would just guarantee a particular recognizable format for transformations that should be reversible, and (2) and (3) would be addressed as usual.
As mentioned, the hinting function can be done well enough by a user-supplied Handler that looks up the list's configuration, determines the transformations that are applied, and inserts the hints in the appropriate place.
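The core of such a Handler might look like the standalone sketch below. Everything named here is an assumption for illustration: the field name X-Transformation-Hint, the configuration keys, and the hint tokens other than "footer" are hypothetical, and a real Handler would read the list's settings from Mailman instead of a plain dict.

```python
from email.message import Message

# Hypothetical field name; no such header is specified anywhere yet.
HINT_FIELD = "X-Transformation-Hint"


def configured_transformations(list_config):
    """Map (hypothetical) list settings to hint tokens, in application order."""
    names = []
    if list_config.get("subject_prefix"):
        names.append("subject-tag")
    if list_config.get("footer"):
        names.append("footer")
    if list_config.get("rewrite_from"):
        names.append("from-rewrite")
    return names


def add_transformation_hints(msg, list_config, agent=None):
    """Append one hint field describing what this list did to the post."""
    names = configured_transformations(list_config)
    if not names:
        return msg  # nothing was changed, so there is nothing to hint at
    value = ", ".join(names)
    if agent:
        value += "; agent=" + agent
    # Assigning adds a new instance and keeps existing ones, so a post that
    # passes through several lists accumulates one hint field per list.
    msg[HINT_FIELD] = value
    return msg


post = Message()
post["Subject"] = "[example] hello"
post.set_payload("original body\n-- \nlist footer\n")

config = {"subject_prefix": "[example] ", "footer": "list footer", "rewrite_from": False}
add_transformation_hints(post, config, agent="mailman")
hints = post.get_all(HINT_FIELD)
```

Adding a field instead of replacing one is a deliberate choice here, because lists that feed into other lists each apply their own transformations.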
Finally, contrary to what we all wish were true, this is not really a choice for mailing lists. It's a choice for recipient ad-dom border MTAs. If they don't buy in in large numbers, I'm not particularly interested in doing the work. I don't see why they would. Most lists I participate in do things like strip large attachments and strip prohibited executables. I think those are very common in general. Most lists I participate in also strip text/html alternatives and many convert text/html to text/plain. If that's the common case, why would a postmaster bother (unless they're a DMARC purist, of course, which may be a good thing but I don't think there are very many of them)? If the postmasters aren't bothering, why would list admins? If the list admins aren't bothering, why should Mailman?
On the other hand, I do think that Mailman can and should enable better analytics on posts by ensuring that we only change parts of a message that we intentionally change. I guess that might include situations like the case where Mailman changes a MIME part, reassembles the whole message, and assigns a C-T-E that happens to have the same semantics as the original C-T-E.
I encourage you (or anyone in the reversible transformation effort) to report inadvertent changes as bugs, and to suggest candidates for "standard formats" we could adopt to make ad hoc reversals more reliable (eg, the list name in a Subject tag should be enclosed in square brackets and match the last component of the List-ID -- not clear how that works with internationalized lists though).
I can't speak for other developers at this point, so I can't promise any proposals would be implemented, but I'm certainly interested and in some cases would definitely be an "advocate on the inside".