Hello,
Sorry if this is too off-topic for the Mailman list and should really be directed to the email-sig list. I noticed that one of the GSoC projects touches on message signing and verification, so I thought there might be more expertise on hand among this list's members, and there is a remote possibility that some of my difficulties could inform that work, too.
I've been looking into using the Python email module for signing and verifying messages according to RFC 3156, but one difficulty I encountered was where a message with the following form is received and parsed:
Content-Type: multipart/signed ...

-> Content-Type: ... ...
   This is the content being signed (together with the part's headers).

-> Content-Type: application/pgp-signature ...
   -----BEGIN PGP SIGNATURE-----
   ...
   -----END PGP SIGNATURE-----

Obviously, the email module permits the inspection of the message, and the apparent solution in verifying the message is to obtain the signed content along with the signature (using the get_payload method on the top-level multipart container) and then to pass them both to gpg for verification. Something like this...
content, signature = message.get_payload()[:2]
We might also choose to decode the signature, I imagine, since it may employ a transfer encoding:
signature = signature.get_payload(decode=True)
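Put together, the two steps above might look roughly like the following sketch. The sample message, its boundary "b1", the placeholder signature body and the function name split_signed are all made up for illustration; only get_content_type and get_payload come from the email module itself.

```python
from email import message_from_string

# A made-up multipart/signed message: the boundary, the text and the
# signature body are placeholders, not real data.
RAW = (
    'Content-Type: multipart/signed; micalg=pgp-sha1;\n'
    '\tprotocol="application/pgp-signature"; boundary="b1"\n'
    '\n'
    '--b1\n'
    'Content-Type: text/plain; charset="us-ascii"\n'
    '\n'
    'This is the content being signed.\n'
    '--b1\n'
    'Content-Type: application/pgp-signature\n'
    '\n'
    '-----BEGIN PGP SIGNATURE-----\n'
    '(placeholder)\n'
    '-----END PGP SIGNATURE-----\n'
    '--b1--\n'
)


def split_signed(message):
    """Return (content part, decoded signature) for a multipart/signed message.

    Hypothetical helper: it only combines the two get_payload calls.
    """
    if message.get_content_type() != "multipart/signed":
        raise ValueError("not a multipart/signed message")
    # The first part is the signed content, the second the signature.
    content, signature = message.get_payload()[:2]
    # decode=True undoes any transfer encoding applied to the signature.
    return content, signature.get_payload(decode=True)


message = message_from_string(RAW)
content, signature = split_signed(message)
```

Note that content is still a parsed Message object here, so whatever is passed on to gpg has to be a string representation of it, which is where the trouble starts.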
However, I discovered almost by accident, when using Python 2.5 for signing and Python 2.7 for verification (and the reverse), that the email module would parse the signed content part but then, upon being asked to provide the string representation of the part, would reformat the headers and thus cause the verification to fail.
So, where the email module in Python 2.5 likes to wrap headers using tab character indents, the module in Python 2.7 prefers to use a space for indentation instead. This means that the module reformats data upon being asked to provide a string representation of it rather than reporting exactly what it received.
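To illustrate why such a small change matters, here is a minimal sketch with two made-up renderings of the same part that differ only in the folding whitespace (tab versus space). The header values and the body text are invented for the example, and SHA-1 merely stands in for whatever digest the signature is computed over.

```python
import hashlib
from email import message_from_string

# Two renderings of the same (made-up) part, differing only in the
# whitespace used to fold the Content-Type header.
TAB_FOLDED = (
    'Content-Type: text/plain; charset="us-ascii";\n'
    '\tformat=flowed\n'
    '\n'
    'Hello\n'
)
SPACE_FOLDED = TAB_FOLDED.replace('\n\t', '\n ')

# Parsed, the two are indistinguishable as far as the parameters go...
tab_param = message_from_string(TAB_FOLDED).get_param("format")
space_param = message_from_string(SPACE_FOLDED).get_param("format")

# ...but the text that a signature would cover is different, so any
# digest computed over it is different as well.
tab_digest = hashlib.sha1(TAB_FOLDED.encode("ascii")).hexdigest()
space_digest = hashlib.sha1(SPACE_FOLDED.encode("ascii")).hexdigest()
same_digest = (tab_digest == space_digest)
```

Here tab_param and space_param are equal while same_digest is False, which is the whole problem in miniature: a signature made over one rendering cannot verify against the other.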
The behaviour of the module is fairly understandable - no-one wants to keep the original data around in addition to having a more manageable parsed form of it - but is there a convenient way of preserving the form of the original message for future use? It seems that asking for the individual headers will yield the original formatting, and that the _headers attribute on Message instances provides the original headers in order, but I wonder if I'm missing a simple method to get things as they were received and not as the particular version of the module wants to format them.
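One workaround I can imagine, shown here only as a rough sketch, is to keep the original message text around after all and to cut the signed part out of it directly, using the parsed message only to find the boundary. The function raw_first_part and the sample message are made up for illustration; get_boundary is the only email module call involved. The sketch assumes the outer boundary string does not occur as a line of its own inside the signed part, and it leaves out the conversion to CRLF line endings that RFC 3156 requires before signing or verifying.

```python
from email import message_from_string

# A made-up message whose signed part has a tab-folded header.
RAW = (
    'Content-Type: multipart/signed; micalg=pgp-sha1;\n'
    '\tprotocol="application/pgp-signature"; boundary="b1"\n'
    '\n'
    '--b1\n'
    'Content-Type: text/plain; charset="us-ascii";\n'
    '\tformat=flowed\n'
    '\n'
    'This is the content being signed.\n'
    '--b1\n'
    'Content-Type: application/pgp-signature\n'
    '\n'
    '-----BEGIN PGP SIGNATURE-----\n'
    '(placeholder)\n'
    '-----END PGP SIGNATURE-----\n'
    '--b1--\n'
)


def raw_first_part(raw_text, boundary):
    """Return the first body part exactly as it appears in raw_text.

    Hypothetical helper: it works on the original text, not on the
    parsed representation, so nothing gets reformatted.
    """
    delimiter = "--" + boundary
    lines = raw_text.splitlines(True)  # keep the line endings
    # Positions of the delimiter lines (trailing whitespace is tolerated).
    marks = [i for i, line in enumerate(lines)
             if line.rstrip() in (delimiter, delimiter + "--")]
    if len(marks) < 2:
        raise ValueError("boundary delimiters not found")
    part = "".join(lines[marks[0] + 1:marks[1]])
    # The line break just before a delimiter belongs to the delimiter,
    # not to the part, so it is removed.
    if part.endswith("\r\n"):
        return part[:-2]
    if part.endswith("\n"):
        return part[:-1]
    return part


message = message_from_string(RAW)
signed_text = raw_first_part(RAW, message.get_boundary())
```

The result, signed_text, still contains the tab-folded header as received, regardless of how the installed version of the module would choose to fold it.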
Of course, RFC 3156 warns about the pitfalls of encoding the part that is to be signed, and I have been tempted to use a conservative approach where signed content is wrapped up inside an opaque message part with headers that are unlikely to be rewritten, but if I can manage to work with the email module rather than around it, I can hopefully steer clear of such inelegant measures.
Does anyone have any opinions or experience that they would like to share on the matter?
Paul