PGP-signed message verification using the email module (and in Mailman)

Hello,
Sorry if this is too off-topic for the Mailman list and should really be directed to the email-sig list, but as I noticed that one of the GSOC projects touched upon message signing and verification, I thought that there might be more expertise on hand amongst this list's members, and there is a remote possibility that some of my difficulties can inform that work, too.
I've been looking into using the Python email module for signing and verifying messages according to RFC 3156, but one difficulty I encountered was where a message with the following form is received and parsed:
Content-Type: multipart/signed ...
-> Content-Type: ... ...
This is the content being signed (together with the part's headers).
-> Content-Type: application/pgp-signature ...
-----BEGIN PGP SIGNATURE----- ... -----END PGP SIGNATURE-----
Obviously, the email module permits the inspection of the message, and the apparent solution in verifying the message is to obtain the signed content along with the signature (using the get_payload method on the top-level multipart container) and then to pass them both to gpg for verification. Something like this...
content, signature = message.get_payload()[:2]
We might also choose to decode the signature, I imagine, since it may employ a transfer encoding:
signature = signature.get_payload(decode=True)
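A minimal sketch of that extraction on Python 3, assuming a well-formed two-part multipart/signed message. The sample message, the boundary string and the helper name split_signed are invented for illustration and are not part of the email module:

```python
import email

# Invented sample; a real signature block would contain base64 data.
RAW = (
    'Content-Type: multipart/signed; protocol="application/pgp-signature"; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain\n"
    "\n"
    "This is the content being signed.\n"
    "--b1\n"
    "Content-Type: application/pgp-signature\n"
    "\n"
    "-----BEGIN PGP SIGNATURE-----\n"
    "...\n"
    "-----END PGP SIGNATURE-----\n"
    "--b1--\n"
)


def split_signed(message):
    """Return (content part, signature bytes) of a multipart/signed message."""
    if message.get_content_type() != "multipart/signed":
        raise ValueError("not a multipart/signed message")
    parts = message.get_payload()
    if len(parts) != 2:
        raise ValueError("expected exactly two parts")
    content, signature = parts
    # decode=True undoes any Content-Transfer-Encoding and returns bytes.
    return content, signature.get_payload(decode=True)


content, signature = split_signed(email.message_from_string(RAW))
print(content.get_content_type())
print(signature.decode("ascii").splitlines()[0])
```

Note that content here is a parsed Message object, not the bytes that were signed, which is where the difficulty described next comes from.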
However, in an almost accidental discovery involving Python 2.5 and 2.7 being used for the respective signing and verification operations (and in reverse), what I found was that the email module would parse the signed content part but then, upon being asked to provide the string representation of the part, it would reformat the headers and thus cause the verification to fail.
So, where the email module in Python 2.5 likes to wrap headers using tab character indents, the module in Python 2.7 prefers to use a space for indentation instead. This means that the module reformats data upon being asked to provide a string representation of it rather than reporting exactly what it received.
The behaviour of the module is fairly understandable - no-one wants to keep the original data around in addition to having a more manageable parsed form of it - but is there a convenient way of preserving the form of the original message for future use? It seems that asking for the individual headers will yield the original formatting, and that the _headers attribute on Message instances provides the original headers in order, but I wonder if I'm missing a simple method to get things as they were received and not as the particular version of the module wants to format them.
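One conservative option, sketched here on the assumption that the raw bytes of the message are still available, is to bypass the serialiser and cut the signed part out of the original data at the MIME boundary. The helper name raw_signed_part is hypothetical, and the sketch assumes a single-level multipart/signed message in which no nested part reuses the boundary string:

```python
import email


def raw_signed_part(raw):
    """Return the first body part of a multipart/signed message as received.

    `raw` is the complete message as bytes. The email module is used only to
    discover the boundary; the part itself is sliced from `raw` untouched.
    """
    message = email.message_from_bytes(raw)
    if message.get_content_type() != "multipart/signed":
        raise ValueError("not a multipart/signed message")
    boundary = message.get_boundary()
    if boundary is None:
        raise ValueError("no boundary parameter")
    delimiter = b"--" + boundary.encode("ascii")

    # Record the (start, end) offsets of every line that is a delimiter.
    offset = 0
    marks = []
    for line in raw.splitlines(keepends=True):
        if line.rstrip(b"\r\n \t") in (delimiter, delimiter + b"--"):
            marks.append((offset, offset + len(line)))
        offset += len(line)
    if len(marks) < 2:
        raise ValueError("signed part not delimited")

    part = raw[marks[0][1]:marks[1][0]]
    # The line break preceding a delimiter belongs to the delimiter
    # (RFC 2046), so it is dropped from the returned part.
    if part.endswith(b"\r\n"):
        return part[:-2]
    if part.endswith(b"\n"):
        return part[:-1]
    return part
```

Called as raw_signed_part(raw_bytes), it returns the first part's headers and body with whatever folding the sender used, at the cost of keeping the original data around.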
Of course, RFC 3156 warns about the pitfalls of encoding the part that is to be signed, and I have been tempted to use a conservative approach where signed content is wrapped up inside an opaque message part with headers that are unlikely to be rewritten, but if I can manage to work with the email module rather than around it, I can hopefully steer clear of such inelegant measures.
Does anyone have any opinions or experience that they would like to share on the matter?
Paul

On 01/08/2014 12:35 PM, Paul Boddie wrote:
Of course, RFC 3156 warns about the pitfalls of encoding the part that is to be signed,
It doesn't just warn about the pitfalls; it states that:
Multipart/signed and multipart/encrypted are to be treated by agents as opaque, meaning that the data is not to be altered in any way [2], [7].
where [2] and [7] map roughly to:
[2] https://tools.ietf.org/html/rfc1847#section-2.1
which reads:
Security Considerations: [multipart/signed parts] Must be treated as opaque while in transit
and
[7] https://tools.ietf.org/html/rfc2480#section-4
which reads:
[email gateways] MUST provide the ability to tunnel multipart/signed and multipart/encrypted objects as monolithic entities if there is any chance whatsoever that MIME capabilities exist on the non-MIME side of the gateway. No changes to content of the multipart are permitted, even when the content is itself a composite MIME object.
so if python's email module really does mangle this part, it cannot be used within RFC-2480-compliant mail gateways. This is a bug in python's email module, and it needs to be fixed. Have you reported it to the python email module?
Thanks for raising the issue,
--dkg

On Wednesday 8. January 2014 19.21.12 Daniel Kahn Gillmor wrote:
I'd mostly assumed this because it obviously wouldn't do to just have the different parts modified at random. Thanks for making this clear, though!
Well, I reserve the right to be wrong about this, but it is certainly the case that calling as_string on a Message causes the message to be formatted anew.
Whether it is sensible that I use as_string at all is a reasonable thing to question - I steadily make new discoveries about the email API as time passes - but in effect I'd like to extract the content and signature parts and then pass them to gpg. I'm not using the gnupg module that I think the GSOC work uses, partly because I started out with my own wrapper, but the issue of extracting the content part as it was originally sent - and signed - is the critical factor here and probably outside the scope of the gnupg module anyway.
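For illustration only, a wrapper of this kind might look like the following sketch. It is not the wrapper mentioned above: the function names gpg_verify and to_crlf are hypothetical, the gpg binary is assumed to be on the PATH, and the signer's public key is assumed to be in the local keyring. The CRLF normalisation reflects the RFC 3156 requirement that line endings be canonical before verification.

```python
import os
import subprocess
import tempfile


def to_crlf(data):
    """Normalise LF and CRLF line endings to CRLF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def gpg_verify(content, signature, gpg="gpg"):
    """Return True if gpg accepts `signature` as a detached signature.

    `content` must be the signed part (headers and body) as bytes, exactly
    as it was sent; `signature` is the armoured signature as bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        sig_path = os.path.join(tmp, "signature.asc")
        data_path = os.path.join(tmp, "content")
        with open(sig_path, "wb") as f:
            f.write(signature)
        with open(data_path, "wb") as f:
            f.write(to_crlf(content))
        # gpg exits with status 0 only when the signature checks out.
        result = subprocess.run(
            [gpg, "--batch", "--verify", sig_path, data_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return result.returncode == 0
```

Any change to the header folding of the content part between signing and this call makes gpg report a bad signature, which is why the serialisation question matters.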
There's also the matter of whether any gateway would parse and serialise messages in the way I am attempting, but in principle I think that anyone using the email module to do so would need to do things the same way unless there's another way I'm not aware of. Again, I'd only be too happy for someone to tell me I'm doing things wrong. ;-)
I just searched for bugs reported about this and found the following:
http://bugs.python.org/issue1974 (covers the change from tab to space indents)
http://bugs.python.org/issue1372770 (covers issues of header folding)
http://bugs.python.org/issue11492 (also covers header folding)
http://bugs.python.org/issue1440472 (actually mentions a lack of idempotency)
I think the last one probably answers my question, but I'll look at it again tomorrow. This may mean that I have to write my own message serialiser, of course.
Thanks for looking at this!
Paul

On Thursday 9. January 2014 01.49.14 Paul Boddie wrote:
Following up to myself, here's what I decided to do for now:
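from StringIO import StringIO # Python 2; on Python 3 use: from io import StringIO
from email.generator import Generator

def to_string(message): # helper name is illustrative
    out = StringIO()
    generator = Generator(out, False, 0) # mangle_from_=False, maxheaderlen=0: disable reformatting measures
    generator.flatten(message)
    return out.getvalue()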
Problems may apparently remain with superfluous whitespace found around header keys in any parsed messages, but the above seems to prevent the blatant rewriting I experienced with the Message.as_string method, doing so by overriding the Generator defaults.
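The effect of those Generator arguments can be sketched on Python 3, where the defaults differ from the 2.5 and 2.7 behaviour discussed in this thread. The sample message and the helper name flatten are invented for illustration; the comparison only shows that a Generator with default settings refolds a long header, while maxheaderlen=0 leaves this simple message byte-for-byte as parsed.

```python
import email
from email.generator import Generator
from io import StringIO


def flatten(message, **options):
    """Serialise `message` with a Generator built from the given options."""
    out = StringIO()
    Generator(out, **options).flatten(message)
    return out.getvalue()


# Invented sample with a header line well beyond 78 characters.
RAW = (
    "Subject: " + " ".join(["verification"] * 10) + "\n"
    "\n"
    "Body text.\n"
)
message = email.message_from_string(RAW)

# Generator defaults: header lines longer than the policy limit are folded.
rewrapped = flatten(message)
# No "From " mangling and no header refolding.
preserved = flatten(message, mangle_from_=False, maxheaderlen=0)

print(rewrapped == RAW)
print(preserved == RAW)
```

This does not guarantee an exact round trip for every message, only that the header folding step is switched off.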
Paul

participants (3): Daniel Kahn Gillmor, Paul Boddie, Stephen J. Turnbull