Mal Formed MIME post leaked through to list
I just had a problem on a Mailman 2.1.5 list although I think it's the Python email library - Python is 2.3.3.
A malformed MIME message was posted to a list. The message was much larger than max_message_size, yet it wasn't held, and several parts came through that weren't in pass_mime_types.
The basic post was multipart/mixed with a multipart/alternative sub part, a message/rfc822 sub part and the final text/plain msg_footer.
The message/rfc822 sub part was multipart/mixed with 3 subparts of type multipart/alternative, application/pdf and multipart/appledouble.
The problem with the MIME structure is that the boundary for the multipart/alternative and multipart/appledouble sub parts of the multipart/mixed message/rfc822 was identical to the boundary of the multipart/mixed part. I suspect the original message/rfc822 attached message was malformed, but it could have been broken in the attaching process. The original attached message was created by User-Agent: Microsoft-Outlook-Express-Macintosh-Edition/5.0.6 and the post was sent by Yahoo mail.
I suspect what happened is that the Python email library saw the end of part boundary for the second multipart/alternative part and treated it as the end of the message/rfc822 part since the boundary was the same.
Thus the big parts didn't get counted in the message size nor did they get filtered by content filtering.
Is this analysis correct? Is it fixed in later versions of the email library?
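One way to check this kind of analysis is to feed the parser a small message with the same flaw and list what it finds. The following is a minimal sketch, assuming Python 3's standard email package rather than the email library shipped with Python 2.3.3, so the result may differ from what Mailman 2.1.5 saw. The message text, the "SAME" boundary and the describe() helper are made up for illustration.

```python
import email

# Hypothetical test message: the inner multipart/alternative reuses the
# boundary of the multipart/mixed part that contains it, as in the post.
RAW = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="SAME"',
    "",
    "--SAME",
    'Content-Type: multipart/alternative; boundary="SAME"',
    "",
    "--SAME",
    "Content-Type: text/plain",
    "",
    "hello",
    "--SAME--",
    "--SAME",
    "Content-Type: application/pdf",
    "Content-Transfer-Encoding: base64",
    "",
    "JVBERi0xLjQK",
    "--SAME--",
    "",
])


def describe(msg):
    """Return (content type, defect class names) for each part the parser found."""
    return [
        (part.get_content_type(), [type(d).__name__ for d in part.defects])
        for part in msg.walk()
    ]


if __name__ == "__main__":
    msg = email.message_from_string(RAW)
    for ctype, defects in describe(msg):
        print(ctype, defects)
    # Anything the parser did not turn into a part may show up here instead.
    print("root epilogue length:", len(msg.epilogue or ""))
```

If application/pdf is missing from the listed parts and the epilogue length is non-zero, the parser stopped at the first closing boundary, which is the behaviour suspected above.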
Here is an annotated copy of the received post with all content and non-relevant headers removed.
Received: from [63.201.34.79] by web81402.mail.yahoo.com via HTTP; Fri, 16 Sep 2005 16:32:37 PDT
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0-95218806-1126913557=:91474"
Content-Transfer-Encoding: 8bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
X-BeenThere: gpc-talk@grizz.org
X-Mailman-Version: 2.1.5
------------------- above from the headers of the received post

--0-95218806-1126913557=:91474
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
------------------ above are first part headers - I think this was originally multipart/alternative and a text/html part was stripped

--0-95218806-1126913557=:91474
Content-Type: message/rfc822
------------------ part headers from the attached message/rfc822 part

User-Agent: Microsoft-Outlook-Express-Macintosh-Edition/5.0.6
Mime-version: 1.0
Content-type: multipart/mixed; boundary="MS_Mac_OE_3209732028_3749601_MIME_Part"
Content-Length: 175299
------------------ from the headers of the attached message

--MS_Mac_OE_3209732028_3749601_MIME_Part
Content-type: multipart/alternative; boundary="MS_Mac_OE_3209732028_3749601_MIME_Part"
--------------------- part headers for multipart/alternative sub part of attached message. Note that the boundary is the same as that of the containing multipart/mixed part.

--MS_Mac_OE_3209732028_3749601_MIME_Part
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit
------------------------- part headers for text/plain alternative

--MS_Mac_OE_3209732028_3749601_MIME_Part--
----------------------- end of multipart/alternative part but also looks like end of multipart/mixed part. I think a text/html part may have been stripped, but the multipart/alternative part isn't collapsed.

--MS_Mac_OE_3209732028_3749601_MIME_Part
Content-type: application/pdf; name="Recruitment.pdf"; x-mac-creator="4341524F"; x-mac-type="50444620"
Content-disposition: attachment
Content-transfer-encoding: base64
----------------------------------- another sub part of multipart/mixed - should have been filtered

--MS_Mac_OE_3209732028_3749601_MIME_Part
Content-type: multipart/appledouble; boundary="MS_Mac_OE_3209732024_3737509_MIME_Part"
----------------------------------- another sub part of multipart/mixed - should have been filtered. Still same boundary

--MS_Mac_OE_3209732024_3737509_MIME_Part
Content-type: application/applefile; name="High Sierra Trip 2003 Me"
Content-transfer-encoding: base64
Content-disposition: attachment
---------------------- sub part of multipart/appledouble should have been filtered

--MS_Mac_OE_3209732024_3737509_MIME_Part
Content-type: image/jpeg; name="High Sierra Trip 2003 Me"; x-mac-creator="6F676C65"; x-mac-type="4A504547"
Content-disposition: attachment
Content-transfer-encoding: base64
---------------------- sub part of multipart/appledouble should have been filtered

--MS_Mac_OE_3209732024_3737509_MIME_Part--
----------------------- end of multipart/appledouble

--MS_Mac_OE_3209732028_3749601_MIME_Part--
------------------------ end of multipart/mixed

--0-95218806-1126913557=:91474
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
------------------------ part headers for list footer

--0-95218806-1126913557=:91474--
------------------------ end of outermost message
--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
At 6:03 PM -0700 2005-09-16, Mark Sapiro wrote:
I just had a problem on a Mailman 2.1.5 list although I think it's the Python email library - Python is 2.3.3.
The e-mail related library routines are known to have been updated for Python 2.4, and there's been discussion of whether or not to require Python 2.4 for the next major release of Mailman (not sure if that's going to be 2.2 or 3.0).
I would be very interested to know how that would have dealt with the problem you've had. Unfortunately, I don't have the skills or knowledge to help you answer that question.
-- Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
Brad Knowles wrote:
The e-mail related library routines are known to have been updated for Python 2.4, and there's been discussion of whether or not to require Python 2.4 for the next major release of Mailman (not sure if that's going to be 2.2 or 3.0).
I would be very interested to know how that would have dealt with the problem you've had. Unfortunately, I don't have the skills or knowledge to help you answer that question.
I've done a bit more research and I intend to continue looking into this, but I posted in the hope that Barry or someone else on the list might already have the answer and save me some trouble.
Additional things I've found are:
I confirmed that the post e-mail was clearly wrong. RFC2046 states "The boundary delimiter MUST NOT appear inside any of the encapsulated parts, on a line by itself or as the prefix of any line." That notwithstanding, I think the parser and Mailman should protect against this kind of error.
I looked a bit at the documentation of the email library and based on that, I think what may have happened is when the parser saw the first "end of subpart boundary" which looked the same as the outer "end of part boundary", it took it as the end of the outer part and treated the rest as an epilogue. Hold.py does the following to compute message size:
    if mlist.max_message_size > 0:
        bodylen = 0
        for line in email.Iterators.body_line_iterator(msg):
            bodylen += len(line)
but the body_line_iterator() method may skip the epilogue.
Also, I think MimeDel.py will leave the epilogue in the message.
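The gap between what body_line_iterator() visits and what is actually in the message can be seen with a small test. This is a minimal sketch, assuming Python 3's email.iterators module rather than the email.Iterators module that Hold.py imported, so details may differ from Mailman 2.1.5. RAW and iterated_body_size() are made-up names for illustration.

```python
import email
from email.iterators import body_line_iterator

# Hypothetical message: a tiny text part, then 5000 characters of text
# after the closing boundary, where the parser stores it as the epilogue.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="B"\n'
    "\n"
    "--B\n"
    "Content-Type: text/plain\n"
    "\n"
    "short body\n"
    "--B--\n"
    + "x" * 5000 + "\n"
)


def iterated_body_size(msg):
    """Size as computed by the Hold.py loop quoted above."""
    return sum(len(line) for line in body_line_iterator(msg))


if __name__ == "__main__":
    msg = email.message_from_string(RAW)
    print("size seen by body_line_iterator:", iterated_body_size(msg))
    print("epilogue length:", len(msg.epilogue or ""))
```

The iterator only yields lines from string payloads of the parts it walks, so text stored in an epilogue does not contribute to the count.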
Interestingly, both Thunderbird 1.5b1 and MS Outlook Express 6 seem to parse the message as intended, but Mutt 1.4.1i sees it more like Mailman does.
If there's no answer on the list, I intend to keep at it, but not for the next week as I will be away.
--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi,
Mark Sapiro wrote:
I looked a bit at the documentation of the email library and based on that, I think what may have happened is when the parser saw the first "end of subpart boundary" which looked the same as the outer "end of part boundary", it took it as the end of the outer part and treated the rest as an epilogue. Hold.py does the following to compute message size:
    if mlist.max_message_size > 0:
        bodylen = 0
        for line in email.Iterators.body_line_iterator(msg):
            bodylen += len(line)
but the body_line_iterator() method may skip the epilogue.
I think we can write more intuitive and robust code like this:
    if mlist.max_message_size > 0:
        bodylen = len(msg.as_string().split('\n\n',1)[1])
Also, I think MimeDel.py will leave the epilogue in the message.
I will look at it closer.
Interestingly, both Thunderbird 1.5b1 and MS Outlook Express 6 seem to parse the message as intended, but Mutt 1.4.1i sees it more like Mailman does.
If there's no answer on the list, I intend to keep at it, but not for the next week as I will be away.
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
On Sun, 2005-09-18 at 17:34, Tokio Kikuchi wrote:
I think we can write more intuitive and robust code like this:
    if mlist.max_message_size > 0:
        bodylen = len(msg.as_string().split('\n\n',1)[1])
The only problem with that is that it's not very efficient to turn the message back into a flattened string in order to calculate its size.
Here's an idea: Python's FeedParser (or a subclass of that in Mailman) should keep a running count of the size of data fed to it, and it should add that to the message as an attribute. Even cooler would be if each subpart could have an accurate .size attribute added to it as it was being parsed.
Anybody care to work up a patch for that?
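A rough sketch of the first half of that idea, assuming Python 3's email.feedparser.FeedParser. The class name CountingFeedParser and the fed_size attribute are hypothetical and not part of the email package or Mailman. It records only the total size fed to the parser; per-subpart sizes would need changes inside the parser itself and are not attempted here.

```python
from email.feedparser import FeedParser


class CountingFeedParser(FeedParser):
    """FeedParser that remembers how much data it was fed (hypothetical)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._fed = 0

    def feed(self, data):
        # Count before parsing so the total covers headers, preambles,
        # bodies and epilogues alike, however the parser classifies them.
        self._fed += len(data)
        super().feed(data)

    def close(self):
        root = super().close()
        # Attach the running total to the root message object.
        root.fed_size = self._fed
        return root


def parse_with_size(chunks):
    """Feed an iterable of text chunks and return the parsed root message."""
    parser = CountingFeedParser()
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()


if __name__ == "__main__":
    root = parse_with_size(["Subject: test\n", "\n", "body text\n"])
    print(root.fed_size)
```

Because the count is taken from the raw input, it does not depend on whether the parser later files a piece of text as a part or as an epilogue.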
-Barry
On Sat, 2005-09-17 at 12:12, Mark Sapiro wrote:
I confirmed that the post e-mail was clearly wrong. RFC2046 states "The boundary delimiter MUST NOT appear inside any of the encapsulated parts, on a line by itself or as the prefix of any line." That notwithstanding, I think the parser and Mailman should protect against this kind of error.
The email 3.0 parser will be much more robust against these types of problems, but the best you can hope for is to take the defects into account when deciding what to do with the message. For example, I don't think it's unreasonable for Mailman to discard (or at least hold) messages with defects. IME, 99.99% of malformed messages are malware.
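As an illustration of taking defects into account, here is a minimal sketch, assuming Python 3's email package, where the parser records problems in each message object's defects list. The hold-on-any-defect rule and the names collect_defects() and should_hold() are hypothetical and are not Mailman code.

```python
import email


def collect_defects(msg):
    """Gather the defects the parser recorded on the message and its subparts."""
    found = []
    for part in msg.walk():
        found.extend(part.defects)
    return found


def should_hold(msg):
    """Hypothetical policy: hold any message the parser flagged as defective."""
    return bool(collect_defects(msg))


# A well-formed single-part message.
GOOD = "Subject: ok\nContent-Type: text/plain\n\nhello\n"

# Declares a multipart boundary but never uses it in the body.
BAD = (
    "Subject: broken\n"
    'Content-Type: multipart/mixed; boundary="B"\n'
    "\n"
    "there is no boundary line anywhere in this body\n"
)

if __name__ == "__main__":
    for raw in (GOOD, BAD):
        msg = email.message_from_string(raw)
        names = [type(d).__name__ for d in collect_defects(msg)]
        print(msg["Subject"], should_hold(msg), names)
```

Whether a defective message is held or discarded is a policy choice for the list; the sketch only shows where the information would come from.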
I looked a bit at the documentation of the email library and based on that, I think what may have happened is when the parser saw the first "end of subpart boundary" which looked the same as the outer "end of part boundary", it took it as the end of the outer part and treated the rest as an epilogue. Hold.py does the following to compute message size:
    if mlist.max_message_size > 0:
        bodylen = 0
        for line in email.Iterators.body_line_iterator(msg):
            bodylen += len(line)
but the body_line_iterator() method may skip the epilogue.
Yep. In private email, Mark pointed us to a patch for Hold.py which takes preambles and epilogues into account when calculating message sizes. This has been applied to the 2.1 maintenance branch and the 2.2 trunk.
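The patch itself is not shown in this thread. As a rough illustration of the approach only, and not the actual Hold.py change, a size calculation that includes preambles and epilogues might look like the sketch below. It assumes Python 3's email package; size_with_extras() and RAW are made-up names.

```python
import email


def size_with_extras(msg):
    """Sum payload, preamble and epilogue lengths over every part (in characters)."""
    total = 0
    for part in msg.walk():
        payload = part.get_payload()
        # Multipart containers hold a list of subparts; only leaf parts
        # (and unparsed bodies) carry a string payload.
        if isinstance(payload, str):
            total += len(payload)
        # preamble and epilogue are None when absent.
        total += len(part.preamble or "")
        total += len(part.epilogue or "")
    return total


# Hypothetical message with 5000 characters placed after the closing boundary.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="B"\n'
    "\n"
    "--B\n"
    "Content-Type: text/plain\n"
    "\n"
    "short body\n"
    "--B--\n"
    + "x" * 5000 + "\n"
)

if __name__ == "__main__":
    print(size_with_extras(email.message_from_string(RAW)))
```

With this kind of count, text that the parser files as an epilogue still contributes to the size compared against max_message_size.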
Also, I think MimeDel.py will leave the epilogue in the message.
I think that's generally appropriate (though I'm open to opinions while we're still on email 2.5). If we start discarding malformed messages, it probably makes sense to keep epilogues, since it won't be possible to hide content there.
Interestingly, both Thunderbird 1.5b1 and MS Outlook Express 6 seem to parse the message as intended, but Mutt 1.4.1i sees it more like Mailman does.
Really, ultimately, it's up to the semantics of Python's email library. Changing that is not easy. ;)
Thanks! -Barry
participants (4)
- Barry Warsaw
- Brad Knowles
- Mark Sapiro
- Tokio Kikuchi