[Email-SIG] State of the FeedParser
barry at python.org
Thu May 13 18:03:45 EDT 2004
I feel like the new FeedParser is in pretty good shape, but I wanted
to bring up two cases where what it parses is different from what
Anthony's MIME tests expect.
The two tests in question are test_multiple_same_boundary (in the email
test suite, msg_39.txt), and
test_nested-multiples-with-internal-boundary-bastard.
By my interpretation of RFC 2046, I believe that if you encounter an
outer multipart's boundary inside an inner part, you should treat that as
the inner part being truncated, with the boundary separating parts in
the outer multipart. This is implemented in the FeedParser as
BufferedSubFile.readline() testing all EOF predicates in its stack
against every line read.
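To make that mechanism concrete, here is a toy sketch. It is not the
actual BufferedSubFile code: the class ToySubFile and its internals are
made up for illustration, and only the idea of a stack of EOF predicates
checked on every line comes from the description above.

```python
class ToySubFile:
    """Illustrative stand-in for BufferedSubFile (hypothetical, simplified)."""

    def __init__(self, lines):
        self._lines = list(lines)
        self._eofstack = []  # outermost predicate first, innermost last

    def push_eof_matcher(self, pred):
        self._eofstack.append(pred)

    def pop_eof_matcher(self):
        return self._eofstack.pop()

    def readline(self):
        if not self._lines:
            return ''
        line = self._lines[0]
        # Test every predicate, innermost first. If any enclosing part's
        # boundary matches, report EOF and leave the line unconsumed so the
        # enclosing parser can see it.
        for ateof in self._eofstack[::-1]:
            if ateof(line):
                return ''
        return self._lines.pop(0)


f = ToySubFile(["body text\n", "--outer\n", "next part\n"])
f.push_eof_matcher(lambda line: line.startswith("--outer"))
f.push_eof_matcher(lambda line: line.startswith("--inner"))
first = f.readline()   # ordinary line, returned as-is
second = f.readline()  # outer boundary seen inside the inner part: EOF
```

With both predicates on the stack, the inner part ends as soon as the
outer boundary shows up, which is the truncation behavior described
above.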
Anthony's tests expect different behavior -- I believe they want outer
boundaries in inner parts to be ignored. You can implement that in
.readline() by changing the line
    for ateof in self._eofstack[::-1]:
to
    for ateof in self._eofstack[-1::]:
Under the former, the above two tests are not parsed as Anthony's output
expects. Under the latter,
test_nested-multiples-with-internal-boundary-bastard gets parsed as
expected, but test_multiple_same_boundary still does not. For that
case, more complications will have to be added to the FeedParser.
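For anyone who doesn't read slice notation at a glance, the difference
between the two loops is just which predicates get consulted. A small
illustration on a plain list (the string labels are placeholders, not
real predicates):

```python
# Stand-in for self._eofstack: outermost entry first, innermost last.
eofstack = ["outer", "middle", "inner"]

# [::-1] walks the whole stack, innermost first, so every enclosing
# boundary is checked against each line.
check_all = eofstack[::-1]

# [-1::] keeps only the last element, so only the innermost boundary
# is checked and outer boundaries are ignored inside inner parts.
check_innermost = eofstack[-1::]

print(check_all)        # ['inner', 'middle', 'outer']
print(check_innermost)  # ['inner']
```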
I know Anthony will disagree with me, but I'm inclined to leave the
FeedParser as it now stands in CVS. I'm convinced it's closer to the
intent of the RFC. None of the data is lost, and the messages all get
.defects added to them, so you will at least /know/ something's wrong.
If anybody is motivated to make the FeedParser agree with Anthony's
output, please generate a patch. I'd probably want to see some kind of
flag in the FeedParser that would get propagated to BufferedSubFile,
which would switch between RFC-compliance mode and 'ignore-outer-boundaries'
mode. It's kind of distasteful to have such a flag, but I really don't
want to lose the current behavior. I also don't have much more stomach
for trying to add all that to the current FeedParser.