[Python-bugs-list] [ python-Bugs-417176 ] MultiFile.read() includes CRLF boundary

noreply@sourceforge.net
Fri, 05 Oct 2001 13:59:19 -0700


Bugs item #417176, was opened at 2001-04-18 15:22
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=417176&group_id=5470

Category: Python Library
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Martijn Pieters (mjpieters)
Assigned to: Guido van Rossum (gvanrossum)
Summary: MultiFile.read() includes CRLF boundary

Initial Comment:
multifile.MultiFile.readlines() and .read() return the 
body of a multipart message including the trailing line 
delimiter, which is to be regarded as part of the boundary.

In a partial multipart message like:

--BoundaryHere
Content-Type: text/plain

1
2
3
4
--BoundaryHere

the message within the delimiters does not include the 
final line delimiter (CRLF or LF or whatnot) after the 
line reading '4'; it is considered part of the 
boundary. MultiFile however, returns it as part of the 
body.

See RFC2046 section 5.1.1. In the usual text 
formatting of the RFC, you'll find the definition and 
explanation in the first two paragraphs of page 19.
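
A minimal sketch of the expected behaviour, written for illustration
only. It does not use the multifile module; split_parts is a
hypothetical helper, and it assumes the whole message fits in one
string with LF or CRLF line endings.

```python
def split_parts(text, boundary):
    """Return the parts found between boundary lines.

    Hypothetical helper, not part of multifile. The line ending that
    precedes each boundary line is treated as part of the boundary,
    so it is removed from the part that comes before it.
    """
    delimiter = "--" + boundary
    parts = []
    current = None  # None until the first boundary line has been seen
    for line in text.splitlines(keepends=True):
        stripped = line.rstrip("\r\n")
        if stripped in (delimiter, delimiter + "--"):
            if current is not None:
                part = "".join(current)
                # Drop exactly one trailing line ending (CRLF or LF).
                if part.endswith("\r\n"):
                    part = part[:-2]
                elif part.endswith("\n"):
                    part = part[:-1]
                parts.append(part)
            # A closing delimiter ("--boundary--") ends the multipart body.
            current = [] if stripped == delimiter else None
        elif current is not None:
            current.append(line)
    return parts


sample = (
    "--BoundaryHere\n"
    "Content-Type: text/plain\n"
    "\n"
    "1\n2\n3\n4\n"
    "--BoundaryHere\n"
)
parts = split_parts(sample, "BoundaryHere")
```

With the sample message from this report, the single part ends with
the character '4' and not with a line delimiter.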


----------------------------------------------------------------------

>Comment By: Martijn Pieters (mjpieters)
Date: 2001-10-05 13:59

Message:
Logged In: YES 
user_id=116747

I just found again where I ran into this problem: in the
Zope HTTP Range header test suite. The code generates
RFC-compliant multipart MIME responses and the test suite uses
MessageFile to see if the correct parts are returned.

See expectMultipleRanges in:

 
http://cvs.zope.org/Zope/lib/python/OFS/tests/testRanges.py?rev=1.3&content-type=text/vnd.viewcvs-markup

Right now there is code there that catches the extra line
ending that's part of the boundary and strips it off; this
fails with Python 2.2a4 because now the \n is stripped but the
\r is still attached!

I am more and more convinced that MessageFile should not
expect that the line endings have been normalized to UNIX
only. Instead, it should handle at least the UNIX \n and the
RFC-compliant \r\n situations.
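
To illustrate the difference, here is a small sketch. Both function
names are hypothetical and neither is taken from the multifile
source: one strips only a trailing \n, the other handles both the
UNIX \n and the RFC-style \r\n.

```python
def strip_lf_only(data):
    # Mimics an LF-only fix: with CRLF input the "\r" is left behind.
    return data[:-1] if data.endswith("\n") else data


def strip_line_ending(data):
    # Remove exactly one trailing line ending, checking CRLF first so
    # that "\r\n" is removed as a unit rather than leaving a stray "\r".
    for eol in ("\r\n", "\n"):
        if data.endswith(eol):
            return data[:-len(eol)]
    return data


lf_only_result = strip_lf_only("4\r\n")     # "\r" is still attached
both_result = strip_line_ending("4\r\n")    # clean body
```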


----------------------------------------------------------------------

Comment By: Martijn Pieters (mjpieters)
Date: 2001-09-18 09:16

Message:
Logged In: YES 
user_id=116747

Okay, if all the code depends on line-endings being
Unix-style, the patch has my blessings.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-18 08:40

Message:
Logged In: YES 
user_id=6380

I think that CRLF support in this case isn't worth it. It's
not done elsewhere in this module -- it assumes that line
endings have already been converted to Unix style. Lone CR
is definitely not supported -- none of the code would work.

----------------------------------------------------------------------

Comment By: Martijn Pieters (mjpieters)
Date: 2001-09-18 08:09

Message:
Logged In: YES 
user_id=116747

Your patch looks sound, apart from the fact that it'll only
remove an LF. The spec says the CRLF is part of the boundary,
and, to account for broken implementations, it should
probably remove any of 'CRLF', 'LF', or 'CR' at the end.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-18 07:34

Message:
Logged In: YES 
user_id=6380

I've checked in the patch now. Still waiting for Martijn's
feedback before I close the report.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-13 12:59

Message:
Logged In: YES 
user_id=6380

Martijn, here's a fix. Can you test this?

The fix works (how else) by reading ahead one line and
stripping the final newline if the next line is empty.
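
The read-ahead idea can be sketched roughly as follows. This is an
illustration only, not the checked-in patch: read_part_lines and the
is_boundary callback are hypothetical names, and line endings are
assumed to be Unix-style.

```python
import io


def read_part_lines(fp, is_boundary):
    """Yield the lines of the current part using one line of read-ahead.

    A line is only handed out once the following line has been read.
    When the following line is a boundary, or the input has ended, the
    held-back line is the last one of the part and loses its final
    newline.
    """
    previous = None
    for line in fp:
        if is_boundary(line):
            break
        if previous is not None:
            yield previous
        previous = line
    if previous is not None:
        # Nothing more follows in this part: strip the final "\n".
        yield previous[:-1] if previous.endswith("\n") else previous


fp = io.StringIO("1\n2\n3\n4\n--BoundaryHere\nnext part\n")
lines = list(read_part_lines(fp, lambda l: l.rstrip("\n") == "--BoundaryHere"))
```

Here lines holds '1\n', '2\n', '3\n' and '4', so joining them gives a
body without the newline that belongs to the boundary.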

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-05 10:54

Message:
Logged In: YES 
user_id=6380

I wrote that code and I'm probably culpable.  It's also
always bothered me.

Unassigning it from Barry (it has nothing to do with Barry).

----------------------------------------------------------------------
