Writing byte stream as jpeg format to disk

Fri Aug 27 09:53:30 EDT 2010

Nobody wrote:
> Bryan wrote:
> > this is a case where we might want to be better
> > than correct. BaseHTTPRequestHandler in the Python standard library
> > accommodates clients that incorrectly omit the '\r' and end header lines
> > with just '\n'. Such apps have been seen in the wild. Since bare '\n'
> > never appears in correctly formed HTTP headers, interpreting it as
> > equivalent to '\r\n' doesn't break anything.
>
> Yes it does. It breaks upstream filtering rules which are intended to
> prohibit, remove or modify certain headers.
>
> This class of attack is known as "HTTP request smuggling". By
> appending a header preceded by a bare '\r' or '\n' to the end of
> another header, the header can be "smuggled" past a filter which
> parses headers using the correct syntax,

How does a bare '\r' or '\n' get past a filter which parses headers
using the correct syntax? I don't see where the correct syntax of the
HTTP protocol allows that.

> but will still be treated as a
> header by software which incorrectly parses headers using bare '\r' or
> '\n' as separators.

Why blame software that incorrectly accepts '\n' as a line break, and
not the filter that incorrectly accepted '\n' in the middle of a
header? Both are accepting incorrect syntax, but only the former has
good reason to do so.
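
To make the point concrete, here is a minimal Python sketch. The
function names and the sample headers (X-Note, X-Admin) are made up
for illustration; neither function is anyone's real filter or server.
It only shows that a filter splitting on CRLF does see the bare '\n',
sitting in the middle of what it takes to be a single header.

```python
def strict_split(raw):
    """Split a header block on CRLF only, as the HTTP grammar specifies."""
    return [line for line in raw.split(b"\r\n") if line]


def lenient_split(raw):
    """Split on CRLF or bare LF, as a tolerant server might (assumed behaviour)."""
    stripped = (line.rstrip(b"\r") for line in raw.split(b"\n"))
    return [line for line in stripped if line]


def has_bare_eol(line):
    """True if a line that was split on CRLF still contains a CR or LF."""
    return b"\r" in line or b"\n" in line


# Hypothetical header block: the second line ends with a bare LF.
raw = b"Host: example.com\r\nX-Note: hello\nX-Admin: yes\r\n"

strict_lines = strict_split(raw)    # two lines; the second contains '\n'
lenient_lines = lenient_split(raw)  # three lines; X-Admin is its own header

# A filter using the correct syntax can notice the problem and refuse.
suspicious = [line for line in strict_lines if has_bare_eol(line)]
```

The strict view yields two header lines and the lenient view three.
The 'suspicious' list is non-empty, so the filter had what it needed
to reject the request.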

> The safest solution would be to simply reject any request (or response)
> which contains bare '\r' or '\n' characters within headers, at least by
> default. Force the programmer to read the documentation (where the risks
> would be described) if they want the "fault tolerant" behaviour.

The Internet has a tradition of protocols above the transport level
being readable by eye and writable by hand. The result has been quick
development, but also many mistakes with unforeseen consequences.

This case is somewhat subtle. Within a text entity-body, HTTP allows
any one of three end-of-line delimiters: CRLF, bare CR, or bare LF.
That's just the body; the header portion is more rigid. In HTTP 1.0:

   "This flexibility regarding line breaks applies only to text
   media in the Entity-Body; a bare CR or LF should not be
   substituted for CRLF within any of the HTTP control
   structures (such as header fields and multipart boundaries)."
   -- RFC 1945

While in HTTP 1.1:

   "This flexibility regarding line breaks applies only to text
   media in the entity-body; a bare CR or LF MUST NOT be
   substituted for CRLF within any of the HTTP control
   structures (such as header fields and multipart boundaries)."
   -- RFC 2616

Note the change from "should not" to "MUST NOT". In practice, though,
our code might be called upon to work with apps that botch the
technically correct HTTP end-of-line marker. Rejecting bare '\n' may
be the safest choice from a strict security perspective, but if our
safe code breaks a previously working system, it will end up in a bug
database and not in production.
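
One way to reconcile the two positions is to be strict by default and
make tolerance an explicit opt-in. The following is only a sketch
under my own assumptions: parse_header_block and its tolerate_bare_lf
flag are hypothetical names, not part of BaseHTTPRequestHandler. It
handles just "Name: value" lines and ignores the request line and
folded continuation lines.

```python
import re

# Matches a CR not followed by LF, or an LF not preceded by CR.
_BARE_EOL = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def parse_header_block(raw, tolerate_bare_lf=False):
    """Return a list of (name, value) byte pairs from a raw header block.

    Hypothetical helper. Strict by default: any bare CR or LF raises
    ValueError. With tolerate_bare_lf=True a bare LF is treated as CRLF,
    but a bare CR is still rejected.
    """
    if tolerate_bare_lf:
        # Normalise bare LF to CRLF before the strict check below.
        raw = re.sub(rb"(?<!\r)\n", b"\r\n", raw)
    if _BARE_EOL.search(raw):
        raise ValueError("bare CR or LF in header block")

    headers = []
    for line in raw.split(b"\r\n"):
        if not line:
            continue  # skip the empty piece after the final CRLF
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError("header line without a colon: %r" % line)
        headers.append((name.strip(), value.strip()))
    return headers
```

With the default, parse_header_block(b"A: 1\nB: 2\r\n") raises
ValueError. Passing tolerate_bare_lf=True returns both headers, and
the caller has had to ask for that behaviour by name.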

'Nobody' makes a fair point. I'd love to see Internet protocols
defined with mechanical rigor. Our discipline commonly specifies
programming language syntax formally, and Internet protocols are
syntactically simpler than programming languages. For now, HTTP is a
bit of a mess, so write it absolutely correctly but read it a bit
flexibly.
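
For the "write it absolutely correctly" half, a minimal sketch of a
writer that cannot emit a bare end-of-line. The name
format_header_block is hypothetical, and the latin-1 encoding is my
assumption for the example.

```python
def format_header_block(headers):
    """Serialise (name, value) string pairs as CRLF-terminated header lines.

    Hypothetical helper. Refuses any name or value containing CR or LF,
    so the output never contains a bare end-of-line character.
    """
    lines = []
    for name, value in headers:
        for part in (name, value):
            if "\r" in part or "\n" in part:
                raise ValueError("CR or LF inside header field: %r" % part)
        lines.append("%s: %s\r\n" % (name, value))
    # A CRLF on its own ends the header section.
    return ("".join(lines) + "\r\n").encode("latin-1")
```

Refusing CR and LF in the fields means every line break in the output
is one the writer put there itself, always as CRLF.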