From fumanchu at amor.org  Fri Sep  8 09:33:24 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 00:33:24 -0700
Subject: [Web-SIG] WSGI type tolerance
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>

See http://www.cherrypy.org/ticket/561. Apparently, "several WSGI apps" return an int for the Content-Length header. PEP 333 leaves just enough unsaid that someone might determine that's OK. But wsgiref and paste.lint certainly seem to think header values must be strings. Can we tighten up the language in the PEP on this point (or just agree on an interpretation here, so it's documented)?


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From pje at telecommunity.com  Fri Sep  8 17:56:38 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 11:56:38 -0400
Subject: [Web-SIG] WSGI type tolerance
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908114854.0268fa48@sparrow.telecommunity.com>

At 12:33 AM 9/8/2006 -0700, Robert Brewer wrote:
>See http://www.cherrypy.org/ticket/561.
>Apparently, "several WSGI apps" return an int for the Content-Length 
>header. PEP 333 leaves just enough unsaid that someone might determine 
>that's OK. But wsgiref and paste.lint certainly seem to think header 
>values must be strings. Can we tighten up the language in the PEP on this 
>point (or just agree on an interpretation here, so it's documented)?

Those apps or servers are definitely broken; CGI environment variables are 
always strings.  I would urge you to have CherryPy reject servers or 
middleware providing integer content-length as broken, preferably with an 
error that indicates the application is not WSGI-compliant.

I'll change the phrasing of this:

"""The environ dictionary is required to contain these CGI environment 
variables, as defined by the Common Gateway Interface specification."""

to:

"""The environ dictionary is required to contain these CGI environment 
strings, as defined by the Common Gateway Interface specification."""

So that there is absolutely no room for ambiguity, although the CGI 
specification quite clearly calls for strings.  In the meantime, feel free 
to cite this message and point folks in the direction of wsgiref.validate 
(which is actually paste.lint under a different name).  It's also 
recommended that middleware authors test their middleware by using 
wsgiref.validate on *both* the server and application sides, not just one 
or the other.
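
A minimal sketch of running an application under wsgiref.validate, written for Python 3 (where the required header type is the native str). The two applications and the validation_error helper are hypothetical names invented for illustration; only setup_testing_defaults and validator come from wsgiref.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def int_length_app(environ, start_response):
    """Hypothetical app that (incorrectly) sends Content-Length as an int."""
    body = b"hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", len(body))])
    return [body]


def str_length_app(environ, start_response):
    """The same app with the header value converted to a string."""
    body = b"hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def validation_error(app):
    """Run app once under wsgiref.validate; return the error text or None."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid environ

    def start_response(status, headers, exc_info=None):
        return lambda data: None  # dummy write() callable

    try:
        result = validator(app)(environ, start_response)
        try:
            for _chunk in result:
                pass
        finally:
            result.close()  # the validator expects close() to be called
    except AssertionError as exc:
        # wsgiref.validate reports violations as AssertionError
        return str(exc)
    return None
```

Under these assumptions, validation_error(int_length_app) returns a message about the header value type, while validation_error(str_length_app) returns None.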


From pje at telecommunity.com  Fri Sep  8 18:06:30 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 12:06:30 -0400
Subject: [Web-SIG] WSGI type tolerance
In-Reply-To: <5.1.1.6.0.20060908114854.0268fa48@sparrow.telecommunity.com>
References: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908120116.02cfee88@sparrow.telecommunity.com>

At 11:56 AM 9/8/2006 -0400, Phillip J. Eby wrote:
>I'll change the phrasing of this:
>
>"""The environ dictionary is required to contain these CGI environment
>variables, as defined by the Common Gateway Interface specification."""
>
>to:
>
>"""The environ dictionary is required to contain these CGI environment
>strings, as defined by the Common Gateway Interface specification."""

Oops.  I just noticed that the ticket was about response headers, not CGI 
variables.  I guess this is the part that needs to change:

"""The response_headers argument is a list of (header_name, header_value) 
tuples. It must be a Python list; i.e. type(response_headers) is ListType, 
and the server may change its contents in any way it desires. Each 
header_name must be a valid HTTP header field-name (as defined by RFC 2616, 
Section 4.2), without a trailing colon or other punctuation."""

I'll add:

"""Each ``header_name`` and ``header_value`` **must** be of StringType."""

Both changes are good, of course.

I think I should also add some language regarding wsgiref in the stdlib, 
the importance of using wsgiref.validate, and a recommendation that servers 
not be any more "liberal in what they accept" than what the spec allows.
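
The string requirement above can be sketched as a check a server might run inside start_response. This is an illustration only, written for Python 3 (str in place of the 2006-era StringType); check_response_headers is a hypothetical helper, not something defined by PEP 333 or wsgiref.

```python
def check_response_headers(response_headers):
    """Raise TypeError unless response_headers is a list of (str, str) tuples."""
    if type(response_headers) is not list:
        raise TypeError("response_headers must be a list, got %r"
                        % (type(response_headers),))
    for item in response_headers:
        if type(item) is not tuple or len(item) != 2:
            raise TypeError("each header must be a (name, value) tuple, got %r"
                            % (item,))
        name, value = item
        for label, part in (("header_name", name), ("header_value", value)):
            # No coercion: an int Content-Length is rejected, not converted.
            if type(part) is not str:
                raise TypeError("%s must be a string, got %r" % (label, part))
    return response_headers
```

Rejecting rather than converting follows the recommendation that servers be no more liberal than the spec allows.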


From fumanchu at amor.org  Fri Sep  8 20:08:26 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 11:08:26 -0700
Subject: [Web-SIG] WSGI and readline(size) support (was: WSGI type tolerance)
References: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>
	<5.1.1.6.0.20060908120116.02cfee88@sparrow.telecommunity.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C50@ex9.hostedexchange.local>

Phillip J. Eby wrote:
> Oops.  I just noticed that the ticket was about
> response headers, not CGI variables.
> ...
> I'll add:
> """Each ``header_name`` and ``header_value`` **must** be of StringType."""

Great! Thanks very much. I'll have CherryPy deny non-strings.

> I think I should also add some language regarding
> wsgiref in the stdlib, the importance of using
> wsgiref.validate, and a recommendation that servers
> not be any more "liberal in what they accept" than
> what the spec allows.

Thanks for the tip. I've just run CherryPy's test suite through the validator, and discovered a conflict that could bite a lot of people soon.

PEP 333 does not support the size argument to wsgi.input.readline(), stating that "the optional 'size' argument to readline() is not supported, as it may be complex for server authors to implement, and is not often used in practice."

However, Python 2.5rc1 has fixed a DoS bug in cgi.FieldStorage by using readline(1<<16). See http://sourceforge.net/tracker/?func=detail&aid=1112549&group_id=5470&atid=105470. CherryPy has had this patched for a year or so: http://www.cherrypy.org/ticket/127

Now that wsgiref is in the stdlib, we should really fix either it or the cgi module so that there's no conflict. It may be "complex for server authors to implement" readline(size) support, but it's even more complex for application authors to re-implement FieldStorage. ;) For what it's worth, there was zero work to have CherryPy support readline(size); it's automatically provided by socket.makefile.
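
To illustrate why readline(size) matters for the DoS fix: a reader that passes a size never buffers more than that many bytes per call, even when the client sends a very long line with no newline. A minimal sketch, assuming a file-like stream that accepts the size argument (io.BytesIO is used as a stand-in for wsgi.input); iter_bounded_lines is a hypothetical helper, not the code in cgi.FieldStorage.

```python
import io

MAX_LINE = 1 << 16  # the same bound mentioned above for cgi.FieldStorage


def iter_bounded_lines(stream, limit=MAX_LINE):
    """Yield pieces of stream, each at most `limit` bytes long.

    A piece ends at a newline or at the limit, whichever comes first,
    so a single huge line cannot force an unbounded read.
    """
    while True:
        chunk = stream.readline(limit)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    body = io.BytesIO(b"a" * 10 + b"\nbc\n")
    for piece in iter_bounded_lines(body, limit=4):
        print(piece)
```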


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From ianb at colorstudy.com  Fri Sep  8 20:27:58 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 08 Sep 2006 13:27:58 -0500
Subject: [Web-SIG] WSGI: read method
Message-ID: <4501B62E.4050308@colorstudy.com>

An issue I just realized (as Robert was bringing up these things), is 
that I would like to be able to give input streams that don't have a 
known length.  In particular, I want to be able to do this:

input = environ['wsgi.input']
if hasattr(input, 'json_request'):
     request = input.json_request
else:
     if 'CONTENT_LENGTH' in environ:
         raw_request = input.read(int(environ['CONTENT_LENGTH']))
     else:
         raw_request = input.read()
     request = simplejson.loads(raw_request)


The idea is that the request body won't be serialized unless necessary, 
so internal requests (JSON, XMLRPC, etc) can avoid any serialization, 
while the WSGI app can deal with both these cases and normal 
string-based requests.  But I can't set CONTENT_LENGTH during these 
internal requests, because I'd need to figure out how long the 
serialized request body was, and that would require actually serializing it.

In the end it doesn't matter a whole lot, because almost no 
intermediaries ever look at wsgi.input, though if WSGI apps expecting a 
JSON request but not aware of .json_request get one of these requests, 
it is likely they will fail.

Hmm... I could also set CONTENT_LENGTH='1', and make .read(1) return the 
actual entire body, totally ignoring the size argument.  Or make it 
'99999', or whatever.  That seems bad-clever, but maybe most workable 
with PEP 333?
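
For illustration, the input object described above might look roughly like the following. This is a sketch under stated assumptions: LazyJSONInput and read_json_request are hypothetical names, the stdlib json module stands in for simplejson, and only read() and readline() are implemented (PEP 333 also expects readlines() and iteration on wsgi.input). Serialization happens only if something actually reads the stream.

```python
import io
import json


class LazyJSONInput:
    """Hypothetical wsgi.input for internal requests; serializes on demand."""

    def __init__(self, obj):
        self.json_request = obj  # the already-parsed request object
        self._stream = None      # created only if someone reads the body

    def _get_stream(self):
        if self._stream is None:
            data = json.dumps(self.json_request).encode("utf-8")
            self._stream = io.BytesIO(data)
        return self._stream

    def read(self, size=-1):
        return self._get_stream().read(size)

    def readline(self, size=-1):
        return self._get_stream().readline(size)


def read_json_request(environ):
    """Return the request object, skipping serialization when possible."""
    stream = environ["wsgi.input"]
    if hasattr(stream, "json_request"):
        return stream.json_request
    length = environ.get("CONTENT_LENGTH")
    raw = stream.read(int(length)) if length else stream.read()
    return json.loads(raw)
```

An application that knows about .json_request gets the object back untouched; one that does not can still call read() without a CONTENT_LENGTH and parse the result.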

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From fumanchu at amor.org  Fri Sep  8 20:46:52 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 11:46:52 -0700
Subject: [Web-SIG] WSGI: read method
References: <4501B62E.4050308@colorstudy.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C52@ex9.hostedexchange.local>

Ian Bicking wrote:
> An issue I just realized (as Robert was bringing up these things),
> is that I would like to be able to give input streams that don't
> have a known length.  In particular, I want to be able to do this:
> 
> input = environ['wsgi.input']
> if hasattr(input, 'json_request'):
>      request = input.json_request
> else:
>      if 'CONTENT_LENGTH' in environ:
>          raw_request = input.read(int(environ['CONTENT_LENGTH']))
>      else:
>          raw_request = input.read()
>      request = simplejson.loads(raw_request)
> 
> The idea is that the request body won't be serialized unless necessary,
> so internal requests (JSON, XMLRPC, etc) can avoid any serialization,
> while the WSGI app can deal with both these cases and normal 
> string-based requests.  But I can't set CONTENT_LENGTH during these 
> internal requests, because I'd need to figure out how long the 
> serialized request body was, and that would require actually serializing it.
> 
> In the end it doesn't matter a whole lot, because almost no 
> intermediaries ever look at wsgi.input, though if WSGI apps expecting a 
> JSON request but not aware of .json_request get one of these requests, 
> it is likely they will fail.
> 
> Hmm... I could also set CONTENT_LENGTH='1', and make .read(1) return the 
> actual entire body, totally ignoring the size argument.  Or make it 
> '99999', or whatever.  That seems bad-clever, but maybe most workable 
> with PEP 333?

I'm getting lost on the phrase "internal request"--what do you mean by that?


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From ianb at colorstudy.com  Fri Sep  8 20:55:48 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 08 Sep 2006 13:55:48 -0500
Subject: [Web-SIG] WSGI: read method
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C52@ex9.hostedexchange.local>
References: <4501B62E.4050308@colorstudy.com>
	<435DF58A933BA74397B42CDEB8145A86224C52@ex9.hostedexchange.local>
Message-ID: <4501BCB4.4040609@colorstudy.com>

Robert Brewer wrote:
>  > Hmm... I could also set CONTENT_LENGTH='1', and make .read(1) return the
>  > actual entire body, totally ignoring the size argument.  Or make it
>  > '99999', or whatever.  That seems bad-clever, but maybe most workable
>  > with PEP 333?
> 
> I'm getting lost on the phrase "internal request"--what do you mean by that?

The WSGI request originates from Python, not from an external request. 
So, the WSGI environment is created from Python and sent to the 
application without any HTTP server.

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From fumanchu at amor.org  Fri Sep  8 23:02:28 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 14:02:28 -0700
Subject: [Web-SIG] WSGI and long response header values
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>

PEP 333 says:

"Each header_value must not include any control characters, including carriage returns or linefeeds, either embedded or at the end. (These requirements are to minimize the complexity of any parsing that must be performed by servers, gateways, and intermediate response processors that need to inspect or modify response headers.)" [1]

That's understandable, but HTTP headers are defined as (mostly) *TEXT, and "words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047." [2] And RFC 2047 specifies that "an 'encoded-word' may not be more than 75 characters long...If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used." [3] This satisfies HTTP header folding rules, as well: "Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT." [1, again]

So in my reading of HTTP, some code somewhere should introduce newlines in longish, encoded response header values. I see three options:

 1. Keep things as they are and disallow response header values if they contain words over 75 chars that are outside the ISO-8859-1 character set
 2. Allow newline characters in WSGI response headers
 3. Require/strongly suggest WSGI servers to do the encoding and folding before sending the value over HTTP.

Any other solutions? I'd like to see 2 or 3 adopted (unless something better comes along), so CherryPy can continue to support as much of the HTTP spec as possible.
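
To make the RFC 2047 constraint concrete, here is a sketch that uses the stdlib email.header module to encode a long non-Latin-1 value. The helper names encode_words and decode_words are invented for illustration, and the module is an email tool rather than anything WSGI-specific. It shows that the encoder emits several encoded-words and joins them with a newline plus a space, which is exactly the kind of value PEP 333 forbids.

```python
from email.header import Header, decode_header, make_header


def encode_words(text, charset="utf-8", maxlinelen=75):
    """Return the RFC 2047 encoded-words for text, one per list item."""
    folded = Header(text, charset=charset, maxlinelen=maxlinelen).encode()
    # Header.encode() separates continuation lines with "\n" plus one space.
    return folded.split("\n ")


def decode_words(value):
    """Decode a header value made of RFC 2047 encoded-words back to text."""
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    text = "\u03a9" * 60  # 60 Greek capital omegas, outside ISO-8859-1
    for word in encode_words(text):
        print(len(word), word)
```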


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
[3] http://www.rfc.net/rfc2047.html#s2

From pje at telecommunity.com  Fri Sep  8 23:55:02 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 17:55:02 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>

At 02:02 PM 9/8/2006 -0700, Robert Brewer wrote:

>PEP 333 says:
>
>"Each header_value must not include any control characters, including 
>carriage returns or linefeeds, either embedded or at the end. (These 
>requirements are to minimize the complexity of any parsing that must be 
>performed by servers, gateways, and intermediate response processors that 
>need to inspect or modify response headers.)" [1]
>
>That's understandable, but HTTP headers are defined as (mostly) *TEXT, and 
>"words of *TEXT MAY contain characters from character sets other than 
>ISO-8859-1 only when encoded according to the rules of RFC 2047." [2] And 
>RFC 2047 specifies that "an 'encoded-word' may not be more than 75 
>characters long...If it is desirable to encode more text than will fit in 
>an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by 
>CRLF SPACE) may be used." [3] This satisfies HTTP header folding rules, as 
>well: "Header fields can be extended over multiple lines by preceding each 
>extra line with at least one SP or HT." [1, again]
>
>So in my reading of HTTP, some code somewhere should introduce newlines in 
>longish, encoded response header values. I see three options:
>
>  1. Keep things as they are and disallow response header values if they 
> contain words over 75 chars that are outside the ISO-8859-1 character set
>  2. Allow newline characters in WSGI response headers
>  3. Require/strongly suggest WSGI servers to do the encoding and folding 
> before sending the value over HTTP.
>
>Any other solutions? I'd like to see 2 or 3 adopted (unless something 
>better comes along), so CherryPy can continue to support as much of the 
>HTTP spec as possible.

#3 sounds most attractive, although I must confess I don't see how it could 
be made to work, since the strings have to be encoded already, unless 
you're saying that applications should encode them in chunks of up to 75 
characters, separated by spaces, and that the servers should then fold the 
result.  That would certainly seem like the least-intrusive way to deal 
with it, as a slight clarification to the spec, rather than any real 
*change* to the spec (and hence a new version of it) as #2 would require.
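
One way to picture that reading of #3: the application hands over a single-line value whose pieces are separated by spaces, and the server folds at those spaces when writing the response. A minimal sketch, assuming a hypothetical helper fold_header_value (nothing like it is defined by PEP 333 or wsgiref):

```python
def fold_header_value(value, width=75):
    """Fold a space-separated header value at spaces.

    Each resulting line holds at most `width` characters of content
    (a single word longer than `width` is left on its own line), and
    continuation lines start with one space, as HTTP folding requires.
    """
    lines = []
    current = ""
    for word in value.split(" "):
        candidate = word if not current else current + " " + word
        if current and len(candidate) > width:
            lines.append(current)  # line is full; start a new one
            current = word
        else:
            current = candidate
    lines.append(current)
    return "\r\n ".join(lines)
```

Because each fold replaces exactly one space, replacing every CRLF-plus-space with a single space gives back the original value.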


From fumanchu at amor.org  Sat Sep  9 01:22:22 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 16:22:22 -0700
Subject: [Web-SIG] WSGI and long response header values
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>

Phillip J. Eby wrote:
> Robert Brewer wrote:
> >So in my reading of HTTP, some code somewhere should introduce newlines in 
> >longish, encoded response header values. I see three options:
> >
> >  1. Keep things as they are and disallow response header values if they 
> > contain words over 75 chars that are outside the ISO-8859-1 character set
> >  2. Allow newline characters in WSGI response headers
> >  3. Require/strongly suggest WSGI servers to do the encoding and folding 
> > before sending the value over HTTP.
> >
> >Any other solutions? I'd like to see 2 or 3 adopted (unless something 
> >better comes along), so CherryPy can continue to support as much of the 
> >HTTP spec as possible.
> 
> #3 sounds most attractive, although I must confess I don't see how it
> could be made to work, since the strings have to be encoded already,
> unless you're saying that applications should encode them in chunks
> of up to 75 characters, separated by spaces, and that the servers
> should then fold the result.  That would certainly seem like the
> least-intrusive way to deal with it, as a slight clarification to
> the spec, rather than any real *change* to the spec (and hence a new
> version of it) as #2 would require.

Bah. I knew I forgot a constraint in there (the strings have to be encoded by the app). Personally, I think the "separate-by-spaces" cure is worse than the disease. I also finally found the only other discussion of this issue [1] and ... I wish we had allowed folding from the beginning. Given the obscure nature of this need, I would rather have had all WSGI implementations be 99% WSGI-compliant (by ignoring folding) than 99% HTTP-compliant (by not allowing folding). We could have improved the former number without changing the spec, but not the latter. Meh. Water under the bridge. Maybe in 1.1?


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

[1] http://mail.python.org/pipermail/web-sig/2004-September/000749.html

From ianb at colorstudy.com  Sat Sep  9 01:31:55 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 08 Sep 2006 18:31:55 -0500
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
Message-ID: <4501FD6B.5060002@colorstudy.com>

Robert Brewer wrote:
> PEP 333 says:
> 
> "Each header_value must not include any control characters, including 
> carriage returns or linefeeds, either embedded or at the end. (These 
> requirements are to minimize the complexity of any parsing that must be 
> performed by servers, gateways, and intermediate response processors 
> that need to inspect or modify response headers.)" [1]
> 
> That's understandable, but HTTP headers are defined as (mostly) *TEXT, 
> and "words of *TEXT MAY contain characters from character sets other 
> than ISO-8859-1 only when encoded according to the rules of RFC 2047." 
> [2] And RFC 2047 specifies that "an 'encoded-word' may not be more than 
> 75 characters long...If it is desirable to encode more text than will 
> fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's 
> (separated by CRLF SPACE) may be used." [3] This satisfies HTTP header 
> folding rules, as well: "Header fields can be extended over multiple 
> lines by preceding each extra line with at least one SP or HT." [1, again]
> 
> So in my reading of HTTP, some code somewhere should introduce newlines 
> in longish, encoded response header values. 

Realistically, isn't this an artifact of a time when things like 
line-length mattered a lot more?  That is, does any HTTP client actually 
care about or rely on the 75 character limit?

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From pje at telecommunity.com  Sat Sep  9 02:37:13 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 20:37:13 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <4501FD6B.5060002@colorstudy.com>
References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
	<435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908203621.0268dac8@sparrow.telecommunity.com>

At 06:31 PM 9/8/2006 -0500, Ian Bicking wrote:
>Robert Brewer wrote:
> > PEP 333 says:
> >
> > "Each header_value must not include any control characters, including
> > carriage returns or linefeeds, either embedded or at the end. (These
> > requirements are to minimize the complexity of any parsing that must be
> > performed by servers, gateways, and intermediate response processors
> > that need to inspect or modify response headers.)" [1]
> >
> > That's understandable, but HTTP headers are defined as (mostly) *TEXT,
> > and "words of *TEXT MAY contain characters from character sets other
> > than ISO-8859-1 only when encoded according to the rules of RFC 2047."
> > [2] And RFC 2047 specifies that "an 'encoded-word' may not be more than
> > 75 characters long...If it is desirable to encode more text than will
> > fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's
> > (separated by CRLF SPACE) may be used." [3] This satisfies HTTP header
> > folding rules, as well: "Header fields can be extended over multiple
> > lines by preceding each extra line with at least one SP or HT." [1, again]
> >
> > So in my reading of HTTP, some code somewhere should introduce newlines
> > in longish, encoded response header values.
>
>Realistically, isn't this an artifact of a time when things like
>line-length mattered a lot more?  That is, does any HTTP client actually
>care about or rely on the 75 character limit?

I believe RFC 2047 was originally created for email, where such a limit 
would actually matter.


From foom at fuhm.net  Sat Sep  9 04:10:46 2006
From: foom at fuhm.net (James Y Knight)
Date: Fri, 8 Sep 2006 22:10:46 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
	<435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
Message-ID: <517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>


On Sep 8, 2006, at 7:22 PM, Robert Brewer wrote:
> Bah. I knew I forgot a constraint in there (the strings have to be  
> encoded by the app). Personally, I think the "separate-by-spaces"  
> cure is worse than the disease. I also finally found the only other  
> discussion of this issue [1] and ... I wish we had allowed folding  
> from the beginning. Given the obscure nature of this need, I would  
> rather have had all WSGI implementations be 99% WSGI-compliant (by  
> ignoring folding) than 99% HTTP-compliant (by not allowing  
> folding). We could have improved the former number without changing  
> the spec, but not the latter. Meh. Water under the bridge. Maybe in  
> 1.1?

I don't see what's wrong with encoding with the 75-char word-limit,  
separating "words" by spaces, *without* newlines. If the server feels  
like folding a long line into two, it can do so, but it's perfectly  
within its rights not to, and AFAIK nothing at all requires it to  
ever fold, given that a folded line is exactly equivalent to a single  
space. Line folding is one of those things that really has no purpose  
in HTTP besides to write out the examples in the RFCs.
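
A sketch of the approach described above, with the caveat that it leans on the stdlib email.header module, which was written for email rather than HTTP. single_line_encoded is a hypothetical helper: it keeps each encoded-word within 75 characters but joins the words with plain spaces, so the value contains no CR or LF and passes PEP 333's control-character rule.

```python
import re
from email.header import Header, decode_header, make_header

# Control characters that PEP 333 forbids in header values.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def single_line_encoded(text, charset="utf-8"):
    """Encode text as RFC 2047 encoded-words separated by single spaces."""
    folded = Header(text, charset=charset, maxlinelen=75).encode()
    # Replace each fold ("\n" plus one space) with just the space.
    return " ".join(folded.split("\n "))


if __name__ == "__main__":
    value = single_line_encoded("\u03a9" * 60)
    print(value)
    print(CONTROL_CHARS.search(value) is None)
    print(str(make_header(decode_header(value))))
```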

James


From fumanchu at amor.org  Sat Sep  9 07:45:21 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 22:45:21 -0700
Subject: [Web-SIG] WSGI and long response header values
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
	<435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
	<517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C55@ex9.hostedexchange.local>

James Y Knight wrote:
> On Sep 8, 2006, at 7:22 PM, Robert Brewer wrote:
> > Bah. I knew I forgot a constraint in there (the strings
> > have to be encoded by the app). Personally, I think the
> > "separate-by-spaces" cure is worse than the disease.
> > I also finally found the only other discussion of this
> > issue [1] and ... I wish we had allowed folding from
> > the beginning. Given the obscure nature of this need,
> > I would rather have had all WSGI implementations be
> > 99% WSGI-compliant (by ignoring folding) than 99%
> > HTTP-compliant (by not allowing folding). We could
> > have improved the former number without changing the
> > spec, but not the latter. Meh. Water under the bridge.
> > Maybe in 1.1?
> 
> I don't see what's wrong with encoding with the 75-char
> word-limit, separating "words" by spaces, *without* newlines.
> If the server feels like folding a long line into two, it
> can do so, but it's perfectly within its rights not to,
> and AFAIK nothing at all requires it to ever fold, given
> that a folded line is exactly equivalent to a single space.
> Line folding is one of those things that really has no purpose
> in HTTP besides to write out the examples in the RFCs.

I was hoping that too, but the server is actually *not* within its rights to leave out the newlines, because that restriction is actually part of RFC 2047 (MIME headers), not the HTTP spec:

"If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used."


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From fumanchu at amor.org  Sat Sep  9 07:51:49 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 22:51:49 -0700
Subject: [Web-SIG] WSGI and long response header values
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
	<435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
	<517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>
	<435DF58A933BA74397B42CDEB8145A86224C55@ex9.hostedexchange.local>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C56@ex9.hostedexchange.local>

James Y Knight wrote:
> I don't see what's wrong with encoding with the 75-char
> word-limit, separating "words" by spaces, *without* newlines.
> If the server feels like folding a long line into two, it
> can do so, but it's perfectly within its rights not to,
> and AFAIK nothing at all requires it to ever fold, given
> that a folded line is exactly equivalent to a single space.
> Line folding is one of those things that really has no purpose
> in HTTP besides to write out the examples in the RFCs.

And I just said:
> I was hoping that too, but the server is actually *not*
> within its rights to leave out the newlines, because that
> restriction is actually part of RFC 2047 (MIME headers),
> not the HTTP spec.

Bah. Of course, any HTTP server or proxy is free to unfold headers. So maybe the dream of arbitrary header values via MIME-encoding is broken from the get-go.
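
For reference, the unfolding an intermediary may perform is small. A sketch, assuming a hypothetical helper unfold_header_value; RFC 2616 allows a recipient to replace linear white space, including a fold, with a single space:

```python
import re

# A fold is a line break followed by at least one space or tab.
_FOLD = re.compile(r"\r?\n[ \t]+")


def unfold_header_value(value):
    """Replace each fold in a header value with a single space."""
    return _FOLD.sub(" ", value)
```

After this step the CRLF SPACE separators between encoded-words are gone, which is why folding cannot be relied on to survive the trip to the client.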


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org


From foom at fuhm.net  Sun Sep 10 01:49:33 2006
From: foom at fuhm.net (James Y Knight)
Date: Sat, 9 Sep 2006 19:49:33 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C56@ex9.hostedexchange.local>
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
	<435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
	<517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>
	<435DF58A933BA74397B42CDEB8145A86224C55@ex9.hostedexchange.local>
	<435DF58A933BA74397B42CDEB8145A86224C56@ex9.hostedexchange.local>
Message-ID: <EEE0C0D9-E32D-4FC9-9FA5-CE5F3D7D5F2C@fuhm.net>


On Sep 9, 2006, at 1:51 AM, Robert Brewer wrote:

> James Y Knight wrote:
> > I don't see what's wrong with encoding with the 75-char
> > word-limit, separating "words" by spaces, *without* newlines.
> > If the server feels like folding a long line into two, it
> > can do so, but it's perfectly within its rights not to,
> > and AFAIK nothing at all requires it to ever fold, given
> > that a folded line is exactly equivalent to a single space.
> > Line folding is one of those things that really has no purpose
> > in HTTP besides to write out the examples in the RFCs.
>
> And I just said:
> > I was hoping that too, but the server is actually *not*
> > within its rights to leave out the newlines, because that
> > restriction is actually part of RFC 2047 (MIME headers),
> > not the HTTP spec.
>
> Bah. Of course, any HTTP server or proxy is free to unfold headers.  
> So maybe the dream of arbitrary header values via MIME-encoding is  
> broken from the get-go.

No, it's just an inconsistency in the RFC. I suggest reading the  
RFC2616 as having precedence over the requirements in RFC2047, and  
thus the line breaks are not required. I seriously doubt if anything  
will malfunction if given such input. Not that I know of any actual  
use cases of non-ASCII character encoding in http headers, anyhow.

James


From mnot at mnot.net  Sun Sep 10 02:18:46 2006
From: mnot at mnot.net (Mark Nottingham)
Date: Sat, 9 Sep 2006 17:18:46 -0700
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
Message-ID: <8F965D17-4410-499B-824B-6DEB2F29A2A6@mnot.net>

My reading o WSGI was that multi-line headers should already be  
folding multi-line headers. If that's the case, what's the problem?

Cheers,


On 2006/09/08, at 2:02 PM, Robert Brewer wrote:

> PEP 333 says:
>
> "Each header_value must not include any control characters,  
> including carriage returns or linefeeds, either embedded or at the  
> end. (These requirements are to minimize the complexity of any  
> parsing that must be performed by servers, gateways, and  
> intermediate response processors that need to inspect or modify  
> response headers.)" [1]
>
> That's understandable, but HTTP headers are defined as (mostly)  
> *TEXT, and "words of *TEXT MAY contain characters from character  
> sets other than ISO-8859-1 only when encoded according to the rules  
> of RFC 2047." [2] And RFC 2047 specifies that "an 'encoded-word'  
> may not be more than 75 characters long...If it is desirable to  
> encode more text than will fit in an 'encoded-word' of 75  
> characters, multiple 'encoded-word's (separated by CRLF SPACE) may  
> be used." [3] This satisfies HTTP header folding rules, as well:  
> "Header fields can be extended over multiple lines by preceding  
> each extra line with at least one SP or HT." [1, again]
>
> So in my reading of HTTP, some code somewhere should introduce  
> newlines in longish, encoded response header values. I see three  
> options:
>
>  1. Keep things as they are and disallow response header values if  
> they contain words over 75 chars that are outside the ISO-8859-1  
> character set
>  2. Allow newline characters in WSGI response headers
>  3. Require/strongly suggest WSGI servers to do the encoding and  
> folding before sending the value over HTTP.
>
> Any other solutions? I'd like to see 2 or 3 adopted (unless  
> something better comes along), so CherryPy can continue to support  
> as much of the HTTP spec as possible.
>
>
> Robert Brewer
> System Architect
> Amor Ministries
> fumanchu at amor.org
>
> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
> [2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
> [3] http://www.rfc.net/rfc2047.html#s2
>
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/mnot%40mnot.net


--
Mark Nottingham     http://www.mnot.net/


From mnot at mnot.net  Sun Sep 10 19:25:12 2006
From: mnot at mnot.net (Mark Nottingham)
Date: Sun, 10 Sep 2006 10:25:12 -0700
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <8F965D17-4410-499B-824B-6DEB2F29A2A6@mnot.net>
References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
	<8F965D17-4410-499B-824B-6DEB2F29A2A6@mnot.net>
Message-ID: <9AAE585A-9CE3-451D-B17A-AA6FCB6ED146@mnot.net>

Now in English:

My reading of WSGI was that implementations should already be folding  
multi-line headers.
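
For illustration only, here is a minimal sketch of what "the server does
the encoding and folding" (option 3 in the quoted message) could look
like. It leans on the standard library's email.header.Header class; the
helper name encode_header_value and the header name "X-Title" are
assumptions made up for this example, and nothing in PEP 333 mandates
this approach.

```python
from email.header import Header


def encode_header_value(name, value, charset="utf-8"):
    """Return `value` as RFC 2047 encoded-words, folded with CRLF + SPACE.

    `name` is passed only so the first line leaves room for "Name: ".
    """
    header = Header(value, charset=charset, header_name=name, maxlinelen=76)
    # Continuation lines start with a single space, which matches the
    # HTTP folding rule quoted above (SP or HT at the start of each
    # extra line).
    return header.encode(linesep="\r\n")


# Hypothetical usage: a long non-ASCII value ends up on several lines.
folded = encode_header_value("X-Title", "\u00e9" * 80)
```

A gateway taking this route would accept the unencoded value from the
application and apply the folding itself just before writing the
response.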

Cheers,


On 2006/09/09, at 5:18 PM, Mark Nottingham wrote:

> My reading o WSGI was that multi-line headers should already be
> folding multi-line headers. If that's the case, what's the problem?
>
> Cheers,
>
>
> On 2006/09/08, at 2:02 PM, Robert Brewer wrote:
>
>> PEP 333 says:
>>
>> "Each header_value must not include any control characters,
>> including carriage returns or linefeeds, either embedded or at the
>> end. (These requirements are to minimize the complexity of any
>> parsing that must be performed by servers, gateways, and
>> intermediate response processors that need to inspect or modify
>> response headers.)" [1]
>>
>> That's understandable, but HTTP headers are defined as (mostly)
>> *TEXT, and "words of *TEXT MAY contain characters from character
>> sets other than ISO-8859-1 only when encoded according to the rules
>> of RFC 2047." [2] And RFC 2047 specifies that "an 'encoded-word'
>> may not be more than 75 characters long...If it is desirable to
>> encode more text than will fit in an 'encoded-word' of 75
>> characters, multiple 'encoded-word's (separated by CRLF SPACE) may
>> be used." [3] This satisfies HTTP header folding rules, as well:
>> "Header fields can be extended over multiple lines by preceding
>> each extra line with at least one SP or HT." [1, again]
>>
>> So in my reading of HTTP, some code somewhere should introduce
>> newlines in longish, encoded response header values. I see three
>> options:
>>
>>  1. Keep things as they are and disallow response header values if
>> they contain words over 75 chars that are outside the ISO-8859-1
>> character set
>>  2. Allow newline characters in WSGI response headers
>>  3. Require/strongly suggest WSGI servers to do the encoding and
>> folding before sending the value over HTTP.
>>
>> Any other solutions? I'd like to see 2 or 3 adopted (unless
>> something better comes along), so CherryPy can continue to support
>> as much of the HTTP spec as possible.
>>
>>
>> Robert Brewer
>> System Architect
>> Amor Ministries
>> fumanchu at amor.org
>>
>> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
>> [2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
>> [3] http://www.rfc.net/rfc2047.html#s2
>>
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>


--
Mark Nottingham     http://www.mnot.net/


From davidf at sjsoft.com  Tue Sep 12 11:12:51 2006
From: davidf at sjsoft.com (David Fraser)
Date: Tue, 12 Sep 2006 11:12:51 +0200
Subject: [Web-SIG] Middleware stack construction
Message-ID: <45067A13.8080306@sjsoft.com>

Hi

We've been trying to manage how we handle the middleware stack in our
web framework. The PEP doesn't specify any standard way of doing this
and the example is constructed with the next item in the stack as a
parameter.

Our approach is to pass a WSGIStack variable in the environment
variables and get each layer of middleware to pop off the next layer and
call it, thus:

def run_child(self, environ, startresponse):
    child = environ["j5_WSGIStack"].pop()
    return child(environ, startresponse)

so that the middleware can transform that in whichever way it wants...

Does this fit in well with how other people are doing things? Just curious

David

From pje at telecommunity.com  Tue Sep 12 17:19:05 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 12 Sep 2006 11:19:05 -0400
Subject: [Web-SIG] Middleware stack construction
In-Reply-To: <45067A13.8080306@sjsoft.com>
Message-ID: <5.1.1.6.0.20060912111415.026e98a8@sparrow.telecommunity.com>

At 11:12 AM 9/12/2006 +0200, David Fraser wrote:
>Hi
>
>We've been trying to manage how we handle the middleware stack in our
>web framework. The PEP doesn't specify any standard way of doing this
>and the example is constructed with the next item in the stack as a
>parameter.
>
>Our approach is to pass a WSGIStack variable in the environment
>variables and get each layer of middleware to pop off the next layer and
>call it, thus:
>
>def run_child(self, environ, startresponse):
>     child = environ["j5_WSGIStack"].pop()
>     return child(environ, startresponse)
>
>so that the middleware can transform that in whichever way it wants...
>
>Does this fit in well with how other people are doing things? Just curious

That's an interesting concept.  I don't think anybody else has come up with 
it though.

I'll certainly steal it if I ever get around to creating a competitor to 
Paste Deploy.  :)  But I imagine I'd use a linked list of tuples instead of 
popping from a list, e.g.:

      child, environ["my.middleware.stack"] = environ["my.middleware.stack"]

This would allow the same chain to be used for every call, without copying, 
and it's probably faster than pop() as well, at least under CPython.
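
A minimal sketch of the linked-list variant, for comparison with the
pop() version quoted above. The key name comes from the one-liner; the
helpers build_chain, run_child and handle, and the two example
callables, are hypothetical names introduced only for this
illustration.

```python
# Key name taken from the one-line example; everything else is assumed.
STACK_KEY = "my.middleware.stack"


def build_chain(*apps):
    """Build a linked list of tuples: (app1, (app2, (app3, None)))."""
    chain = None
    for app in reversed(apps):
        chain = (app, chain)
    return chain


def run_child(environ, start_response):
    """Unpack the next callable and leave the rest of the chain in environ."""
    child, environ[STACK_KEY] = environ[STACK_KEY]
    return child(environ, start_response)


def shout(environ, start_response):
    """Example middleware: upper-case whatever the child returns."""
    return [chunk.upper() for chunk in run_child(environ, start_response)]


def hello(environ, start_response):
    """Innermost application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Built once and shared by every request; tuples are immutable, so no
# per-request copy is needed.
CHAIN = build_chain(shout, hello)


def handle(environ, start_response):
    """Entry point a server would call for each request."""
    environ[STACK_KEY] = CHAIN
    return run_child(environ, start_response)
```

Because each layer only rebinds the environ key to the tail of the
tuple, the shared chain itself is never mutated, which is what makes
reuse across calls safe.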


From renesd at gmail.com  Fri Sep 15 10:29:33 2006
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Fri, 15 Sep 2006 18:29:33 +1000
Subject: [Web-SIG] Python pickle and web security.
Message-ID: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>

Hello,

I posted this on my blog the other day about people using pickle for
sessions, but got no response.  Do you guys think using pickles for
sessions is an ok thing to do?


...........

Some Python web frameworks are using pickle to store session data.
Pickle is a well-known poor choice for secure systems. However, this
seems to be more widely known by those writing network applications
than by those making web frameworks.

Is your web framework using pickle for sessions despite the warnings
in the python documentation about it being insecure?

By using sessions with pickle, people who can write to the database
server's session table can execute code on the app server. So can
people who can get data into the session file or memcache data store.

This might be an issue if the database server is run by separate
people than the app server. Or if the session table is compromised by
an sql injection attack elsewhere.

There are some more secure ways of storing pickled data.

Pickle is deemed to be untrustworthy for data, in that it is not
certain that code cannot be snuck into the data and then executed
by pickle. So if some data from user input is put into the pickle,
then it is possible that code could be run.

There are some people who know more about how to exploit pickle,
however the warning in the python documentation is this:

""Warning:
The pickle module is not intended to be secure against erroneous or
maliciously constructed data. Never unpickle data received from an
untrusted or unauthenticated source."""


Cerealizer might be an alternative option...
http://home.gna.org/oomadness/en/cerealizer/index.html

Or maybe these other two.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/415503
http://barnesc.blogspot.com/2006/01/rencode-reduced-length-encodings.html

From jim at zope.com  Fri Sep 15 12:29:41 2006
From: jim at zope.com (Jim Fulton)
Date: Fri, 15 Sep 2006 06:29:41 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
Message-ID: <6CC72EAE-39FE-4685-B0DB-1B0C8C0D0E49@zope.com>


On Sep 15, 2006, at 4:29 AM, René Dudfield wrote:

> Hello,
>
> I posted this on my blog the other day about people using pickle for
> sessions, but got no response.  Do you guys think using pickles for
> sessions is an ok thing to do?

You don't want to accept pickles from an untrusted source, which  
typically means you don't want to accept pickles over the network.   
Even then, there are ways to use pickles securely. For example, you  
can, if you know what you're doing, arrange to prevent pickle from  
calling global objects or control specifically what global objects  
are callable.
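
As a hedged illustration of controlling which globals a pickle may
reference, the sketch below overrides find_class on a pickle.Unpickler
subclass, as the present-day standard library allows. The allow-list
contents and the names RestrictedUnpickler and restricted_loads are
assumptions chosen for the example, not something taken from this
thread.

```python
import io
import pickle

# Hypothetical allow-list: only these (module, name) pairs may be loaded.
ALLOWED_GLOBALS = {("datetime", "datetime"), ("datetime", "date")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global object.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers, strings and numbers never go through find_class, so
they load as usual; anything that needs an unlisted global raises
UnpicklingError instead of being imported and called.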

There is nothing wrong with using pickles to store data internally.   
As long as the pickles are generated by the application, there is no  
risk to the application reading them again, assuming that they are  
stored where they can't be tampered with.

Saying pickle is inherently insecure is like saying Python is  
inherently insecure.  You don't want to execute Python from an  
untrusted source.  If someone can tamper with your Python code, then  
you have a serious security problem as well.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org


From python at venix.com  Fri Sep 15 15:40:31 2006
From: python at venix.com (Python)
Date: Fri, 15 Sep 2006 09:40:31 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
Message-ID: <1158327631.9975.116.camel@www.venix.com>

On Fri, 2006-09-15 at 18:29 +1000, René Dudfield wrote:
> Hello,
> 
> I posted this on my blog the other day about people using pickle for
> sessions, but got no response.  Do you guys think using pickles for
> sessions is an ok thing to do?

Either encrypt the pickle or have a seeded (md5) signature so that you
can verify that the pickle has not been tampered with.  I use pickles
routinely, but with an md5 signature that combines a seed and the
pickle.

Someone cannot generate a valid signature without also knowing the seed.
I am paranoid enough so that I only pickle dictionaries and then only
extract and verify my list of expected keys after unpickling.  I can't
prove that's secure, but I am not losing sleep over it.  

Presumably someone who knew the seed could generate a valid signature
*and* inject code into the pickle that got executed by the unpickle
operation.
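
A minimal sketch of the sign-then-verify idea. Note that it swaps in
HMAC-SHA256 from the standard library for the seeded md5 signature
described above; the SECRET value and the function names are
hypothetical, and the secret plays the role of the seed.

```python
import hashlib
import hmac
import pickle

# Hypothetical secret; in practice it would come from configuration.
SECRET = b"change-me"
DIGEST_SIZE = hashlib.sha256().digest_size


def sign_and_pickle(obj, secret=SECRET):
    """Pickle obj and prepend a keyed signature of the pickle bytes."""
    payload = pickle.dumps(obj)
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return signature + payload


def verify_and_unpickle(blob, secret=SECRET):
    """Check the signature first; unpickle only if it matches."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

The important ordering is that pickle.loads() is never reached for a
blob whose signature fails. The caveat in the paragraph above still
applies: anyone who learns the secret can sign a malicious pickle.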

> 
> 
> 
> 
> ...........
> 
> Some python web frame works are using pickle to store session data.
> Pickle is a well known poor choice for secure systems. However it
> seems to be more widely known by those writing network applications,
> than those making web frameworks.
> 
> Is your web framework using pickle for sessions despite the warnings
> in the python documentation about it being insecure?
> 
> By using sessions with pickle people who can write to the database
> servers session table can execute code on the app server. Or people
> who can get data into the session file/memcache data store can execute
> data.
> 
> This might be an issue if the database server is run by separate
> people than the app server. Or if the session table is compromised by
> an sql injection attack elsewhere.
> 
> There are some more secure ways of storing pickled data.
> 
> Pickle is deemed to be untrustworthy for data. In that it is not
> certain that code can not be snuck into the data that will be executed
> by pickle. So if some data from user input is put into the pickle,
> then it is possible that code could be run.
> 
> There are some people who know more about how to exploit pickle,
> however the warning in the python documentation is this:
> 
> ""Warning:
> The pickle module is not intended to be secure against erroneous or
> maliciously constructed data. Never unpickle data received from an
> untrusted or unauthenticated source."""
> 
> 
> Cerealizer might be an alternative option...
> http://home.gna.org/oomadness/en/cerealizer/index.html
> 
> Or maybe these other two.
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/415503
> http://barnesc.blogspot.com/2006/01/rencode-reduced-length-encodings.html
-- 
Lloyd Kvam
Venix Corp


From renesd at gmail.com  Sat Sep 16 04:07:01 2006
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Sat, 16 Sep 2006 12:07:01 +1000
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <6CC72EAE-39FE-4685-B0DB-1B0C8C0D0E49@zope.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<6CC72EAE-39FE-4685-B0DB-1B0C8C0D0E49@zope.com>
Message-ID: <64ddb72c0609151907g20e1afb9xeed6660fc1902b9d@mail.gmail.com>

Hi,

I think my main point was about using pickle for sessions, not just
using pickle by itself.

Unlike loading other data, code gets run when you load a pickle.  It
is indeed like running python code.  So if you do not trust where you
store your pickles to run python code, then that is a problem.

If the unpickle or pickle code is not bug-free, then you cannot be
sure that nobody can craft data which tricks the escaping code in
the unpickler.

With the history of bugs with the unpickle code, I don't think relying
on it is a good idea.

For a list of pickle bugs you can search the python bug tracker.
There are over 70 bugs listed including the open, closed, and deleted
bugs.  With 13 open bugs listed.

One of the bugs was closed because: 'Closing due to lack of response.
cPickle is such a complex module, without a test case the leak cannot
be found.'

I think that line best sums up how much you should trust the C
pickle module, which is 5753 lines long and has not been audited.

Will pickle *always* escape data you pass it correctly when it encodes
it into a pickle?  Will unpickle *always* unescape parts of the pickle
correctly?  If not then those pickles can run code.

The risk of using pickle does not seem to be worth the convenience
that it gives.  With alternatives to pickle which do not execute code
being available why not use them?

By using pickle for session data you allow people the opportunity to
put data into the pickle.  For example say you store a given GET
variable in the session.

Combining that you allow people with pickle-sessions to put data into
the pickle, and the risk that pickle might not encode/decode it
correctly is the problem I see.

However if allowing untrusted data to be placed into a pickle is ok,
then this is not a problem.  That only leaves the problem of allowing
the data store of your sessions to be able to execute code where you
load sessions.

This means you allow execution of code from your data store to your
session loading code.  Which means if you use a separate database
machine(quite common), or if you use a separate memcache server(not
unheard of) you allow these machines to execute code on the session
using machine.

There's a reason why people use separate user accounts, and separate
machines for doing different tasks.  That reason is to limit what each
user or machine can do.  By using pickles for sessions those benefits
are removed in some cases.

Cheers,

On 9/15/06, Jim Fulton <jim at zope.com> wrote:
>
> On Sep 15, 2006, at 4:29 AM, René Dudfield wrote:
>
> > Hello,
> >
> > I posted this on my blog the other day about people using pickle for
> > sessions, but got no response.  Do you guys think using pickles for
> > sessions is an ok thing to do?
>
> You don't want to accept pickles from an untrusted source, which
> typically means you don't want to accept pickles over the network.
> Even then, there are ways to use pickles securely. For example, you
> can, if you know what you're doing, arrange to prevent pickle from
> calling global objects or control specifically what global objects
> are callable.
>
> There is nothing wrong with using pickles to store data internally.
> As long as the pickles are generated by the application, there is no
> risk to the application reading them again, assuming that they are
> stored where they can't be tampered with.
>
> Saying pickle is inherently insecure is like saying Python is
> inherently insecure.  You don't want to execute Python from an
> untrusted source.  If someone can tamper with your Python code, then
> you have a serious security problem as well.
>
> Jim
>

From renesd at gmail.com  Sat Sep 16 04:23:22 2006
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Sat, 16 Sep 2006 12:23:22 +1000
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <1158327631.9975.116.camel@www.venix.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<1158327631.9975.116.camel@www.venix.com>
Message-ID: <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>

That seems like a good way to stop the untrusted session store from
being able to inject sessions in there.  That could at least solve the
problem of using pickles from untrusted session stores.

Are you just using the basic python types?  eg dict, string, list,
numbers etc?  If so, perhaps using another serialiser will remove some
more risk if you cared.


On 9/15/06, Python <python at venix.com> wrote:
> On Fri, 2006-09-15 at 18:29 +1000, René Dudfield wrote:
> > Hello,
> >
> > I posted this on my blog the other day about people using pickle for
> > sessions, but got no response.  Do you guys think using pickles for
> > sessions is an ok thing to do?
>
> Either encrypt the pickle or have a seeded (md5) signature so that you
> can verify that the pickle has not been tampered.  I use pickles
> routinely, but with an md5 signature that combines a seed and the
> pickle.
>
> Someone cannot generate a valid signature without also knowing the seed.
> I am paranoid enough so that I only pickle dictionaries and then only
> extract and verify my list of expected keys after unpickling.  I can't
> prove that's secure, but I am not losing sleep over it.
>
> Presumably someone who knew the seed could generate a valid signature
> *and* inject code into the pickle that got executed by the unpickle
> operation.
>
> >
> >
> >
> >
> > ...........
> >
> > Some python web frame works are using pickle to store session data.
> > Pickle is a well known poor choice for secure systems. However it
> > seems to be more widely known by those writing network applications,
> > than those making web frameworks.
> >
> > Is your web framework using pickle for sessions despite the warnings
> > in the python documentation about it being insecure?
> >
> > By using sessions with pickle people who can write to the database
> > servers session table can execute code on the app server. Or people
> > who can get data into the session file/memcache data store can execute
> > data.
> >
> > This might be an issue if the database server is run by separate
> > people than the app server. Or if the session table is compromised by
> > an sql injection attack elsewhere.
> >
> > There are some more secure ways of storing pickled data.
> >
> > Pickle is deemed to be untrustworthy for data. In that it is not
> > certain that code can not be snuck into the data that will be executed
> > by pickle. So if some data from user input is put into the pickle,
> > then it is possible that code could be run.
> >
> > There are some people who know more about how to exploit pickle,
> > however the warning in the python documentation is this:
> >
> > ""Warning:
> > The pickle module is not intended to be secure against erroneous or
> > maliciously constructed data. Never unpickle data received from an
> > untrusted or unauthenticated source."""
> >
> >
> > Cerealizer might be an alternative option...
> > http://home.gna.org/oomadness/en/cerealizer/index.html
> >
> > Or maybe these other two.
> > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/415503
> > http://barnesc.blogspot.com/2006/01/rencode-reduced-length-encodings.html
> --
> Lloyd Kvam
> Venix Corp
>
>

From python at venix.com  Sat Sep 16 13:44:24 2006
From: python at venix.com (Python)
Date: Sat, 16 Sep 2006 07:44:24 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<1158327631.9975.116.camel@www.venix.com>
	<64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
Message-ID: <1158407064.9975.199.camel@www.venix.com>

On Sat, 2006-09-16 at 12:23 +1000, René Dudfield wrote:
> That seems like a good way to stop the untrusted session store from
> being able to inject sessions in there.  That could at least solve the
> problem of using pickles from untrusted session stores.
> 
> Are you just using the basic python types?  eg dict, string, list,
> numbers etc?  If so, perhaps using another serialiser will remove some
> more risk if you cared.

Besides the basic types, date/time objects are often included.

My use of md5 signatures was focused primarily on preventing unwanted
data manipulation.  I would agree that outside data should be acquired
in formats that are simpler than pickles.  I am pickling data that has
been checked and accepted.
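
For comparison, a sketch of a serialiser that never executes code on
load and still copes with date/time values. It assumes a present-day
Python with the json module and datetime.fromisoformat; the
"__datetime__" marker key and the function names are conventions
invented for this example.

```python
import datetime
import json


def dump_session(session):
    """Serialise a dict of basic types plus datetime objects to JSON text."""
    def encode_extra(value):
        # json calls this only for values it cannot serialise itself.
        if isinstance(value, datetime.datetime):
            return {"__datetime__": value.isoformat()}
        raise TypeError("cannot serialise %r" % (value,))
    return json.dumps(session, default=encode_extra)


def load_session(text):
    """Inverse of dump_session(); only data is rebuilt, no code is run."""
    def decode_extra(obj):
        # Called for every decoded JSON object (dict).
        if set(obj) == {"__datetime__"}:
            return datetime.datetime.fromisoformat(obj["__datetime__"])
        return obj
    return json.loads(text, object_hook=decode_extra)
```

The trade-off is that every non-basic type needs an explicit
encode/decode rule, whereas pickle handles them automatically.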

> 
> 
> On 9/15/06, Python <python at venix.com> wrote:
> > On Fri, 2006-09-15 at 18:29 +1000, René Dudfield wrote:
> > > Hello,
> > >
> > > I posted this on my blog the other day about people using pickle for
> > > sessions, but got no response.  Do you guys think using pickles for
> > > sessions is an ok thing to do?
> >
> > Either encrypt the pickle or have a seeded (md5) signature so that you
> > can verify that the pickle has not been tampered.  I use pickles
> > routinely, but with an md5 signature that combines a seed and the
> > pickle.
> >
> > Someone cannot generate a valid signature without also knowing the seed.
> > I am paranoid enough so that I only pickle dictionaries and then only
> > extract and verify my list of expected keys after unpickling.  I can't
> > prove that's secure, but I am not losing sleep over it.
> >
> > Presumably someone who knew the seed could generate a valid signature
> > *and* inject code into the pickle that got executed by the unpickle
> > operation.
> >
> > >
> > >
> > >
> > >
> > > ...........
> > >
> > > Some python web frame works are using pickle to store session data.
> > > Pickle is a well known poor choice for secure systems. However it
> > > seems to be more widely known by those writing network applications,
> > > than those making web frameworks.
> > >
> > > Is your web framework using pickle for sessions despite the warnings
> > > in the python documentation about it being insecure?
> > >
> > > By using sessions with pickle people who can write to the database
> > > servers session table can execute code on the app server. Or people
> > > who can get data into the session file/memcache data store can execute
> > > data.
> > >
> > > This might be an issue if the database server is run by separate
> > > people than the app server. Or if the session table is compromised by
> > > an sql injection attack elsewhere.
> > >
> > > There are some more secure ways of storing pickled data.
> > >
> > > Pickle is deemed to be untrustworthy for data. In that it is not
> > > certain that code can not be snuck into the data that will be executed
> > > by pickle. So if some data from user input is put into the pickle,
> > > then it is possible that code could be run.
> > >
> > > There are some people who know more about how to exploit pickle,
> > > however the warning in the python documentation is this:
> > >
> > > ""Warning:
> > > The pickle module is not intended to be secure against erroneous or
> > > maliciously constructed data. Never unpickle data received from an
> > > untrusted or unauthenticated source."""
> > >
> > >
> > > Cerealizer might be an alternative option...
> > > http://home.gna.org/oomadness/en/cerealizer/index.html
> > >
> > > Or maybe these other two.
> > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/415503
> > > http://barnesc.blogspot.com/2006/01/rencode-reduced-length-encodings.html
> > --
> > Lloyd Kvam
> > Venix Corp
> >
> >
-- 
Lloyd Kvam
Venix Corp


From ben at groovie.org  Mon Sep 18 19:27:03 2006
From: ben at groovie.org (Ben Bangert)
Date: Mon, 18 Sep 2006 10:27:03 -0700
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<1158327631.9975.116.camel@www.venix.com>
	<64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
Message-ID: <a75d15f762a2dbe82f5ff07f3e9e6a7f@groovie.org>

On Sep 15, 2006, at 7:23 PM, René Dudfield wrote:

> That seems like a good way to stop the untrusted session store from
> being able to inject sessions in there.  That could at least solve the
> problem of using pickles from untrusted session stores.
>
> Are you just using the basic python types?  eg dict, string, list,
> numbers etc?  If so, perhaps using another serialiser will remove some
> more risk if you cared.

Why do you assume the session store is untrusted? If someone can hack 
into my database, they can typically hack into my web application so 
it's pretty weird to consider the backend session store to be 
"untrusted". I think this is why using pickle for sessions is pretty 
harmless as you're the one writing to them, not the user.

While I can imagine a few situations where an untrusted session store 
might come into play, I'd generally imagine that the vast majority of 
the time one does trust their session storage as much as they trust 
that their application can't have its source code modified.

Cheers,
Ben


From python at venix.com  Mon Sep 18 20:16:02 2006
From: python at venix.com (Python)
Date: Mon, 18 Sep 2006 14:16:02 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <a75d15f762a2dbe82f5ff07f3e9e6a7f@groovie.org>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<1158327631.9975.116.camel@www.venix.com>
	<64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
	<a75d15f762a2dbe82f5ff07f3e9e6a7f@groovie.org>
Message-ID: <1158603362.22684.8.camel@www.venix.com>

On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
> Why do you assume the session store is untrusted? If someone can hack 
> into my database, they can typically hack into my web application so 
> its pretty weird to consider the backend session store to be 
> "untrusted".

You are assuming that the pickle is stored in a secure database.  If the
pickle is in a cookie or some other client side storage, then it is
definitely not to be trusted.

-- 
Lloyd Kvam
Venix Corp


From jim at zope.com  Mon Sep 18 20:24:23 2006
From: jim at zope.com (Jim Fulton)
Date: Mon, 18 Sep 2006 14:24:23 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <1158603362.22684.8.camel@www.venix.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<1158327631.9975.116.camel@www.venix.com>
	<64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
	<a75d15f762a2dbe82f5ff07f3e9e6a7f@groovie.org>
	<1158603362.22684.8.camel@www.venix.com>
Message-ID: <C37C7E23-E905-4C89-ACB9-39E85622C05F@zope.com>


On Sep 18, 2006, at 2:16 PM, Python wrote:

> On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
>> Why do you assume the session store is untrusted? If someone can hack
>> into my database, they can typically hack into my web application so
>> its pretty weird to consider the backend session store to be
>> "untrusted".
>
> You are assuming that the pickle is stored in a secure database.   
> If the
> pickle is in a cookie or some other client side storage, then it is
> definitely not to be trusted.

Right. Storing pickles in cookies is a very bad idea.
Hopefully, no one is doing that.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org


From python at venix.com  Mon Sep 18 20:34:50 2006
From: python at venix.com (Python)
Date: Mon, 18 Sep 2006 14:34:50 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <C37C7E23-E905-4C89-ACB9-39E85622C05F@zope.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<1158327631.9975.116.camel@www.venix.com>
	<64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
	<a75d15f762a2dbe82f5ff07f3e9e6a7f@groovie.org>
	<1158603362.22684.8.camel@www.venix.com>
	<C37C7E23-E905-4C89-ACB9-39E85622C05F@zope.com>
Message-ID: <1158604490.22684.13.camel@www.venix.com>

On Mon, 2006-09-18 at 14:24 -0400, Jim Fulton wrote:
> On Sep 18, 2006, at 2:16 PM, Python wrote:
> 
> > On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
> >> Why do you assume the session store is untrusted? If someone can hack
> >> into my database, they can typically hack into my web application so
> >> its pretty weird to consider the backend session store to be
> >> "untrusted".
> >
> > You are assuming that the pickle is stored in a secure database.   
> > If the
> > pickle is in a cookie or some other client side storage, then it is
> > definitely not to be trusted.
> 
> Right. Storing pickles in cookies is a very bad idea.
> Hopefully, no one is doing that.

As it happens, I am not using cookies to store pickles, but I've
considered it.  What makes it "a very bad idea"?

> 
> Jim
> 
> --
> Jim Fulton			mailto:jim at zope.com		Python Powered!
> CTO 				(540) 361-1714			http://www.python.org
> Zope Corporation	http://www.zope.com		http://www.zope.org
> 
> 
> 
-- 
Lloyd Kvam
Venix Corp


From jim at zope.com  Mon Sep 18 21:07:56 2006
From: jim at zope.com (Jim Fulton)
Date: Mon, 18 Sep 2006 15:07:56 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <1158604490.22684.13.camel@www.venix.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
	<1158327631.9975.116.camel@www.venix.com>
	<64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
	<a75d15f762a2dbe82f5ff07f3e9e6a7f@groovie.org>
	<1158603362.22684.8.camel@www.venix.com>
	<C37C7E23-E905-4C89-ACB9-39E85622C05F@zope.com>
	<1158604490.22684.13.camel@www.venix.com>
Message-ID: <A8F99CB0-A1E8-4145-A72C-FBBCEA0618A5@zope.com>


On Sep 18, 2006, at 2:34 PM, Python wrote:

> On Mon, 2006-09-18 at 14:24 -0400, Jim Fulton wrote:
>> On Sep 18, 2006, at 2:16 PM, Python wrote:
>>
>>> On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
>>>> Why do you assume the session store is untrusted? If someone can hack
>>>> into my database, they can typically hack into my web application so
>>>> it's pretty weird to consider the backend session store to be
>>>> "untrusted".
>>>
>>> You are assuming that the pickle is stored in a secure database.  If the
>>> pickle is in a cookie or some other client side storage, then it is
>>> definitely not to be trusted.
>>
>> Right. Storing pickles in cookies is a very bad idea.
>> Hopefully, no one is doing that.
>
> As it happens, I am not using cookies to store pickles, but I've
> considered it.  What makes it "a very bad idea"?

Because, by default, a pickle can be constructed that will call more
or less any importable object. You never want to load pickles from
an untrusted source and, as you pointed out, cookies are an untrusted
source.
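
A minimal sketch of the point above, written for current Python 3 and
not taken from any message in this thread. The class name Demo and the
helper restricted_loads are made up for illustration, and sorted stands
in for whatever importable callable an attacker would actually pick.
The second half assumes that refusing every global is acceptable, which
is only one of several ways to restrict unpickling.

```python
import io
import pickle


class Demo:
    """Hypothetical object whose pickle asks the loader to call sorted()."""

    def __reduce__(self):
        # On load, pickle imports the callable and calls it with these args.
        # A hostile payload could name any other importable callable here.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Demo())

# Loading does not rebuild a Demo instance; it runs sorted([3, 1, 2]).
result = pickle.loads(payload)


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any global at all."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads(), but rejects payloads that reference globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With the default loader the payload decides what gets called; with the
restricted loader the same payload is rejected, while plain data such as
lists of integers still loads.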

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org


From michael.kerrin at openapp.biz  Fri Sep 29 15:18:26 2006
From: michael.kerrin at openapp.biz (Michael Kerrin)
Date: Fri, 29 Sep 2006 14:18:26 +0100
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
Message-ID: <451D1D22.5090607@openapp.biz>

Hi All,

  The WSGI specification says in the section on "Input and Error Streams":

      The optional "size" argument to readline() is not supported, as it
      may be complex for server authors to implement, and is not often
      used in practice.

  But the current implementation of cgi.FieldStorage in the 2.4.4 branch 
and on Python 2.5 does call readline with the size argument. It has 
started to do this in response to the Python bug #1112549 - 
cgi.FieldStorage memory usage can spike in line-oriented ops. See 
http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470
 
  Since it is reasonable for a WSGI application to use cgi.FieldStorage
I am wondering whether cgi.FieldStorage or the WSGI specification needs
to be changed in order to solve this incompatibility.

  Originally I thought it was cgi.FieldStorage that needed to be changed,
and hence tried to fix it by wrapping the input stream so that the
readline method always uses the read method on the input stream. While
this seems to work for me, it introduces a level of complexity in the
cgi.py file, and possibly some other bugs, that makes me think that
adding the size argument for readline to the WSGI specification isn't
such a bad idea after all.
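
The wrapping approach described above could look roughly like the
following. This is a sketch for current Python 3 (byte streams), not
the attached cgi.patch. The class name SizedReadlineInput and its
parameters are hypothetical, and it assumes the caller knows the
request's content length so the wrapper never reads past the body.

```python
class SizedReadlineInput:
    """Hypothetical wrapper giving readline(size) using only stream.read(n)."""

    def __init__(self, stream, content_length, chunk_size=8192):
        self._stream = stream
        self._remaining = content_length  # bytes not yet read from stream
        self._chunk_size = chunk_size
        self._buffer = b""

    def _fill(self):
        """Read one more chunk into the buffer; return False at end of body."""
        if self._remaining <= 0:
            return False
        chunk = self._stream.read(min(self._chunk_size, self._remaining))
        if not chunk:
            self._remaining = 0
            return False
        self._remaining -= len(chunk)
        self._buffer += chunk
        return True

    def readline(self, size=-1):
        limit = size if size is not None and size >= 0 else None
        while True:
            newline = self._buffer.find(b"\n")
            if newline != -1 and (limit is None or newline < limit):
                end = newline + 1      # whole line fits within the limit
                break
            if limit is not None and len(self._buffer) >= limit:
                end = limit            # size cap reached before a newline
                break
            if not self._fill():
                end = len(self._buffer)  # end of body: return what is left
                break
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line

    def read(self, size=-1):
        if size is None or size < 0:
            while self._fill():
                pass
            size = len(self._buffer)
        while len(self._buffer) < size and self._fill():
            pass
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data
```

An application would wrap environ['wsgi.input'] in this before handing
it to cgi.FieldStorage. Because the wrapper buffers ahead, nothing else
may read from the underlying stream afterwards, which is part of the
extra complexity mentioned above.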

  There may be other ways of modifying cgi.FieldStorage to solve this
but I can't see how at the moment.

  For those who are interested, I have attached the patch, but my main
question is where this incompatibility should be solved.

  Thanks
  Michael

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cgi.patch
Type: text/x-patch
Size: 5996 bytes
Desc: not available
Url : http://mail.python.org/pipermail/web-sig/attachments/20060929/4d03d5cb/attachment.bin 

From guido at python.org  Fri Sep 29 21:31:55 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 29 Sep 2006 12:31:55 -0700
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
In-Reply-To: <451D1D22.5090607@openapp.biz>
References: <451D1D22.5090607@openapp.biz>
Message-ID: <ca471dc20609291231o60293553w6622187903ba784e@mail.gmail.com>

On 9/29/06, Michael Kerrin <michael.kerrin at openapp.biz> wrote:
>   But the current implementation of cgi.FieldStorage in the 2.4.4 branch
> and on Python 2.5 does call readline with the size argument. It has
> started to do this in response to the Python bug #1112549 -
> cgi.FieldStorage memory usage can spike in line-oriented ops. See
> http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470
>
>   Since it is reasonable for a WSGI application to use cgi.FieldStorage
> I am wondering whether cgi.FieldStorage or the WSGI specification needs
> to be changed in order to solve this incompatibility.
>
>   Originally I thought it was cgi.FieldStorage that needed to be changed,
> and hence tried to fix it by wrapping the input stream so that the
> readline method always uses the read method on the input stream. While
> this seems to work for me, it introduces a level of complexity in the
> cgi.py file, and possibly some other bugs, that makes me think that
> adding the size argument for readline to the WSGI specification isn't
> such a bad idea after all.

Since that change to cgi.py was a security fix, I would strongly
recommend keeping it and changing the WSGI spec instead.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)