From fumanchu at amor.org Fri Sep 8 09:33:24 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 00:33:24 -0700
Subject: [Web-SIG] WSGI type tolerance
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>

See http://www.cherrypy.org/ticket/561.
Apparently, "several WSGI apps" return an int for the Content-Length
header. PEP 333 leaves just enough unsaid that someone might determine
that's OK. But wsgiref and paste.lint certainly seem to think header
values must be strings. Can we tighten up the language in the PEP on
this point (or just agree on an interpretation here, so it's
documented)?


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From pje at telecommunity.com Fri Sep 8 17:56:38 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 11:56:38 -0400
Subject: [Web-SIG] WSGI type tolerance
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908114854.0268fa48@sparrow.telecommunity.com>

At 12:33 AM 9/8/2006 -0700, Robert Brewer wrote:
>See
>http://www.cherrypy.org/ticket/561.
>Apparently, "several WSGI apps" return an int for the Content-Length
>header. PEP 333 leaves just enough unsaid that someone might determine
>that's OK. But wsgiref and paste.lint certainly seem to think header
>values must be strings. Can we tighten up the language in the PEP on this
>point (or just agree on an interpretation here, so it's documented)?

Those apps or servers are definitely broken; CGI environment variables
are always strings.
I would urge you to have CherryPy reject servers or middleware
providing integer content-length as broken, preferably with an error
that indicates the application is not WSGI-compliant.

I'll change the phrasing of this:

"""The environ dictionary is required to contain these CGI environment
variables, as defined by the Common Gateway Interface specification."""

to:

"""The environ dictionary is required to contain these CGI environment
strings, as defined by the Common Gateway Interface specification."""

So that there is absolutely no room for ambiguity, although the CGI
specification quite clearly calls for strings. In the meantime, feel
free to cite this message and point folks in the direction of
wsgiref.validate (which is actually paste.lint under a different name).

It's also recommended that middleware authors test their middleware by
using wsgiref.validate on *both* the server and application sides, not
just one or the other.

From pje at telecommunity.com Fri Sep 8 18:06:30 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 12:06:30 -0400
Subject: [Web-SIG] WSGI type tolerance
In-Reply-To: <5.1.1.6.0.20060908114854.0268fa48@sparrow.telecommunity.com>
References: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908120116.02cfee88@sparrow.telecommunity.com>

At 11:56 AM 9/8/2006 -0400, Phillip J. Eby wrote:
>I'll change the phrasing of this:
>
>"""The environ dictionary is required to contain these CGI environment
>variables, as defined by the Common Gateway Interface specification."""
>
>to:
>
>"""The environ dictionary is required to contain these CGI environment
>strings, as defined by the Common Gateway Interface specification."""

Oops. I just noticed that the ticket was about response headers, not CGI
variables. I guess this is the part that needs to change:

"""The response_headers argument is a list of (header_name, header_value)
tuples. It must be a Python list; i.e.
type(response_headers) is ListType, and the server may change its
contents in any way it desires. Each header_name must be a valid HTTP
header field-name (as defined by RFC 2616, Section 4.2), without a
trailing colon or other punctuation."""

I'll add:

"""Each ``header_name`` and ``header_value`` **must** be of StringType."""

Both changes are good, of course.

I think I should also add some language regarding wsgiref in the stdlib,
the importance of using wsgiref.validate, and a recommendation that
servers not be any more "liberal in what they accept" than what the spec
allows.

From fumanchu at amor.org Fri Sep 8 20:08:26 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 11:08:26 -0700
Subject: [Web-SIG] WSGI and readline(size) support (was: WSGI type tolerance)
References: <435DF58A933BA74397B42CDEB8145A86224C4D@ex9.hostedexchange.local>
 <5.1.1.6.0.20060908120116.02cfee88@sparrow.telecommunity.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C50@ex9.hostedexchange.local>

Phillip J. Eby wrote:
> Oops. I just noticed that the ticket was about
> response headers, not CGI variables.
> ...
> I'll add:
> """Each ``header_name`` and ``header_value`` **must** be of StringType."""

Great! Thanks very much. I'll have CherryPy deny non-strings.

> I think I should also add some language regarding
> wsgiref in the stdlib, the importance of using
> wsgiref.validate, and a recommendation that servers
> not be any more "liberal in what they accept" than
> what the spec allows.

Thanks for the tip. I've just run CherryPy's test suite through the
validator, and discovered a conflict that could bite a lot of people
soon. PEP 333 does not support the size argument to
wsgi.input.readline(), stating that, "the optional "size" argument to
readline() is not supported, as it may be complex for server authors to
implement, and is not often used in practice." However, Python 2.5rc1
has fixed a DoS bug in cgi.FieldStorage by using readline(1<<16).
See http://sourceforge.net/tracker/?func=detail&aid=1112549&group_id=5470&atid=105470.
CherryPy has had this patched for a year or so:
http://www.cherrypy.org/ticket/127

Now that wsgiref is in the stdlib, we should really fix either it or the
cgi module so that there's no conflict. It may be "complex for server
authors to implement" readline(size) support, but it's even more complex
for application authors to re-implement FieldStorage. ;)

For what it's worth, there was zero work to have CherryPy support
readline(size); it's automatically provided by socket.makefile.


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From ianb at colorstudy.com Fri Sep 8 20:27:58 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 08 Sep 2006 13:27:58 -0500
Subject: [Web-SIG] WSGI: read method
Message-ID: <4501B62E.4050308@colorstudy.com>

An issue I just realized (as Robert was bringing up these things), is
that I would like to be able to give input streams that don't have a
known length. In particular, I want to be able to do this:

    input = environ['wsgi.input']
    if hasattr(input, 'json_request'):
        request = input.json_request
    else:
        if 'CONTENT_LENGTH' in environ:
            raw_request = input.read(int(environ['CONTENT_LENGTH']))
        else:
            raw_request = input.read()
        request = simplejson.loads(raw_request)

The idea is that the request body won't be serialized unless necessary,
so internal requests (JSON, XMLRPC, etc) can avoid any serialization,
while the WSGI app can deal with both these cases and normal
string-based requests. But I can't set CONTENT_LENGTH during these
internal requests, because I'd need to figure out how long the
serialized request body was, and that would require actually serializing
it.
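
For illustration only, here is a minimal sketch of what the producing
side of such an internal request might look like. It assumes Python 3
and the standard json module in place of simplejson; the names JSONInput
and get_request are hypothetical and are not part of PEP 333, wsgiref,
or any library mentioned in this thread.

```python
import io
import json


class JSONInput:
    """Hypothetical wsgi.input stand-in for a request built in Python."""

    def __init__(self, obj):
        self.json_request = obj   # the already-decoded request object
        self._stream = None       # serialized form, built only on demand

    def read(self, size=-1):
        # Serialize lazily, so apps that use .json_request never pay for it.
        if self._stream is None:
            data = json.dumps(self.json_request).encode("utf-8")
            self._stream = io.BytesIO(data)
        return self._stream.read(size)


def get_request(environ):
    """Return the decoded request, preferring the pre-built object."""
    body = environ["wsgi.input"]
    if hasattr(body, "json_request"):
        return body.json_request
    if "CONTENT_LENGTH" in environ:
        raw = body.read(int(environ["CONTENT_LENGTH"]))
    else:
        raw = body.read()
    return json.loads(raw)
```

An application that knows nothing about .json_request can still call
read() and receive the serialized bytes, at the cost of one
serialization. CONTENT_LENGTH stays unset in this sketch, which is
exactly the gap described above.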

In the end it doesn't matter a whole lot, because almost no
intermediaries ever look at wsgi.input, though if WSGI apps expecting a
JSON request but not aware of .json_request get one of these requests,
it is likely they will fail.

Hmm... I could also set CONTENT_LENGTH='1', and make .read(1) return the
actual entire body, totally ignoring the size argument. Or make it
'99999', or whatever. That seems bad-clever, but maybe most workable
with PEP 333?

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From fumanchu at amor.org Fri Sep 8 20:46:52 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 11:46:52 -0700
Subject: [Web-SIG] WSGI: read method
References: <4501B62E.4050308@colorstudy.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C52@ex9.hostedexchange.local>

Ian Bicking wrote:
> An issue I just realized (as Robert was bringing up these things),
> is that I would like to be able to give input streams that don't
> have a known length. In particular, I want to be able to do this:
>
>     input = environ['wsgi.input']
>     if hasattr(input, 'json_request'):
>         request = input.json_request
>     else:
>         if 'CONTENT_LENGTH' in environ:
>             raw_request = input.read(int(environ['CONTENT_LENGTH']))
>         else:
>             raw_request = input.read()
>         request = simplejson.loads(raw_request)
>
> The idea is that the request body won't be serialized unless necessary,
> so internal requests (JSON, XMLRPC, etc) can avoid any serialization,
> while the WSGI app can deal with both these cases and normal
> string-based requests. But I can't set CONTENT_LENGTH during these
> internal requests, because I'd need to figure out how long the
> serialized request body was, and that would require actually serializing it.
>
> In the end it doesn't matter a whole lot, because almost no
> intermediaries ever look at wsgi.input, though if WSGI apps expecting a
> JSON request but not aware of .json_request get one of these requests,
> it is likely they will fail.
>
> Hmm...
> I could also set CONTENT_LENGTH='1', and make .read(1) return the
> actual entire body, totally ignoring the size argument. Or make it
> '99999', or whatever. That seems bad-clever, but maybe most workable
> with PEP 333?

I'm getting lost on the phrase "internal request"--what do you mean by
that?


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From ianb at colorstudy.com Fri Sep 8 20:55:48 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 08 Sep 2006 13:55:48 -0500
Subject: [Web-SIG] WSGI: read method
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C52@ex9.hostedexchange.local>
References: <4501B62E.4050308@colorstudy.com>
 <435DF58A933BA74397B42CDEB8145A86224C52@ex9.hostedexchange.local>
Message-ID: <4501BCB4.4040609@colorstudy.com>

Robert Brewer wrote:
> > Hmm... I could also set CONTENT_LENGTH='1', and make .read(1) return the
> > actual entire body, totally ignoring the size argument. Or make it
> > '99999', or whatever. That seems bad-clever, but maybe most workable
> > with PEP 333?
>
> I'm getting lost on the phrase "internal request"--what do you mean by that?

The WSGI request originates from Python, not from an external request.
So, the WSGI environment is created from Python and sent to the
application without any HTTP server.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From fumanchu at amor.org Fri Sep 8 23:02:28 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 14:02:28 -0700
Subject: [Web-SIG] WSGI and long response header values
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>

PEP 333 says:

"Each header_value must not include any control characters, including
carriage returns or linefeeds, either embedded or at the end.
(These requirements are to minimize the complexity of any parsing that
must be performed by servers, gateways, and intermediate response
processors that need to inspect or modify response headers.)" [1]

That's understandable, but HTTP headers are defined as (mostly) *TEXT,
and "words of *TEXT MAY contain characters from character sets other
than ISO-8859-1 only when encoded according to the rules of RFC 2047."
[2] And RFC 2047 specifies that "an 'encoded-word' may not be more than
75 characters long...If it is desirable to encode more text than will
fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's
(separated by CRLF SPACE) may be used." [3] This satisfies HTTP header
folding rules, as well: "Header fields can be extended over multiple
lines by preceding each extra line with at least one SP or HT."
[1, again]

So in my reading of HTTP, some code somewhere should introduce newlines
in longish, encoded response header values. I see three options:

   1. Keep things as they are and disallow response header values if
      they contain words over 75 chars that are outside the ISO-8859-1
      character set
   2. Allow newline characters in WSGI response headers
   3. Require/strongly suggest WSGI servers to do the encoding and
      folding before sending the value over HTTP.

Any other solutions? I'd like to see 2 or 3 adopted (unless something
better comes along), so CherryPy can continue to support as much of the
HTTP spec as possible.


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
[3] http://www.rfc.net/rfc2047.html#s2

From pje at telecommunity.com Fri Sep 8 23:55:02 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 17:55:02 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>

At 02:02 PM 9/8/2006 -0700, Robert Brewer wrote:
>PEP 333 says:
>
>"Each header_value must not include any control characters, including
>carriage returns or linefeeds, either embedded or at the end. (These
>requirements are to minimize the complexity of any parsing that must be
>performed by servers, gateways, and intermediate response processors that
>need to inspect or modify response headers.)" [1]
>
>That's understandable, but HTTP headers are defined as (mostly) *TEXT, and
>"words of *TEXT MAY contain characters from character sets other than
>ISO-8859-1 only when encoded according to the rules of RFC 2047." [2] And
>RFC 2047 specifies that "an 'encoded-word' may not be more than 75
>characters long...If it is desirable to encode more text than will fit in
>an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by
>CRLF SPACE) may be used." [3] This satisfies HTTP header folding rules, as
>well: "Header fields can be extended over multiple lines by preceding each
>extra line with at least one SP or HT." [1, again]
>
>So in my reading of HTTP, some code somewhere should introduce newlines in
>longish, encoded response header values. I see three options:
>
>    1. Keep things as they are and disallow response header values if they
>       contain words over 75 chars that are outside the ISO-8859-1 character set
>    2. Allow newline characters in WSGI response headers
>    3. Require/strongly suggest WSGI servers to do the encoding and folding
>       before sending the value over HTTP.
>
>Any other solutions? I'd like to see 2 or 3 adopted (unless something
>better comes along), so CherryPy can continue to support as much of the
>HTTP spec as possible.
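
For illustration only, one way an application could stay inside the
quoted "no control characters" rule is to produce the RFC 2047
encoded-words itself and join them with plain spaces instead of CRLF
SPACE. The sketch below assumes Python 3 and the standard email.header
module; the helper names encode_header_value and decode_header_value are
hypothetical, and whether a plain space is an acceptable separator is
the open question in this thread, not something the sketch settles.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(text, charset="utf-8"):
    """Return text as RFC 2047 encoded-words joined by single spaces."""
    # With maxlinelen=75, Header.encode() starts a new line whenever the
    # next encoded-word would not fit; linesep="\n" makes the line breaks
    # easy to split on.
    folded = Header(text, charset=charset, maxlinelen=75).encode(linesep="\n")
    # Drop the line breaks so the value contains no control characters.
    return " ".join(line.strip() for line in folded.split("\n"))


def decode_header_value(value):
    """Inverse of encode_header_value, for checking the round trip."""
    return str(make_header(decode_header(value)))
```

A server that wants to fold could split the returned value on spaces and
emit each encoded-word on its own continuation line; a server that does
not fold can send the value unchanged.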

#3 sounds most attractive, although I must confess I don't see how it
could be made to work, since the strings have to be encoded already,
unless you're saying that applications should encode them in chunks of
up to 75 characters, separated by spaces, and that the servers should
then fold the result. That would certainly seem like the least-intrusive
way to deal with it, as a slight clarification to the spec, rather than
any real *change* to the spec (and hence a new version of it) as #2
would require.

From fumanchu at amor.org Sat Sep 9 01:22:22 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 16:22:22 -0700
Subject: [Web-SIG] WSGI and long response header values
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>

Phillip J. Eby wrote:
> Robert Brewer wrote:
> >So in my reading of HTTP, some code somewhere should introduce newlines in
> >longish, encoded response header values. I see three options:
> >
> >    1. Keep things as they are and disallow response header values if they
> >       contain words over 75 chars that are outside the ISO-8859-1 character set
> >    2. Allow newline characters in WSGI response headers
> >    3. Require/strongly suggest WSGI servers to do the encoding and folding
> >       before sending the value over HTTP.
> >
> >Any other solutions? I'd like to see 2 or 3 adopted (unless something
> >better comes along), so CherryPy can continue to support as much of the
> >HTTP spec as possible.
>
> #3 sounds most attractive, although I must confess I don't see how it
> could be made to work, since the strings have to be encoded already,
> unless you're saying that applications should encode them in chunks
> of up to 75 characters, separated by spaces, and that the servers
> should then fold the result.
> That would certainly seem like the
> least-intrusive way to deal with it, as a slight clarification to
> the spec, rather than any real *change* to the spec (and hence a new
> version of it) as #2 would require.

Bah. I knew I forgot a constraint in there (the strings have to be
encoded by the app). Personally, I think the "separate-by-spaces" cure
is worse than the disease. I also finally found the only other
discussion of this issue [1] and ... I wish we had allowed folding from
the beginning. Given the obscure nature of this need, I would rather
have had all WSGI implementations be 99% WSGI-compliant (by ignoring
folding) than 99% HTTP-compliant (by not allowing folding). We could
have improved the former number without changing the spec, but not the
latter. Meh. Water under the bridge. Maybe in 1.1?


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

[1] http://mail.python.org/pipermail/web-sig/2004-September/000749.html

From ianb at colorstudy.com Sat Sep 9 01:31:55 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 08 Sep 2006 18:31:55 -0500
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
Message-ID: <4501FD6B.5060002@colorstudy.com>

Robert Brewer wrote:
> PEP 333 says:
>
> "Each header_value must not include any control characters, including
> carriage returns or linefeeds, either embedded or at the end.
> (These
> requirements are to minimize the complexity of any parsing that must be
> performed by servers, gateways, and intermediate response processors
> that need to inspect or modify response headers.)" [1]
>
> That's understandable, but HTTP headers are defined as (mostly) *TEXT,
> and "words of *TEXT MAY contain characters from character sets other
> than ISO-8859-1 only when encoded according to the rules of RFC 2047."
> [2] And RFC 2047 specifies that "an 'encoded-word' may not be more than
> 75 characters long...If it is desirable to encode more text than will
> fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's
> (separated by CRLF SPACE) may be used." [3] This satisfies HTTP header
> folding rules, as well: "Header fields can be extended over multiple
> lines by preceding each extra line with at least one SP or HT." [1, again]
>
> So in my reading of HTTP, some code somewhere should introduce newlines
> in longish, encoded response header values.

Realistically, isn't this an artifact of a time when things like
line-length mattered a lot more? That is, does any HTTP client actually
care about or rely on the 75 character limit?

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From pje at telecommunity.com Sat Sep 9 02:37:13 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 08 Sep 2006 20:37:13 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <4501FD6B.5060002@colorstudy.com>
References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
 <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
Message-ID: <5.1.1.6.0.20060908203621.0268dac8@sparrow.telecommunity.com>

At 06:31 PM 9/8/2006 -0500, Ian Bicking wrote:
>Robert Brewer wrote:
> > PEP 333 says:
> >
> > "Each header_value must not include any control characters, including
> > carriage returns or linefeeds, either embedded or at the end.
> > (These
> > requirements are to minimize the complexity of any parsing that must be
> > performed by servers, gateways, and intermediate response processors
> > that need to inspect or modify response headers.)" [1]
> >
> > That's understandable, but HTTP headers are defined as (mostly) *TEXT,
> > and "words of *TEXT MAY contain characters from character sets other
> > than ISO-8859-1 only when encoded according to the rules of RFC 2047."
> > [2] And RFC 2047 specifies that "an 'encoded-word' may not be more than
> > 75 characters long...If it is desirable to encode more text than will
> > fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's
> > (separated by CRLF SPACE) may be used." [3] This satisfies HTTP header
> > folding rules, as well: "Header fields can be extended over multiple
> > lines by preceding each extra line with at least one SP or HT." [1, again]
> >
> > So in my reading of HTTP, some code somewhere should introduce newlines
> > in longish, encoded response header values.
>
>Realistically, isn't this an artifact of a time when things like
>line-length mattered a lot more? That is, does any HTTP client actually
>care about or rely on the 75 character limit?

I believe RFC 2047 was originally created for email, where such a limit
would actually matter.

From foom at fuhm.net Sat Sep 9 04:10:46 2006
From: foom at fuhm.net (James Y Knight)
Date: Fri, 8 Sep 2006 22:10:46 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
 <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
Message-ID: <517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>

On Sep 8, 2006, at 7:22 PM, Robert Brewer wrote:
> Bah. I knew I forgot a constraint in there (the strings have to be
> encoded by the app). Personally, I think the "separate-by-spaces"
> cure is worse than the disease.
> I also finally found the only other
> discussion of this issue [1] and ... I wish we had allowed folding
> from the beginning. Given the obscure nature of this need, I would
> rather have had all WSGI implementations be 99% WSGI-compliant (by
> ignoring folding) than 99% HTTP-compliant (by not allowing
> folding). We could have improved the former number without changing
> the spec, but not the latter. Meh. Water under the bridge. Maybe in
> 1.1?

I don't see what's wrong with encoding with the 75-char word-limit,
separating "words" by spaces, *without* newlines. If the server feels
like folding a long line into two, it can do so, but it's perfectly
within its rights not to, and AFAIK nothing at all requires it to ever
fold, given that a folded line is exactly equivalent to a single space.
Line folding is one of those things that really has no purpose in HTTP
besides to write out the examples in the RFCs.

James

From fumanchu at amor.org Sat Sep 9 07:45:21 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 22:45:21 -0700
Subject: [Web-SIG] WSGI and long response header values
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
 <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
 <517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C55@ex9.hostedexchange.local>

James Y Knight wrote:
> On Sep 8, 2006, at 7:22 PM, Robert Brewer wrote:
> > Bah. I knew I forgot a constraint in there (the strings
> > have to be encoded by the app). Personally, I think the
> > "separate-by-spaces" cure is worse than the disease.
> > I also finally found the only other discussion of this
> > issue [1] and ... I wish we had allowed folding from
> > the beginning. Given the obscure nature of this need,
> > I would rather have had all WSGI implementations be
> > 99% WSGI-compliant (by ignoring folding) than 99%
> > HTTP-compliant (by not allowing folding).
> > We could
> > have improved the former number without changing the
> > spec, but not the latter. Meh. Water under the bridge.
> > Maybe in 1.1?
>
> I don't see what's wrong with encoding with the 75-char
> word-limit, separating "words" by spaces, *without* newlines.
> If the server feels like folding a long line into two, it
> can do so, but it's perfectly within its rights not to,
> and AFAIK nothing at all requires it to ever fold, given
> that a folded line is exactly equivalent to a single space.
> Line folding is one of those things that really has no purpose
> in HTTP besides to write out the examples in the RFCs.

I was hoping that too, but the server is actually *not* within its
rights to leave out the newlines, because that restriction is actually
part of RFC 2047 (MIME headers), not the HTTP spec: "If it is desirable
to encode more text than will fit in an 'encoded-word' of 75 characters,
multiple 'encoded-word's (separated by CRLF SPACE) may be used."


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From fumanchu at amor.org Sat Sep 9 07:51:49 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Fri, 8 Sep 2006 22:51:49 -0700
Subject: [Web-SIG] WSGI and long response header values
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
 <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
 <517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>
 <435DF58A933BA74397B42CDEB8145A86224C55@ex9.hostedexchange.local>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224C56@ex9.hostedexchange.local>

James Y Knight wrote:
> I don't see what's wrong with encoding with the 75-char
> word-limit, separating "words" by spaces, *without* newlines.
> If the server feels like folding a long line into two, it
> can do so, but it's perfectly within its rights not to,
> and AFAIK nothing at all requires it to ever fold, given
> that a folded line is exactly equivalent to a single space.
> Line folding is one of those things that really has no purpose
> in HTTP besides to write out the examples in the RFCs.

And I just said:
> I was hoping that too, but the server is actually *not*
> within its rights to leave out the newlines, because that
> restriction is actually part of RFC 2047 (MIME headers),
> not the HTTP spec.

Bah. Of course, any HTTP server or proxy is free to unfold headers. So
maybe the dream of arbitrary header values via MIME-encoding is broken
from the get-go.


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From foom at fuhm.net Sun Sep 10 01:49:33 2006
From: foom at fuhm.net (James Y Knight)
Date: Sat, 9 Sep 2006 19:49:33 -0400
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C56@ex9.hostedexchange.local>
References: <5.1.1.6.0.20060908175118.0269b768@sparrow.telecommunity.com>
 <435DF58A933BA74397B42CDEB8145A86224C54@ex9.hostedexchange.local>
 <517C7C04-0009-4227-BA72-B45DF30A27BA@fuhm.net>
 <435DF58A933BA74397B42CDEB8145A86224C55@ex9.hostedexchange.local>
 <435DF58A933BA74397B42CDEB8145A86224C56@ex9.hostedexchange.local>
Message-ID:

On Sep 9, 2006, at 1:51 AM, Robert Brewer wrote:
> James Y Knight wrote:
> > I don't see what's wrong with encoding with the 75-char
> > word-limit, separating "words" by spaces, *without* newlines.
> > If the server feels like folding a long line into two, it > > can do so, but it's perfectly within its rights not to, > > and AFAIK nothing at all requires it to ever fold, given > > that a folded line is exactly equivalent to a single space. > > Line folding is one of those things that really has no purpose > > in HTTP besides to write out the examples in the RFCs. > > And I just said: > > I was hoping that too, but the server is actually *not* > > within its rights to leave out the newlines, because that > > restriction is actually part of RFC 2047 (MIME headers), > > not the HTTP spec. > > Bah. Of course, any HTTP server or proxy is free to unfold headers. > So maybe the dream of arbitrary header values via MIME-encoding is > broken from the get-go. No, it's just an inconsistency in the RFC. I suggest reading the RFC2616 as having precedence over the requirements in RFC2047, and thus the line breaks are not required. I seriously doubt if anything will malfunction if given such input. Not that I know of any actual use cases of non-ASCII character encoding in http headers, anyhow. James From mnot at mnot.net Sun Sep 10 02:18:46 2006 From: mnot at mnot.net (Mark Nottingham) Date: Sat, 9 Sep 2006 17:18:46 -0700 Subject: [Web-SIG] WSGI and long response header values In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local> References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local> Message-ID: <8F965D17-4410-499B-824B-6DEB2F29A2A6@mnot.net> My reading o WSGI was that multi-line headers should already be folding multi-line headers. If that's the case, what's the problem? Cheers, On 2006/09/08, at 2:02 PM, Robert Brewer wrote: > PEP 333 says: > > "Each header_value must not include any control characters, > including carriage returns or linefeeds, either embedded or at the > end. 
(These requirements are to minimize the complexity of any > parsing that must be performed by servers, gateways, and > intermediate response processors that need to inspect or modify > response headers.)" [1] > > That's understandable, but HTTP headers are defined as (mostly) > *TEXT, and "words of *TEXT MAY contain characters from character > sets other than ISO-8859-1 only when encoded according to the rules > of RFC 2047." [2] And RFC 2047 specifies that "an 'encoded-word' > may not be more than 75 characters long...If it is desirable to > encode more text than will fit in an 'encoded-word' of 75 > characters, multiple 'encoded-word's (separated by CRLF SPACE) may > be used." [3] This satisfies HTTP header folding rules, as well: > "Header fields can be extended over multiple lines by preceding > each extra line with at least one SP or HT." [1, again] > > So in my reading of HTTP, some code somewhere should introduce > newlines in longish, encoded response header values. I see three > options: > > 1. Keep things as they are and disallow response header values if > they contain words over 75 chars that are outside the ISO-8859-1 > character set > 2. Allow newline characters in WSGI response headers > 3. Require/strongly suggest WSGI servers to do the encoding and > folding before sending the value over HTTP. > > Any other solutions? I'd like to see 2 or 3 adopted (unless > something better comes along), so CherryPy can continue to support > as much of the HTTP spec as possible. 
>
>
> Robert Brewer
> System Architect
> Amor Ministries
> fumanchu at amor.org
>
> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
> [2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
> [3] http://www.rfc.net/rfc2047.html#s2
>
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/mnot%40mnot.net

--
Mark Nottingham     http://www.mnot.net/

From mnot at mnot.net Sun Sep 10 19:25:12 2006
From: mnot at mnot.net (Mark Nottingham)
Date: Sun, 10 Sep 2006 10:25:12 -0700
Subject: [Web-SIG] WSGI and long response header values
In-Reply-To: <8F965D17-4410-499B-824B-6DEB2F29A2A6@mnot.net>
References: <435DF58A933BA74397B42CDEB8145A86224C53@ex9.hostedexchange.local>
 <8F965D17-4410-499B-824B-6DEB2F29A2A6@mnot.net>
Message-ID: <9AAE585A-9CE3-451D-B17A-AA6FCB6ED146@mnot.net>

Now in English:

My reading of WSGI was that implementations should already be folding
multi-line headers.

Cheers,

On 2006/09/09, at 5:18 PM, Mark Nottingham wrote:

> My reading o WSGI was that multi-line headers should already be
> folding multi-line headers. If that's the case, what's the problem?
>
> Cheers,

--
Mark Nottingham     http://www.mnot.net/

From davidf at sjsoft.com Tue Sep 12 11:12:51 2006
From: davidf at sjsoft.com (David Fraser)
Date: Tue, 12 Sep 2006 11:12:51 +0200
Subject: [Web-SIG] Middleware stack construction
Message-ID: <45067A13.8080306@sjsoft.com>

Hi

We've been trying to manage how we handle the middleware stack in our
web framework. The PEP doesn't specify any standard way of doing this
and the example is constructed with the next item in the stack as a
parameter.

Our approach is to pass a WSGIStack variable in the environment
variables and get each layer of middleware to pop off the next layer and
call it, thus:

def run_child(self, environ, startresponse):
    child = environ["j5_WSGIStack"].pop()
    return child(environ, startresponse)

so that the middleware can transform that in whichever way it wants...

Does this fit in well with how other people are doing things? Just curious

David

From pje at telecommunity.com Tue Sep 12 17:19:05 2006
From: pje at telecommunity.com (Phillip J.
Eby)
Date: Tue, 12 Sep 2006 11:19:05 -0400
Subject: [Web-SIG] Middleware stack construction
In-Reply-To: <45067A13.8080306@sjsoft.com>
Message-ID: <5.1.1.6.0.20060912111415.026e98a8@sparrow.telecommunity.com>

At 11:12 AM 9/12/2006 +0200, David Fraser wrote:
>Hi
>
>We've been trying to manage how we handle the middleware stack in our
>web framework. The PEP doesn't specify any standard way of doing this
>and the example is constructed with the next item in the stack as a
>parameter.
>
>Our approach is to pass a WSGIStack variable in the environment
>variables and get each layer of middleware to pop off the next layer and
>call it, thus:
>
>def run_child(self, environ, startresponse):
>    child = environ["j5_WSGIStack"].pop()
>    return child(environ, startresponse)
>
>so that the middleware can transform that in whichever way it wants...
>
>Does this fit in well with how other people are doing things? Just curious

That's an interesting concept. I don't think anybody else has come up with
it though. I'll certainly steal it if I ever get around to creating a
competitor to Paste Deploy. :) But I imagine I'd use a linked list of
tuples instead of popping from a list, e.g.:

    child, environ["my.middleware.stack"] = environ["my.middleware.stack"]

This would allow the same chain to be used for every call, without copying,
and it's probably faster than pop() as well, at least under CPython.

From renesd at gmail.com Fri Sep 15 10:29:33 2006
From: renesd at gmail.com (René Dudfield)
Date: Fri, 15 Sep 2006 18:29:33 +1000
Subject: [Web-SIG] Python pickle and web security.
Message-ID: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>

Hello,

I posted this on my blog the other day about people using pickle for
sessions, but got no response. Do you guys think using pickles for
sessions is an ok thing to do?

...........

Some python web frame works are using pickle to store session data.
Pickle is a well known poor choice for secure systems.
However it seems to be more widely known by those writing network
applications than by those making web frameworks.

Is your web framework using pickle for sessions despite the warnings
in the python documentation about it being insecure?

By using sessions with pickle, people who can write to the database
server's session table can execute code on the app server. Or people
who can get data into the session file/memcache data store can execute
data.

This might be an issue if the database server is run by separate
people than the app server. Or if the session table is compromised by
an sql injection attack elsewhere.

There are some more secure ways of storing pickled data.

Pickle is deemed to be untrustworthy for data. In that it is not
certain that code can not be snuck into the data that will be executed
by pickle. So if some data from user input is put into the pickle,
then it is possible that code could be run.

There are some people who know more about how to exploit pickle,
however the warning in the python documentation is this:

"""Warning:
The pickle module is not intended to be secure against erroneous or
maliciously constructed data. Never unpickle data received from an
untrusted or unauthenticated source."""

Cerealizer might be an alternative option...
http://home.gna.org/oomadness/en/cerealizer/index.html

Or maybe these other two.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/415503
http://barnesc.blogspot.com/2006/01/rencode-reduced-length-encodings.html

From jim at zope.com Fri Sep 15 12:29:41 2006
From: jim at zope.com (Jim Fulton)
Date: Fri, 15 Sep 2006 06:29:41 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
Message-ID: <6CC72EAE-39FE-4685-B0DB-1B0C8C0D0E49@zope.com>

On Sep 15, 2006, at 4:29 AM, René Dudfield wrote:

> Hello,
>
> I posted this on my blog the other day about people using pickle for
> sessions, but got no response. Do you guys think using pickles for
> sessions is an ok thing to do?

You don't want to accept pickles from an untrusted source, which
typically means you don't want to accept pickles over the network.
Even then, there are ways to use pickles securely. For example, you
can, if you know what you're doing, arrange to prevent pickle from
calling global objects or control specifically what global objects
are callable.

There is nothing wrong with using pickles to store data internally.
As long as the pickles are generated by the application, there is no
risk to the application reading them again, assuming that they are
stored where they can't be tampered with.

Saying pickle is inherently insecure is like saying Python is
inherently insecure. You don't want to execute Python from an
untrusted source. If someone can tamper with your Python code, then
you have a serious security problem as well.

Jim

--
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From python at venix.com Fri Sep 15 15:40:31 2006
From: python at venix.com (Python)
Date: Fri, 15 Sep 2006 09:40:31 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
Message-ID: <1158327631.9975.116.camel@www.venix.com>

On Fri, 2006-09-15 at 18:29 +1000, René Dudfield wrote:
> Hello,
>
> I posted this on my blog the other day about people using pickle for
> sessions, but got no response. Do you guys think using pickles for
> sessions is an ok thing to do?

Either encrypt the pickle or have a seeded (md5) signature so that you
can verify that the pickle has not been tampered with.
I use pickles routinely, but with an md5 signature that combines a seed
and the pickle.

Someone cannot generate a valid signature without also knowing the seed.
I am paranoid enough so that I only pickle dictionaries and then only
extract and verify my list of expected keys after unpickling. I can't
prove that's secure, but I am not losing sleep over it.

Presumably someone who knew the seed could generate a valid signature
*and* inject code into the pickle that got executed by the unpickle
operation.

--
Lloyd Kvam
Venix Corp

From renesd at gmail.com Sat Sep 16 04:07:01 2006
From: renesd at gmail.com (René Dudfield)
Date: Sat, 16 Sep 2006 12:07:01 +1000
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <6CC72EAE-39FE-4685-B0DB-1B0C8C0D0E49@zope.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <6CC72EAE-39FE-4685-B0DB-1B0C8C0D0E49@zope.com>
Message-ID: <64ddb72c0609151907g20e1afb9xeed6660fc1902b9d@mail.gmail.com>

Hi,

I think my main point was about using pickle for sessions, not just
using pickle by itself.

Unlike loading other data, code gets run when you load a pickle. It
is indeed like running python code. So if you do not trust where you
store your pickles to run python code, then that is a problem.

If the unpickle or pickle code is not bug free, then you can not trust
that unpickling a pickle will not allow data to be made which can trick
the unpickle escaping code. With the history of bugs with the unpickle
code, I don't think relying on it is a good idea.

For a list of pickle bugs you can search the python bug tracker.
There are over 70 bugs listed including the open, closed, and deleted
bugs. With 13 open bugs listed. One of the bugs was closed because:
'Closing due to lack of response. cPickle is such a complex module,
without a test case the leak cannot be found.'

I think that line says best about how much you should trust the C
module pickle code that is 5753 lines long, and has not been audited.
Will pickle *always* escape data you pass it correctly when it encodes
it into a pickle? Will unpickle *always* unescape parts of the pickle
correctly? If not then those pickles can run code.

The risk of using pickle does not seem to be worth the convenience
that it gives. With alternatives to pickle which do not execute code
being available why not use them?

By using pickle for session data you allow people the opportunity to put
data into the pickle. For example say you store a given GET variable
in the session. Combining that you allow people with pickle-sessions
to put data into the pickle, and the risk that pickle might not
encode/decode it correctly is the problem I see. However if allowing
untrusted data to be placed into a pickle is ok, then this is not a
problem.

That only leaves the problem of allowing the data store of your
sessions to be able to execute code where you load sessions. This
means you allow execution of code from your data store to your session
loading code. Which means if you use a separate database machine (quite
common), or if you use a separate memcache server (not unheard of) you
allow these machines to execute code on the session using machine.

There's a reason why people use separate user accounts, and separate
machines for doing different tasks. That reason is to limit what each
user or machine can do. By using pickles for sessions those benefits
are removed in some cases.

Cheers,

From renesd at gmail.com Sat Sep 16 04:23:22 2006
From: renesd at gmail.com (René Dudfield)
Date: Sat, 16 Sep 2006 12:23:22 +1000
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <1158327631.9975.116.camel@www.venix.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <1158327631.9975.116.camel@www.venix.com>
Message-ID: <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>

That seems like a good way to stop the untrusted session store from
being able to inject sessions in there. That could at least solve the
problem of using pickles from untrusted session stores.

Are you just using the basic python types? eg dict, string, list,
numbers etc? If so, perhaps using another serialiser will remove some
more risk if you cared.

From python at venix.com Sat Sep 16 13:44:24 2006
From: python at venix.com (Python)
Date: Sat, 16 Sep 2006 07:44:24 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <1158327631.9975.116.camel@www.venix.com>
 <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
Message-ID: <1158407064.9975.199.camel@www.venix.com>

On Sat, 2006-09-16 at 12:23 +1000, René Dudfield wrote:
> That seems like a good way to stop the untrusted session store from
> being able to inject sessions in there. That could at least solve the
> problem of using pickles from untrusted session stores.
>
> Are you just using the basic python types? eg dict, string, list,
> numbers etc? If so, perhaps using another serialiser will remove some
> more risk if you cared.

Besides the basic types, date/time objects are often included. My use
of md5 signatures was focused primarily on preventing unwanted data
manipulation. I would agree that outside data should be acquired in
formats that are simpler than pickles. I am pickling data that has been
checked and accepted.

--
Lloyd Kvam
Venix Corp

From ben at groovie.org Mon Sep 18 19:27:03 2006
From: ben at groovie.org (Ben Bangert)
Date: Mon, 18 Sep 2006 10:27:03 -0700
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <1158327631.9975.116.camel@www.venix.com>
 <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
Message-ID:

On Sep 15, 2006, at 7:23 PM, René
Dudfield wrote:

> That seems like a good way to stop the untrusted session store from
> being able to inject sessions in there. That could at least solve the
> problem of using pickles from untrusted session stores.
>
> Are you just using the basic python types? eg dict, string, list,
> numbers etc? If so, perhaps using another serialiser will remove some
> more risk if you cared.

Why do you assume the session store is untrusted? If someone can hack
into my database, they can typically hack into my web application so
it's pretty weird to consider the backend session store to be
"untrusted". I think this is why using pickle for sessions is pretty
harmless as you're the one writing to them, not the user.

While I can imagine a few situations where an untrusted session store
might come into play, I'd generally imagine that the vast majority of
the time one does trust their session storage as much as they trust
that their application can't have its source code modified.

Cheers,
Ben

From python at venix.com Mon Sep 18 20:16:02 2006
From: python at venix.com (Python)
Date: Mon, 18 Sep 2006 14:16:02 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To:
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <1158327631.9975.116.camel@www.venix.com>
 <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
Message-ID: <1158603362.22684.8.camel@www.venix.com>

On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
> Why do you assume the session store is untrusted? If someone can hack
> into my database, they can typically hack into my web application so
> its pretty weird to consider the backend session store to be
> "untrusted".

You are assuming that the pickle is stored in a secure database. If the
pickle is in a cookie or some other client side storage, then it is
definitely not to be trusted.
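
Two mitigations raised in this thread, a keyed signature over the pickle
and limiting which globals the unpickler may resolve, can be sketched
together as below. This is a minimal illustration under stated
assumptions, not anyone's production code: it swaps the seed-plus-md5
scheme described earlier for HMAC-SHA256 from the standard library, it
targets Python 3, and SECRET, dumps_signed, loads_signed and
BasicTypesUnpickler are hypothetical names. Overriding find_class is the
hook the pickle documentation describes for restricting globals.

```python
import hashlib
import hmac
import io
import pickle

# Hypothetical server-side key; it must never be sent to the client.
SECRET = b"replace-with-a-long-random-secret"
_SIG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


class BasicTypesUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup (assumption: the
    session only holds dicts, lists, strings, numbers and the like)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is not allowed in session data" % (module, name)
        )


def dumps_signed(obj, key=SECRET):
    """Pickle obj and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload


def loads_signed(blob, key=SECRET):
    """Check the signature first; only then unpickle, with globals denied."""
    signature, payload = blob[:_SIG_LEN], blob[_SIG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("session data failed signature check")
    return BasicTypesUnpickler(io.BytesIO(payload)).load()
```

As noted earlier in the thread, the signature only helps against someone
who does not know the key. Refusing all globals also means that values
such as date/time objects would need an explicit allow-list inside
find_class before they could be stored this way.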
--
Lloyd Kvam
Venix Corp

From jim at zope.com Mon Sep 18 20:24:23 2006
From: jim at zope.com (Jim Fulton)
Date: Mon, 18 Sep 2006 14:24:23 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <1158603362.22684.8.camel@www.venix.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <1158327631.9975.116.camel@www.venix.com>
 <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
 <1158603362.22684.8.camel@www.venix.com>
Message-ID:

On Sep 18, 2006, at 2:16 PM, Python wrote:

> On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
>> Why do you assume the session store is untrusted? If someone can hack
>> into my database, they can typically hack into my web application so
>> its pretty weird to consider the backend session store to be
>> "untrusted".
>
> You are assuming that the pickle is stored in a secure database.
> If the pickle is in a cookie or some other client side storage, then
> it is definitely not to be trusted.

Right. Storing pickles in cookies is a very bad idea.
Hopefully, no one is doing that.

Jim

--
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From python at venix.com Mon Sep 18 20:34:50 2006
From: python at venix.com (Python)
Date: Mon, 18 Sep 2006 14:34:50 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To:
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <1158327631.9975.116.camel@www.venix.com>
 <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
 <1158603362.22684.8.camel@www.venix.com>
Message-ID: <1158604490.22684.13.camel@www.venix.com>

On Mon, 2006-09-18 at 14:24 -0400, Jim Fulton wrote:
> On Sep 18, 2006, at 2:16 PM, Python wrote:
>
> > On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
> >> Why do you assume the session store is untrusted? If someone can hack
> >> into my database, they can typically hack into my web application so
> >> its pretty weird to consider the backend session store to be
> >> "untrusted".
> >
> > You are assuming that the pickle is stored in a secure database.
> > If the pickle is in a cookie or some other client side storage, then
> > it is definitely not to be trusted.
>
> Right. Storing pickles in cookies is a very bad idea.
> Hopefully, no one is doing that.

As it happens, I am not using cookies to store pickles, but I've
considered it. What makes it "a very bad idea"?

--
Lloyd Kvam
Venix Corp

From jim at zope.com Mon Sep 18 21:07:56 2006
From: jim at zope.com (Jim Fulton)
Date: Mon, 18 Sep 2006 15:07:56 -0400
Subject: [Web-SIG] Python pickle and web security.
In-Reply-To: <1158604490.22684.13.camel@www.venix.com>
References: <64ddb72c0609150129q35e2fb5el74e6c149370cede8@mail.gmail.com>
 <1158327631.9975.116.camel@www.venix.com>
 <64ddb72c0609151923s2704fa21pfe6560e38a861b43@mail.gmail.com>
 <1158603362.22684.8.camel@www.venix.com>
 <1158604490.22684.13.camel@www.venix.com>
Message-ID:

On Sep 18, 2006, at 2:34 PM, Python wrote:

> On Mon, 2006-09-18 at 14:24 -0400, Jim Fulton wrote:
>> On Sep 18, 2006, at 2:16 PM, Python wrote:
>>
>>> On Mon, 2006-09-18 at 10:27 -0700, Ben Bangert wrote:
>>>> Why do you assume the session store is untrusted? If someone can
>>>> hack into my database, they can typically hack into my web
>>>> application so its pretty weird to consider the backend session
>>>> store to be "untrusted".
>>>
>>> You are assuming that the pickle is stored in a secure database.
>>> If the pickle is in a cookie or some other client side storage,
>>> then it is definitely not to be trusted.
>>
>> Right. Storing pickles in cookies is a very bad idea.
>> Hopefully, no one is doing that.
>
> As it happens, I am not using cookies to store pickles, but I've
> considered it. What makes it "a very bad idea"?

Because, by default, a pickle can be constructed that will call more
or less any importable object. You never want to load pickles from an
untrusted source and, as you pointed out, cookies are an untrusted
source.

Jim

--
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From michael.kerrin at openapp.biz Fri Sep 29 15:18:26 2006
From: michael.kerrin at openapp.biz (Michael Kerrin)
Date: Fri, 29 Sep 2006 14:18:26 +0100
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
Message-ID: <451D1D22.5090607@openapp.biz>

Hi All,

The WSGI specification says in the section on "Input and Error Streams":

    The optional "size" argument to readline() is not supported, as it
    may be complex for server authors to implement, and is not often
    used in practice.

But the current implementation of cgi.FieldStorage in the 2.4.4 branch
and on Python 2.5 does call readline with the size argument. It has
started to do this in response to the Python bug #1112549 -
cgi.FieldStorage memory usage can spike in line-oriented ops. See
http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470

Since it is reasonable for a WSGI application to use cgi.FieldStorage
I am wondering whether cgi.FieldStorage or the WSGI specification needs
to be changed in order to solve this incompatibility.

Originally I thought it was cgi.FieldStorage that needs to be changed,
and hence tried to fix it by wrapping the input stream so that the
readline method always uses the read method on the input stream. While
this seems to work for me it introduces a level of complexity in the
cgi.py file, and possibly some other bugs, that makes me think that
adding the size argument for readline into the WSGI specification isn't
such a bad idea after all.
There may be other ways of modifying cgi.FieldStorage to solve this
but I can't see how at the moment.

For those that are interested, I have attached the patch but my main
issue is where should this incompatibility be solved.

Thanks
Michael

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cgi.patch
Type: text/x-patch
Size: 5996 bytes
Desc: not available
Url : http://mail.python.org/pipermail/web-sig/attachments/20060929/4d03d5cb/attachment.bin

From guido at python.org Fri Sep 29 21:31:55 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 29 Sep 2006 12:31:55 -0700
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
In-Reply-To: <451D1D22.5090607@openapp.biz>
References: <451D1D22.5090607@openapp.biz>
Message-ID:

On 9/29/06, Michael Kerrin wrote:
> But the current implementation of cgi.FieldStorage in the 2.4.4 branch
> and on Python 2.5 does call readline with the size argument. It has
> started to do this in response to the Python bug #1112549 -
> cgi.FieldStorage memory usage can spike in line-oriented ops. See
> http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470
>
> Since it is reasonable for a WSGI application to use cgi.FieldStorage
> I am wondering whether cgi.FieldStorage or the WSGI specification needs
> to be changed in order to solve this incompatibility.
>
> Originally I thought it was cgi.FieldStorage that needs to be changed,
> and hence tried to fix it by wrapping the input stream so that the
> readline method always uses the read method on the input stream. While
> this seems to work for me it introduces a level of complexity in the
> cgi.py file, and possibly some other bugs, that makes me think that
> adding the size argument for readline into the WSGI specification isn't
> such a bad idea after all.

Since that change to cgi.py was a security fix I would strongly
recommend not to remove it and to change the WSGI spec instead.
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
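
The wrapping approach Michael describes above (giving readline a size
argument while calling only read on the underlying stream) might look
roughly like the sketch below. The attached cgi.patch was scrubbed from
the archive, so this is not his patch. The class name
SizedReadlineInput, its chunk_size parameter, and the use of a content
length to avoid reading past the request body are all assumptions made
for illustration, and the code is written for Python 3 byte streams.

```python
import io


class SizedReadlineInput:
    """Hypothetical wrapper adding readline(size) on top of read(n)."""

    def __init__(self, stream, content_length, chunk_size=8192):
        self._stream = stream
        self._remaining = content_length  # bytes not yet pulled from stream
        self._chunk_size = chunk_size
        self._buffer = b""

    def _fill(self):
        """Pull one more chunk into the buffer; False when nothing is left."""
        n = min(self._chunk_size, self._remaining)
        if n <= 0:
            return False
        data = self._stream.read(n)
        if not data:
            self._remaining = 0
            return False
        self._remaining -= len(data)
        self._buffer += data
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            while self._fill():
                pass
            data, self._buffer = self._buffer, b""
            return data
        while len(self._buffer) < size and self._fill():
            pass
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

    def readline(self, size=-1):
        limit = None if size is None or size < 0 else size
        while True:
            newline = self._buffer.find(b"\n")
            if newline != -1:
                # A full line is buffered; honour the size cap if given.
                end = newline + 1 if limit is None else min(newline + 1, limit)
                break
            if limit is not None and len(self._buffer) >= limit:
                end = limit  # cap reached before any newline was seen
                break
            if not self._fill():
                # End of body: return whatever is left, up to the cap.
                end = len(self._buffer)
                if limit is not None:
                    end = min(end, limit)
                break
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line


# Small demonstration with an in-memory body standing in for wsgi.input.
body = b"name=value\nother=thing\n"
wrapped = SizedReadlineInput(io.BytesIO(body), len(body))
first_piece = wrapped.readline(4)   # never more than 4 bytes
rest_of_line = wrapped.readline()   # remainder of the first line
```

In a WSGI application the wrapper would presumably be built from
environ["wsgi.input"] and int(environ.get("CONTENT_LENGTH") or 0) before
the stream is handed to a parser that calls readline with a size. Whether
that belongs in the application, in cgi.py, or in the server is exactly
the question the thread leaves open.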