Should urlencode() sort the query parameters (if they come from a dict)?
I just fixed a unittest for some code used at Google that was comparing a url generated by urllib.encode() to a fixed string. The problem was caused by turning on PYTHONHASHSEED=1. Because of this, the code under test would generate a textually different URL each time the test was run, but the intention of the test was just to check that all the query parameters were present and equal to the expected values. The solution was somewhat painful, I had to parse the url, split the query parameters, and compare them to a known dict. I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix: query = query.items() with this: query = sorted(query.items()) This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL. Thoughts? -- --Guido van Rossum (python.org/~guido)
Only if it is not an OrderedDict
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
On 17.08.2012 21:27, Guido van Rossum wrote:
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
Thoughts?
Sounds good. For best backwards compatibility, I'd restrict the sorting to the exact dict type, since people may be using non-dict mappings which already have a different stable order.
for all versions of Python that support PYTHONHASHSEED?
I think this cannot be done, in particular not for 2.6 and 3.1 - it's not a security fix (*). Strictly speaking, it isn't even a bug fix, since it doesn't restore the original behavior that some people (like your test case) relied on. In particular, if somebody has fixed PYTHONHASHSEED to get a stable order, this change would break such installations. By that policy, it could only go into 3.4. OTOH, if it also checked whether there is randomized hashing, and sort only in that case, I think it should be backwards compatible in all interesting cases. Regards, Martin (*) I guess some may claim that the current implementation leaks some bits of the hash seed, since you can learn the seed from the parameter order, so sorting would make it more secure. However, I would disagree that this constitutes a feasible threat.
Thanks, I filed http://bugs.python.org/issue15719 to track this.
On Fri, Aug 17, 2012 at 12:50 PM, "Martin v. Löwis"
On 17.08.2012 21:27, Guido van Rossum wrote:
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
Thoughts?
Sounds good. For best backwards compatibility, I'd restrict the sorting to the exact dict type, since people may be using non-dict mappings which already have a different stable order.
for all versions of Python that support PYTHONHASHSEED?
I think this cannot be done, in particular not for 2.6 and 3.1 - it's not a security fix (*).
Strictly speaking, it isn't even a bug fix, since it doesn't restore the original behavior that some people (like your test case) relied on. In particular, if somebody has fixed PYTHONHASHSEED to get a stable order, this change would break such installations. By that policy, it could only go into 3.4.
OTOH, if it also checked whether there is randomized hashing, and sort only in that case, I think it should be backwards compatible in all interesting cases.
Regards, Martin
(*) I guess some may claim that the current implementation leaks some bits of the hash seed, since you can learn the seed from the parameter order, so sorting would make it more secure. However, I would disagree that this constitutes a feasible threat.
-- --Guido van Rossum (python.org/~guido)
On Sat, Aug 18, 2012 at 5:27 AM, Guido van Rossum
I just fixed a unittest for some code used at Google that was comparing a url generated by urllib.encode() to a fixed string. The problem was caused by turning on PYTHONHASHSEED=1. Because of this, the code under test would generate a textually different URL each time the test was run, but the intention of the test was just to check that all the query parameters were present and equal to the expected values.
query = sorted(query.items())
Hmm. ISTM this is putting priority on the unit test above the functionality of actual usage. Although on the other hand, sorting parameters on a URL is nothing compared to the cost of network traffic, so it's unlikely to be significant. Chris Angelico
On 17 August 2012 18:52, Chris Angelico
On Sat, Aug 18, 2012 at 5:27 AM, Guido van Rossum
wrote: I just fixed a unittest for some code used at Google that was comparing a url generated by urllib.encode() to a fixed string. The problem was caused by turning on PYTHONHASHSEED=1. Because of this, the code under test would generate a textually different URL each time the test was run, but the intention of the test was just to check that all the query parameters were present and equal to the expected values.
query = sorted(query.items())
Hmm. ISTM this is putting priority on the unit test above the functionality of actual usage. Although on the other hand, sorting parameters on a URL is nothing compared to the cost of network traffic, so it's unlikely to be significant.
I don't think this behavior is only desirable to unit tests: having URL's been formed in predictable way a good thing in any way one thinks about it. js -><-
Chris Angelico
Joao S. O. Bueno writes:
I don't think this behavior is only desirable to unit tests: having URL's been formed in predictable way a good thing in any way one thinks about it.
Especially if you're a hacker. One more thing you may be able to use against careless sites that don't expect the unexpected to occur in URLs. I'm not saying this is a bad thing, but we should remember that the whole point of PYTHONHASHSEED is that regularities can be exploited for devious and malicious purposes, and reducing regularity makes many attacks more difficult. "*Any* way one thinks about it" is far too strong a claim. Steve
On Sat, 18 Aug 2012 14:23:13 +0900
"Stephen J. Turnbull"
Joao S. O. Bueno writes:
I don't think this behavior is only desirable to unit tests: having URL's been formed in predictable way a good thing in any way one thinks about it.
Especially if you're a hacker. One more thing you may be able to use against careless sites that don't expect the unexpected to occur in URLs.
That's unsubstantiated. Give an example of how sorted URLs compromise security. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net
Antoine Pitrou writes:
That's unsubstantiated.
Sure. If I had a CVE, I would have posted it.
Give an example of how sorted URLs compromise security.
That's not how you think about security; the right question about sorted URLs is "how do you know that they *don't* compromise security?" We know that mishandling URLs *can* compromise security (eg, via bugs in directory traversal). But you know that. What you presumably mean here is "why do you think randomly changing query parameter order in URLs is more secure than sorted order?" The answer to that is that since the server can't depend on order, it *must* handle more configurations of parameters by design (and presumably in implementation and testing), and therefore will be robust against more kinds of parameter configurations. Eg, there will be no temptation to optimize processing by handling parameters in sorted order. Is this a "real" danger? Maybe not. But every unnecessary regularity in inputs that a program's implementation depends on is a potential attack vector via irregular inputs. Remember, I was responding to a claim that sorted order is *always* better. That's a dangerous kind of claim to make about anything that could be input to an Internet server. Steve
On Sun, 19 Aug 2012 20:55:31 +0900
"Stephen J. Turnbull"
Antoine Pitrou writes:
That's unsubstantiated.
Sure. If I had a CVE, I would have posted it.
Ok, so you have no evidence. Regards Antoine.
On 18 August 2012 02:23, Stephen J. Turnbull
Joao S. O. Bueno writes:
I don't think this behavior is only desirable to unit tests: having URL's been formed in predictable way a good thing in any way one thinks about it.
Especially if you're a hacker. One more thing you may be able to use against careless sites that don't expect the unexpected to occur in URLs.
I'm not saying this is a bad thing, but we should remember that the whole point of PYTHONHASHSEED is that regularities can be exploited for devious and malicious purposes, and reducing regularity makes many attacks more difficult. "*Any* way one thinks about it" is far too strong a claim.
Ageeded that "any way one thinks about it" is far too strong a claim - but I still hold to the point. Maybe "most ways one thinks about it" :-) .
Steve
Joao S. O. Bueno writes:
Ageeded that "any way one thinks about it" is far too strong a claim - but I still hold to the point. Maybe "most ways one thinks about it" :-) .
100% agreement now.<wink/>
Guido van Rossum wrote:
I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix:
query = query.items()
with this:
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
Thoughts?
There may be people who mix bytes and str or pass other non-str keys:
query = {b"a":b"b", "c":"d", 5:6} urlencode(query) 'a=b&c=d&5=6' sorted(query.items()) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unorderable types: str() < bytes()
Not pretty, but a bugfix should not break such constructs.
Am 17.08.2012 21:27, schrieb Guido van Rossum:
I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix:
query = query.items()
with this:
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameter. Or you cann urlencode() with `sorted(qs.items)` instead of `qs` in the application. The order of query string parameter is actually important for some applications, for example Zope, colander+deform and other form frameworks use the parameter order to group parameters. Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method. if type(query) is dict: query = sorted(query.items()) else: query = query.items() Christian
On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes
Am 17.08.2012 21:27, schrieb Guido van Rossum:
I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix:
query = query.items()
with this:
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameter. Or you cann urlencode() with `sorted(qs.items)` instead of `qs` in the application.
Hm. That's actually a good point.
The order of query string parameter is actually important for some applications, for example Zope, colander+deform and other form frameworks use the parameter order to group parameters.
Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method.
if type(query) is dict: query = sorted(query.items()) else: query = query.items()
That's already in the bug I filed. :-) I also added that the sort may fail if the keys mix e.g. bytes and str (or int and str, for that matter). -- --Guido van Rossum (python.org/~guido)
On 18/08/2012 18:34, Guido van Rossum wrote:
On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes
wrote: Am 17.08.2012 21:27, schrieb Guido van Rossum:
I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix:
query = query.items()
with this:
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameter. Or you cann urlencode() with `sorted(qs.items)` instead of `qs` in the application.
Hm. That's actually a good point.
The order of query string parameter is actually important for some applications, for example Zope, colander+deform and other form frameworks use the parameter order to group parameters.
Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method.
if type(query) is dict: query = sorted(query.items()) else: query = query.items()
That's already in the bug I filed. :-) I also added that the sort may fail if the keys mix e.g. bytes and str (or int and str, for that matter).
One possible way around that is to add the class names, perhaps only if sorting raises an exception: def make_key(pair): return type(pair[0]).__name__, type(pair[1]).__name__, pair if type(query) is dict: try: query = sorted(query.items()) except TypeError: query = sorted(query.items(), key=make_key) else: query = query.items()
On Saturday, August 18, 2012, MRAB wrote:
On 18/08/2012 18:34, Guido van Rossum wrote:
On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes
wrote: Am 17.08.2012 21:27, schrieb Guido van Rossum:
I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix:
query = query.items()
with this:
query = sorted(query.items())
This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameter. Or you cann urlencode() with `sorted(qs.items)` instead of `qs` in the application.
Hm. That's actually a good point.
The order of query string parameter is actually important for some
applications, for example Zope, colander+deform and other form frameworks use the parameter order to group parameters.
Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method.
if type(query) is dict: query = sorted(query.items()) else: query = query.items()
That's already in the bug I filed. :-) I also added that the sort may fail if the keys mix e.g. bytes and str (or int and str, for that matter).
One possible way around that is to add the class names, perhaps only if sorting raises an exception:
def make_key(pair): return type(pair[0]).__name__, type(pair[1]).__name__, pair
if type(query) is dict: try: query = sorted(query.items()) except TypeError: query = sorted(query.items(), key=make_key) else: query = query.items()
Doesn't strike me as necessary.
______________________________**_________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/**mailman/listinfo/python-devhttp://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** guido%40python.orghttp://mail.python.org/mailman/options/python-dev/guido%40python.org
-- Sent from Gmail Mobile
On 8/18/2012 11:47 AM, MRAB wrote:
I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameter. Or you cann urlencode() with `sorted(qs.items)` instead of `qs` in the application.
Hm. That's actually a good point.
Seems adequate to me. Most programs wouldn't care about the order, because most web frameworks grab whatever is there in whatever order, and present it to the web app in their own order. Programs that care, or which talk to web apps that care, are unlikely to want the order from a non-randomized dict, and so have already taken care of ordering issues, so undoing the randomization seems like a solution in search of a problem (other than for poorly written test cases).
On Sat, Aug 18, 2012 at 1:55 PM, Glenn Linderman
On 8/18/2012 11:47 AM, MRAB wrote:
I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameter. Or you cann urlencode() with `sorted(qs.items)` instead of `qs` in the application.
Hm. That's actually a good point.
Seems adequate to me. Most programs wouldn't care about the order, because most web frameworks grab whatever is there in whatever order, and present it to the web app in their own order.
Programs that care, or which talk to web apps that care, are unlikely to want the order from a non-randomized dict, and so have already taken care of ordering issues, so undoing the randomization seems like a solution in search of a problem (other than for poorly written test cases).
I am of the same thought too. Changing a behavior based on the test case expectation, no matter if the behavior is a harmless change is still a change. Coming to the point testing query string could be useful in some cases and then giving weightage to the change seems interesting use case, but does not seem to warrant a change. I think, I like Christian Heimes suggestion that a wrapper to compare query strings would be useful and in Guido's original test case, a tittle test code change would have been good. Looks like Guido has withdrawn the bug report too. -- Senthil
participants (12)
-
"Martin v. Löwis"
-
Antoine Pitrou
-
Chris Angelico
-
Christian Heimes
-
Daniel Holth
-
Glenn Linderman
-
Guido van Rossum
-
Joao S. O. Bueno
-
MRAB
-
Peter Otten
-
Senthil Kumaran
-
Stephen J. Turnbull