Counting collisions for the win

Hi, I've been working on the hash collision issue for two or three weeks now. I have evaluated all the proposed solutions and I think I have a good understanding of the problem and of how it should be solved. The major constraint is to have minor or no impact on applications (don't break backward compatibility). I see three major solutions:

- use a randomized hash
- use two hashes: a randomized hash plus the current hash, kept for backward compatibility
- count collisions on dictionary lookup

Using a randomized hash breaks a lot of tests (e.g. tests relying on the representation of a dictionary). The patch is huge, too big to backport directly to stable versions. A randomized hash may also break (indirectly) real applications, because the application output becomes somewhat "randomized" as well. For example, in the Django test suite, the HTML output is different at each run. Web browsers may render the web page differently, or crash, or ... I don't think that Django would want to sort the attributes of each HTML tag just because we wanted to fix a vulnerability.

Randomized hashing also has a major weakness: if the attacker is able to compute the secret, (s)he can easily compute collisions and exploit the hash collision vulnerability again. I don't know exactly how hard it is to compute the secret, but our hash function is weak (it is far from cryptographic; it is really simple to run it backward). If someone writes a fast function to compute the secret, we are back where we started.

IMO using two hashes has the same disadvantages as the randomized hash solution, while being more complex to implement.

The last solution is very simple: count collisions and raise an exception when a limit is hit. The patch is something like 10 lines, whereas the randomized hash is closer to 500 lines, adds a new file, changes the Visual Studio project file, etc. At first I thought it would break more applications than the randomized hash, but I tried it on Django: the test suite fails with a limit of 20 collisions, but not with a limit of 50 collisions, whereas the patch uses a limit of 1000 collisions. According to my basic tests, a limit of 35 collisions requires a dictionary with more than 10,000,000 integer keys to raise an error. I am not talking about the attack, but about valid data. More details about my tests on the Django test suite: http://bugs.python.org/issue13703#msg151620

I propose to solve the hash collision vulnerability by counting collisions, because it fixes the vulnerability with minor or no impact on applications or backward compatibility. I don't see why we should use a different fix for Python 3.3: if counting collisions solves the issue for stable versions, it is also enough for Python 3.3. We now know all the issues of the randomized hash solution, and I think it has more drawbacks than advantages. IMO the randomized hash is overkill for fixing the hash collision issue.

I just have some requests on Marc-Andre Lemburg's patch:

- the limit should be configurable: a new function in the sys module should be enough. It may be private (or replaced by an environment variable?) in stable versions
- the set type should also be patched (I didn't check whether it is still vulnerable with the patch)
- the patch has no test! (a class with a fixed hash should be enough to write a test)
- the limit must be documented somewhere
- the exception type should be different from KeyError

Victor
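[A minimal sketch of the kind of test Victor mentions, using a class with a fixed hash. The exception type and the limit of 1000 are assumptions for illustration, not part of the actual patch.]

    # A class with a constant hash: every instance lands in the same bucket,
    # so inserting N of them costs O(N**2) comparisons -- exactly the
    # behaviour a collision limit is meant to cut short.
    class FixedHash:
        def __init__(self, value):
            self.value = value

        def __hash__(self):
            return 42                 # deliberately constant: every key collides

        def __eq__(self, other):
            return isinstance(other, FixedHash) and self.value == other.value

    def fill(n):
        d = {}
        for i in range(n):
            d[FixedHash(i)] = i       # each insert probes past all earlier keys
        return d

    # With a counting patch applied, a test could look something like:
    #     with self.assertRaises(RuntimeError):   # whatever exception is chosen
    #         fill(1001)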

On Fri, Jan 20, 2012 at 00:48, Victor Stinner <victor.stinner@haypocalc.com> wrote:
I'd like to point out that an attacker is not limited to sending just one dict full of colliding keys. Given a 22ms stall for a dict full of 1000 colliding keys, and 100 such objects inside a parent object (perhaps JSON), you can stall a server for 2.2+ seconds. Going with the raise-at-1000 approach doesn't solve the problem for everyone.

In addition, because the raise-at-N-collisions approach raises an exception, everyone who wants to handle this error condition properly has to change their code to catch a previously-unexpected exception. (I know they're usually still better off with the fix, but why force many people to change code when you can actually fix the hashing problem?)

Another issue is that even with a configurable limit, different modules can't have their own limits. One module might want a relatively safe raise-at-100, and another module creating massive dicts might want raise-at-1000. How does a developer know whether they can raise or lower the limit, given that they use a bunch of different modules?

I actually went with this stop-at-N-collisions approach by patching my CPython a few years ago, where I limited dictobject and setobject's critical `for` loop to 100 iterations (I realize this might handle fewer than 100 collisions.) This worked fine until I tried to compile PyPy, where the translator blew up due to a massive dict. This, combined with the second problem (needing to catch an exception), led me to abandon this approach and write Securetypes, which has a securedict that uses SHA-1. Not that I like this either; I think I'm happy with the randomize-hash() approach.

Ivan

On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik <ivan@ludios.org> wrote:
It's "just" a DoS attack. Those won't go away. We just need to raise the effort needed for the attacker. The original attack would cause something like 5 minutes of CPU usage per request (with a set of colliding keys that could be computed once and used to attack every Python-run website in the world). That's at least 2 orders of magnitude worse. In addition, because the raise-at-N-collisions approach raises an
Why would anybody need to change their code? Every web framework worth its salt has a top-level error catcher that logs the error, serves a 500 response, and possibly does other things like email the admin.
I don't think it needs to be configurable. There just needs to be a way to turn it off.
I think that's because your collision-counting algorithm was much more primitive than MAL's.
Why did you need to catch the exception? Were you not happy with the program simply terminating with a traceback when it got attacked? -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 20, 2012 at 03:48, Guido van Rossum <guido@python.org> wrote:
I think that's because your collision-counting algorithm was much more primitive than MAL's.
Conceded.
No, I wasn't happy with termination. I wanted to treat it just like a JSON decoding error, and send the appropriate response. I actually forgot to mention the main reason I abandoned the stop-at-N-collisions approach. I had a server with a dict that stayed in memory across many requests. It was being populated with identifiers chosen by clients. I couldn't have my server stay broken if this dict filled up with a bunch of colliding keys. (I don't think I could have done anything else either, like nuke the dict or evict some keys.) Ivan

On Thu, Jan 19, 2012 at 8:06 PM, Ivan Kozik <ivan@ludios.org> wrote:
No, I wasn't happy with termination. I wanted to treat it just like a JSON decoding error, and send the appropriate response.
So was this attack actually being mounted on your service regularly? I'd think it would be sufficient to treat it as a MemoryError -- unavoidable, if it happens things are really bad, and hopefully you'll crash quickly and some monitor process restarts your service. That's a mechanism that you should have anyway.
What would your service do if it ran out of memory? Maybe one tweak to the collision counting would be that the exception needs to inherit from BaseException (like MemoryError) so most generic exception handlers don't actually handle it. (Style note: never use "except:", always use "except Exception:".) -- --Guido van Rossum (python.org/~guido)
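[A small sketch of the BaseException point Guido makes here; the exception name is made up for illustration.]

    # If the new exception derives from BaseException rather than Exception,
    # a generic "except Exception:" handler lets it propagate, just like
    # MemoryError or KeyboardInterrupt.
    class TooManyHashCollisions(BaseException):
        pass

    def handler():
        try:
            raise TooManyHashCollisions("dict under attack?")
        except Exception:
            print("swallowed by the generic handler")   # not reached

    try:
        handler()
    except TooManyHashCollisions as exc:
        print("propagated past 'except Exception:':", exc)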

2012/1/20 Ivan Kozik <ivan@ludios.org>:
I'd like to point out that an attacker is not limited to sending just one dict full of colliding keys. Given a 22ms stall ...
The presented attack produces a stall of at least 30 seconds (5 minutes or more if there is no time limit in the application), not 0.022 second. You have to send a lot of requests to produce a DoS if a single request only eats 22 ms. I suppose that there are a lot of other kinds of requests that take much longer than 22 ms, even valid ones.
Python becomes really slow when you have more than N collisions (an O(n^2) problem). If an application hits this limit with valid data, it is time to use another data structure or a different hash function. We have to do more tests to choose N correctly, but according to my first tests, it looks like N=1000 is a safe limit.

Marc-Andre's patch doesn't count all "collisions", but only collisions that require comparing objects. When two objects have the same hash value, the open addressing algorithm searches for a free bucket. If a bucket is not free but has a different hash value, the objects are not compared and the collision counter is not incremented. The limit is only reached when you have N objects with the same hash value modulo the size of the table (hash(str) & DICT_MASK).

When there are not enough empty buckets (this happens before all buckets are full), Python resizes the dictionary (it does something like size = size * 2), and so it uses at least one more bit of the hash each time the dictionary is resized. Collisions are very likely in a small dictionary, but become rarer each time the dictionary is resized. It means that the number of potential collisions (with valid data) decreases as the dictionary grows. Tell me if I am wrong.
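[A rough pure-Python model of the behaviour described above: open addressing with CPython's probe recurrence, counting only the probes where the stored hash equals the key's hash, i.e. where a real object comparison would happen. The table size, the limit, the exception type and the absence of resizing are all simplifications, not the real patch.]

    PERTURB_SHIFT = 5

    def insert(table, key, limit=1000):
        # table is a fixed-size list of None or (hash, key) pairs; no resizing here.
        mask = len(table) - 1
        h = hash(key)
        perturb = h
        i = h & mask
        comparisons = 0
        while table[i] is not None:
            stored_hash, stored_key = table[i]
            if stored_hash == h:              # only this case costs an object compare
                if stored_key == key:
                    return                    # already present
                comparisons += 1
                if comparisons > limit:
                    raise RuntimeError("too many hash collisions")
            i = (5 * i + perturb + 1) & mask  # CPython's probe recurrence
            perturb >>= PERTURB_SHIFT
        table[i] = (h, key)

    table = [None] * 8
    for word in ("spam", "eggs", "ham"):
        insert(table, word)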

Victor Stinner wrote:
You might think that 10 million keys is a lot of data, but that's only about 100 MB worth. I already see hardware vendors advertising computers with 6 GB RAM as "entry level", e.g. the HP Pavilion starts with 6GB expandable to 16GB. I expect that there are already people using Python who will unpredictably hit that limit by accident, and the number will only grow as computers get more memory.

With a limit of 35 collisions, it only takes 35 keys to force a dict to raise an exception, if you are an attacker able to select colliding keys. We're trying to defend against an attacker who is able to force collisions, not one who is waiting for accidental collisions. I don't see that causing the dict to raise an exception helps matters: it just changes the attack from "keep the dict busy indefinitely" to "cause an exception and crash the application".

This moves responsibility for dealing with collisions out of the dict and into the application code. Instead of solving the problem in one place (the built-in dict), now every application that uses dicts has to identify which dicts can be attacked, and deal with the exception. That pushes the responsibility for security onto the people who are least willing or able to deal with it: the average developer, who neither understands nor cares about security, or if they do care, they can't convince their manager to care. I suppose an exception is an improvement over the application hanging indefinitely, but I'd hardly call it a fix.

Ruby uses randomized hashes. Are there any other languages with a dict or mapping class that raises on too many collisions?

-- Steven

On Fri, Jan 20, 2012 at 2:00 PM, Steven D'Aprano <steve@pearwood.info> wrote:
No, that's fundamentally misunderstanding the nature of the attack. The reason the hash collision attack is a problem is because it allows you to DoS a web service in a way that requires minimal client side resources but can have a massive effect on the server. The attacker is making a single request that takes the server an inordinately long time to process, consuming CPU resources all the while, and likely preventing the handling of any other requests (especially for an event-based server, since the attack is CPU based, bypassing all use of asynchronous IO). With the 1000 collision limit in place, the attacker sends their massive request, the affected dict quickly hits the limit, throws an unhandled exception which is then caught by the web framework and turned into a 500 Error response (or whatever's appropriate for the protocol being attacked). If a given web service doesn't *already* have a catch all handler to keep an unexpected exception from bringing the entire service down, then DoS attacks like this one are the least of its worries. As for why other languages haven't gone this way, I have no idea. There are lots of details relating to a language's hash and hash map design that will drive how suitable randomisation is as an answer, and it also depends greatly on how you decide to characterise the threat. FWIW, Victor's analysis in the opening post of this thread matches the conclusions I came to a few days ago, although he's been over the alternatives far more thoroughly than I have. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Victor, On 01/19/2012 05:48 PM, Victor Stinner wrote: [snip]
I'm a Django core developer, and if it is true that our test-suite has a dictionary-ordering dependency that is expressed via HTML attribute ordering, I consider that a bug and would like to fix it. I'd be grateful for, not resentful of, a change in CPython that revealed the bug and prompted us to fix it. (I presume that it is true, as it sounds like you experienced it directly; I don't have time to play around at the moment, but I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago). I can't speak for the core team, but I doubt there would be much disagreement on this point: ideally Django would run equally well on any implementation of Python, and as far as I know none of the alternative implementations guarantee hash or dict-ordering compatibility with CPython. I don't have the expertise to speak otherwise to the alternatives for fixing the collisions vulnerability, but I don't believe it's accurate to presume that Django would not want to fix a dict-ordering dependency, and use that as a justification for one approach over another. Carl

On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer <carl@oddbird.net> wrote:
It's more a matter of wanting deployment of a security fix to be as painless as possible - a security fix that system administrators can't deploy because it breaks critical applications may as well not exist. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jan 20, 2012, at 03:18 PM, Nick Coghlan wrote:
True, but collision counting is worse IMO. It's just as likely (maybe) that an application would start getting new exceptions on dictionary insertion as it would start getting failures due to dictionary order changes. Unfortunately, in the former case it's because Python just added a new public API in a security release (the new exception *is* public API). In the latter case, no new API was added, but something exposed an already existing bug in the application. That's still a bug in the application even if counting was added. It's also a bug that any number of changes in the environment, or OS vendor deployment, could have triggered. -1 for collision counting. -Barry

On 1/19/2012 8:54 PM, Carl Meyer wrote:
It might be a good idea to have a way to seed the hash with some value to allow testing with different dict orderings -- this would allow tests to be developed using one Python implementation that would be immune to the different orderings on different implementations. However, randomizing the hash not only fails to solve the problem for long-running applications, it also causes non-deterministic performance from one run to the next, even with the exact same data: a different (random) seed could cause sporadic collisions with data that usually gave good performance, with little explanation for it and little way to reproduce the problem in order to report or understand it.

I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago
A Python dictionary only uses the lower bits of a hash value. If your dictionary has fewer than 2**32 items, the dictionary order is exactly the same on 32-bit and 64-bit systems: hash32(str) & mask == hash64(str) & mask for mask <= 2**32-1.
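[A small illustration of this point: the slot index only depends on the low bits, hash(key) & (table_size - 1). The 64-bit and 32-bit hash values here are made-up numbers; run under one interpreter this can only show the masking itself, not an actual 32-bit vs 64-bit comparison.]

    def slot(h, table_size):
        return h & (table_size - 1)       # table_size is always a power of two

    h64 = 0x1234_5678_9ABC_DEF0          # pretend 64-bit hash value
    h32 = h64 & 0xFFFF_FFFF              # its lower 32 bits, a pretend 32-bit hash

    for size in (8, 64, 1024, 2**20):
        assert slot(h64, size) == slot(h32, size)   # identical for masks <= 2**32-1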

In http://mail.python.org/pipermail/python-dev/2012-January/115715.html Frank Sievertsen wrote: On 20.01.2012 13:08, Victor Stinner wrote:
No, that's not true. Whenever a collision happens, other bits are mixed in very fast.
Frank
Bits are mixed in quickly from a denial-of-service standpoint, but Victor is correct from a "Why don't the tests already fail?" standpoint. A dict with 2**12 slots, holding over 2700 entries, will be far larger than most test cases -- particularly those with visible output. In a dict that size, 32-bit and 64-bit machines will still probe the same first, second, third, fourth, fifth, and sixth slots. Even in the rare cases when there are at least 6 collisions, the next slots may well be either the same, or close enough that it doesn't show up in a changed iteration order. -jJ

The main issue with that approach is that it allows a new kind of attack. An attacker now needs to find 1000 colliding keys and submit them one by one into a database. The limit will not trigger, as those are just database insertions. Now, if the application also has a need to read the entire database table into a dictionary, that will suddenly break, and not for the attacker (which would be ok), but for the regular user of the application or the site administrator. So it may be that this approach actually simplifies the attack, making the cure worse than the disease. Regards, Martin

The main issue with that approach is that it allows a new kind of attack.
Indeed, I posted another example: http://bugs.python.org/msg151677 This kind of fix can be used in a specific application or maybe in a special-purpose framework, but not on the level of a general-purpose language. Frank

On Fri, Jan 20, 2012 at 1:57 AM, Frank Sievertsen <pydev@sievertsen.de> wrote:
Right. We have been discussing this issue (for weeks now...) because it makes pretty much any Python app that takes untrusted data vulnerable, especially web apps, and after extensive analysis we came to the conclusion that defenses in the framework or in the app are really hard to do and very disruptive for developers, whereas preventing the attack by a modification of the dict or hash algorithms would fix it for everybody. Moreover, the attack would work against pretty much any Python web app using a set of evil strings computed once (hence encouraging script kiddies to just fire their fully-automatic weapon at random websites). The new attacks that are now being considered require analysis of how the website is implemented, how it uses and stores data, etc. So an attacker has to sit down and come up with an attack tailored to a specific website. That can be dealt with on an ad-hoc basis. -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 20, 2012 at 7:34 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Ouch, I think you're right. So hash randomisation may be the best option, and admins will need to test for themselves to see if it breaks things... Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Oh, good catch. But I would not call it a new kind of attack; it is just a particular case of the hash collision vulnerability. Counting collisions doesn't solve this case, but it doesn't make the situation worse than before. Raising an exception quickly is better than stalling for minutes, even if I agree that it is not the best behaviour. The best would be to answer quickly with the expected result :-) (using a different data structure or a different hash function?) Right now, I don't see any countermeasure against this case. Victor

On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:
ISTM that adding the possibility of raising a new exception on dictionary insertion is *more* backward incompatible than changing dictionary order, which for a very long time has been known to not be guaranteed. You're running some application, you upgrade Python because you apply all security fixes, and suddenly you're starting to get exceptions in places you can't really do anything about. Yet those exceptions are now part of the documented public API for dictionaries. This is asking for trouble. Bugs will suddenly start appearing in that application's tracker and they will seem to the application developer like Python just added a new public API in a security release. OTOH, if you change dictionary order and *that* breaks the application, then the bugs submitted to the application's tracker will be legitimate bugs that have to be fixed even if nothing else changed. So I still think we should ditch the paranoia about dictionary order changing, and fix this without counting. A little bit of paranoia could creep back in by disabling the hash fix by default in stable releases, but I think it would be fine to make that a compile-time option. -Barry

So I still think we should ditch the paranoia about dictionary order changing, and fix this without counting.
The randomized hash has other issues:

- its security is based on its secret, and it looks easy to compute the secret (see more details in the issue)
- my patch only changes hash(str), whereas other developers asked me to also patch bytes, int and other types

hash(bytes) can be changed. But changing hash(int) may leak the secret easily. We may use a different secret for each type, but if it is easy to compute the int hash secret, dictionaries using int keys are still vulnerable.

There is no perfect solution; the drawbacks of each solution should be compared. Victor
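[A hedged illustration of why hash(int) is worrying: CPython's integer hash is essentially the value itself, so any naive randomisation such as XORing in a secret (a made-up scheme, not a real proposal) leaks that secret to anyone who can observe a single hash value.]

    import random

    secret = random.getrandbits(32)

    def toy_randomized_int_hash(n):
        return n ^ secret                 # hypothetical scheme, for illustration only

    leaked = toy_randomized_int_hash(0)   # "hash(0)" would reveal the secret directly
    assert leaked == secret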

On Fri, 20 Jan 2012 17:17:24 +0100 Victor Stinner <victor.stinner@haypocalc.com> wrote:
How do you compute the secret? I see two possibilities: - the application leaks the hash() values: this sounds unlikely since I don't see the use case for it; - the application shows the dict iteration order (e.g. order of HTML attributes): then we could add a second per-dictionary secret so that the iteration order of a single dict doesn't give any useful information about the hash function. But the bottom line for me is the following: - randomized hashes eliminate the possibility to use a single exploit for all Python-powered applications: for each application, the attacker now has to find a way to extract the secret; - collision counting doesn't eliminate the possibility of generic exploits, as Frank Sievertsen has just shown in http://mail.python.org/pipermail/python-dev/2012-January/115726.html Regards Antoine.

On 1/20/2012 11:17 AM, Victor Stinner wrote:
There is no perfect solutions, drawbacks of each solution should be compared.
Amen. One possible attack that has been described for a collision-counting dict depends on knowing precisely the trigger point. So let MAXCOLLISIONS either be configurable or just choose a random count between M and N, say 700 and 999. It would not hurt to have alternate patches available in case a particular Python-powered site comes under prolonged attack, though given our minuscule share of the market, that is much less likely than an attack on a PHP- or MS-powered site. -- Terry Jan Reedy

On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw <barry@python.org> wrote:
Dict insertion can already raise an exception: MemoryError. I think we should be safe if the new exception also derives from BaseException. We should actually seriously consider just raising MemoryError, since introducing a new built-in exception in a bugfix release is also very questionable: code explicitly catching or raising it would not work on previous bugfix releases of the same feature release.
There are lots of things that are undefined according to the language spec (and quite possibly known to vary between versions or platforms or implementations like PyPy or Jython) but which we would never change in a bugfix release.
I'm sorry, but I don't want to break a user's app with a bugfix release and say "haha your code was already broken you just didn't know it". Sure, the dict order already varies across Python implementations, possibly across 32/64 bits or operating systems. But many organizations (I know a few :-) have a very large installed software base, created over many years by many people with varying skills, that is kept working in part by very carefully keeping the environment as constant as possible. This means that the target environment is much more predictable than it is for the typical piece of open source software.

Sure, a good Python developer doesn't write apps or tests that depend on dict order. But time and again we see that not everybody writes perfect code every time. Especially users writing "in-house" apps (as opposed to frameworks shared as open source) are less likely to always use the most robust, portable algorithms in existence, because they may know with much more certainty that their code will never be used on certain combinations of platforms. For example, I rarely think about whether code I write might not work on IronPython or Jython, or even CPython on Windows. And if something I wrote suddenly needs to be ported to one of those, well, that's considered a port and I'll just accept that it might mean changing a few things.

The time to break a dependency on dict order is not with a bugfix release but with a feature release: those are more likely to break other things as well anyway, and users are well aware that they have to test everything and anticipate having to fix some fraction of their code for each feature release. OTOH we have established a long and successful track record of conservative bugfix releases that don't break anything. (I am aware of exactly one thing that was broken by a bugfix release in application code I am familiar with.) -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 20, 2012 at 13:15, Guido van Rossum <guido@python.org> wrote:
Why can't we have our cake and eat it too? Can we do hash randomization in 3.3 and use the collision counting solution for bugfix releases? That way we get a basic fix into the bugfix releases that won't break people's code (hopefully), but we go with the more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this, since it's a feature release anyway, where we can change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will complicate merging commits for the files affected by the patches, but that will most likely be isolated and not a common occurrence (and less of an issue once 3.3 is released later this year). I understand the desire to keep backwards compatibility, but collision counting could cause an error on some random input that someone didn't expect to cause issues, whether they were under a DoS attack or just had some unfortunate input from private data. Hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data.

Even if a MemoryError is raised, I believe that is still a fundamental change in the documented contract of the dictionary API. I don't believe there is a way to fix this without breaking someone's application. The major difference I see between the two solutions is that counting will break the applications of people who are otherwise following the documented API contract of dictionaries, and randomization will break the applications of people who are violating that documented contract. Personally I feel that the lesser of two evils is to reward those who followed the documentation, and not reward those who didn't. So +1 for randomization as the only option in 3.3, and off by default with a flag or environment variable in bug fixes. I think it's the only way to proceed that won't hurt people who have followed the documented behavior. On Friday, January 20, 2012 at 1:49 PM, Brett Cannon wrote:

On 01/20/2012 02:04 PM, Donald Stufft wrote:
Even if a MemoryException is raised I believe that is still a fundamental change in the documented contract of dictionary API.
How so? Dictionary inserts can *already* raise that error.
Do you have a case in mind where legitimate user data (not crafted as part of a DoS attack) would trip the 1000-collision limit? How likely is it that such cases exist in already-deployed applications, compared to the known breakage in existing applications due to hash randomization?
Except that I think your set is purely hypothetical, while the second set is *lots* of deployed applications. Tres.

On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:
I don't, but as there's never been a limit on how many collisions a dictionary can have, this would be a fundamental change in the documented (and undocumented) abilities of a dictionary. Dictionary key order has never been guaranteed, is documented not to be relied on, and already changes depending on whether you are using 32-bit, 64-bit, Jython, PyPy, etc., or, as someone else pointed out, with any number of possible improvements to dict. The counting solution violates the existing contract in order to serve people who are themselves violating the contract. Even with their violation, the method that I +1'd still serves not to break existing applications by default.
Which is why I believe that it should be off by default in the bugfix releases, but easily enabled (flag, env var, whatever). That allows people to upgrade to a bugfix without breaking their application, and if this vulnerability affects them, they can enable it. I think collision counting is at best a band-aid and not a proper fix, stemming from a desire not to break existing applications in a bugfix release, which can be better solved by implementing the real fix and allowing people to control whether they have it enabled (only on the bugfix releases; on 3.3+ it should be forced on always).

Guido van Rossum wrote:
Whether you have run out of total memory, or a single contiguous block, it is still a memory error. An arbitrary limit on collisions is not a memory error. If we were designing this API from scratch, would anyone propose using MemoryError for "you have reached a limit on collisions"? It has nothing to do with memory. Using MemoryError for something which isn't a memory error is ugly. How about RuntimeError? -- Steven

On Fri, Jan 20, 2012 at 6:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Guido van Rossum wrote:
It should derive from BaseException.
Sorry, I was ambiguous. I meant it should not derive from Exception. It goes RuntimeError -> StandardError -> Exception -> BaseException. -- --Guido van Rossum (python.org/~guido)

On 1/20/2012 2:51 PM, Donald Stufft wrote:
My opinion of counting is better than yours, but even conceding the theoretical, purity argument, our release process is practical as well. There have been a few occasions when fixes to bugs in our code have been delayed from a bugfix release to the next feature release -- because the fix would break too much code depending on the bug. Some years ago there was a proposal that we should deliberately tweak hash() to break 'buggy' code that depended on it not changing. This never happened. So it has been left de facto constant, to the extent it is, for some years. -- Terry Jan Reedy

On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy <tjreedy@udel.edu> wrote:
AFAICT Brett's suggestion (which had occurred to me as well, but I'm not a core developer by any stretch) seemed to get lost in the debate: would it be possible to go with collision counting for bugfix releases and hash randomization for new feature releases? (Brett made it here: <http://mail.python.org/pipermail/python-dev/2012-January/115740.html>.) -- Ben Wolfson "Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry]

I believe that either solution has the potential to break existing applications, so to ensure that no applications are broken there will need to be a method of disabling it. I also believe that to maintain the backwards compatibility that Python has traditionally had in bug fix releases, either solution will need to default to off. Given those two things, I don't think the argument should be about which solution will break less, because I believe both will need to be off by default, but about which solution more adequately solves the underlying problem. On Friday, January 20, 2012 at 5:11 PM, Terry Reedy wrote:

On 20 Jan 2012, at 10:49, Brett Cannon wrote:
I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller could do to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash:

* this would solve the security issue someone mentioned about being able to deduce the hash, because if attackers keep being mean it'll change anyway
* for bugfix releases, start off without randomization (seed==0) and start to use it only when the collision count hits the threshold
* for feature releases, reseeding when you hit a certain threshold still seems like a good idea, as it will make lookups/insertions better in the long run

AFAIUI, Python already doesn't guarantee order stability when you insert something into a dictionary, since in the worst case the dictionary has to resize its hash table, and then the order is freshly jumbled again. Just my two cents. Jared

On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb <jared.grubb@gmail.com> wrote:
I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash
This is nonsense. You have to determine the random seed at startup, and it has to be uniform for the entire life of the process. You can't change it after Python has started. -Paul

You should read the several other threads, the bug, as well as the implementation and patch under discussion. Briefly, Python string hashes are calculated once per string, and then used in many places. You can't change the hash value for a string during program execution without breaking everything. The proposed change modifies the starting value of the hash function to include a process-wide randomly generated seed. This seed is chosen randomly at runtime, but cannot change once chosen. Using the seed changes the final output of the hash to be unpredictable to an attacker, solving the underlying problem. Salt could also be an appropriate term here, but since salt is generally changed on a per-use basis (a single process may use many different salts), seed is more correct, since this value is only chosen once per process. -Paul
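[A toy sketch of the idea Paul describes: a process-wide seed drawn once at startup and folded into the string hash's starting value. The mixing function below is a stand-in, not CPython's actual algorithm; the point is only that the seed is fixed for the process, so equal strings still hash equally within a run while the values differ between runs.]

    import os

    _SEED = int.from_bytes(os.urandom(8), "little")   # chosen once per process

    def toy_string_hash(s, seed=_SEED):
        h = seed
        for ch in s.encode("utf-8"):
            h = (h * 1000003) ^ ch        # multiplier borrowed from CPython, for flavour
        return (h ^ len(s)) & 0xFFFFFFFFFFFFFFFF

    assert toy_string_hash("abc") == toy_string_hash("abc")   # stable within a run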

This seed is chosen randomly at runtime, but cannot change once chosen.
The hash is used to compare objects: if hash(obj1) != hash(obj2), objects are considered different. So two strings must have the same hash if their value is the same.
We may use a different salt per dictionary. Victor

On Sun, Jan 22, 2012 at 11:11, Victor Stinner <victor.stinner@haypocalc.com> wrote:
Can we do that? I was thinking of ways to not raise errors when we get over a collision count, but instead somehow change the way the dictionary behaves once the count is exceeded, but I couldn't come up with anything. Somehow adding a salt would be one possibility. But I don't see how it's doable except for the string-keys-only case mentioned before. But I might just be lacking imagination. :-) //Lennart

On 23 January 2012 16:49, Lennart Regebro <regebro@gmail.com> wrote:
Actually, this looks like it has the seed of a solution in it. I haven't scrutinised the following beyond "it sounds like it could work" - it could well contain nasty flaws.

Assumption: We only get an excessive number of collisions during an attack (directly or indirectly).
Assumption: Introducing a salt into hashes will change those hashes sufficiently to mitigate the attack (all discussion of randomising hashes makes this assumption).

1. Keep the current hashing (for all dictionaries) i.e. just using hash(key).
2. Count collisions.
3. If any key hits X collisions, change that dictionary to use a random salt for hashes (at least for str and unicode keys). This salt would be remembered for the dictionary.

Consequence: The dictionary would need to be rebuilt when an attack was detected.
Consequence: Hash caching would no longer occur for this dictionary, making most operations more expensive.
Consequence: Anything relying on the iteration order of a dictionary which has suffered excessive conflicts would fail.

4. (Optional) in 3.3, provide a way to get a dictionary with random salt (i.e. not wait for attack).

Tim Delaney
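[A sketch of just the "rebuild with a per-dict salt" step of this proposal. Real collision detection would live inside the C dict; here the trigger is left abstract, and the wrapper class and function names are made up.]

    import os

    class _SaltedKey:
        """Wraps a key so its hash mixes in a per-dict salt."""
        __slots__ = ("key", "salt")

        def __init__(self, key, salt):
            self.key = key
            self.salt = salt

        def __hash__(self):
            return hash((self.salt, self.key))   # salted, no longer the cached str hash

        def __eq__(self, other):
            return isinstance(other, _SaltedKey) and self.key == other.key

    def rebuild_with_salt(plain_dict):
        """What a fallback-dict might do once the collision threshold is hit."""
        salt = int.from_bytes(os.urandom(8), "little")
        return {_SaltedKey(k, salt): v for k, v in plain_dict.items()}

Note how this matches the listed consequences: the rebuild rehashes every key, the salted hashes can no longer reuse the cached string hash, and the iteration order changes.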

Hello, I'd still prefer to see a randomized hash()-function (at least for 3.3). But to protect against the attacks it would be sufficient to use randomization for collision resolution in dicts (and sets). What if we use a second (randomized) hash function when there are many collisions in ONE lookup? This hash function is used only for collision resolution and is not cached.

The benefits:
* protection against the known attacks
* hash(X) stays stable and the same
* dict order is only changed when there are many collisions
* doctests will not break
* enhanced collision resolution
* the RNG doesn't have to be initialized in smaller programs
* nearly no slowdown of most dicts
* the second hash function is only used for keys with a higher collision rate
* lower probability to leak secrets
* possibility to use different secrets for each dict

The drawbacks:
* need to add a second hash function
* slower than using one hash function only, when there are > 20 collisions
* need to add this to container types? (if used for py3.3)
* need to expose this to the user? (if used for py3.3)
* works only for datatypes with this new function
* possible to implement without breaking the ABI?

The following code is meant for explanation purposes only:

    for (perturb = hash; ; perturb >>= 5) {
        i = (i << 2) + i + perturb + 1;
        if ((collisions++) == 20) {
            /* perturb is already zero after 13 rounds.
               20 collisions are rare.
               You can add && (ma_mask > 256) to make 100% sure
               that it's not used for smaller dicts. */
            if (Py_TYPE(key)->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH) {
                /* If the type has a randomized hash, use it for the lookup now */
                i = perturb = PyObject_RandomizedHash(key);
            }
        .....

If I got this right, we could add a new function "tp_randomized_hash" to the 3.3 release. But can we also add functions in older releases without breaking the ABI? If not, can we implement this somehow using a flag?

FOR OLDER RELEASES < 3.3:

    Py_hash_t
    PyObject_RandomizedHash(PyObject *v)
    {
        PyTypeObject *tp = Py_TYPE(v);
        if (!(tp->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH))
            return -1;
        global_flags_somewhere->USE_RANDOMIZED_HASH = 1;
        return (*tp->tp_hash)(v);
    }

.... and in unicodeobject.c (and wherever we need randomization):

    static Py_hash_t
    unicode_hash(PyUnicodeObject *self)
    {
        Py_ssize_t len;
        Py_UNICODE *p;
        Py_hash_t x;
        Py_hash_t prefix = 0;
        Py_hash_t suffix = 0;
        if (global_flags_somewhere->USE_RANDOMIZED_HASH) {
            global_flags_somewhere->USE_RANDOMIZED_HASH = 0;
            initialize_rng_if_not_already_done_and_return_seed(&prefix, &suffix);
        }
        .....
        (and don't cache in this case)
        .....

It's ugly, but if I understand this correctly, the GIL will protect us against race conditions, right? Hello, internals experts: Would this work, or is there a better way to do this without breaking the ABI?

Frank

On 1/23/2012 12:53 AM, Frank Sievertsen wrote:
So this sounds like SafeDict, but putting it under the covers and automatically converting from dict to SafeDict after a collision threshold has been reached. Let's call it fallback-dict.

Compared to SafeDict as a programmer tool, fallback-dict has these benefits:
* No need to change program (or library) source to respond to an attack
* Order is preserved until the collision threshold has been reached
* Performance is preserved until the collision threshold has been reached

and this cost:
* converting the dict from one hash to the other by rehashing all the keys.

Compared to always randomizing the hash, fallback-dict has these benefits:
* hash (and performance) is deterministic: programs running on the same data set will have the same performance characteristics, unless the collision threshold is reached for that data set
* lower probability of leaking secrets, because each attacked set/dict can have its own secret, randomized hash seed
* the patch would not need to include RNG initialization during startup, lowering the impact on short-running programs.

What is not clear is how much SafeDict degrades performance when it is used; non-cached hashes will definitely have an impact. I'm not sure whether an implementation of fallback-dict in C would be so much faster than the Python implementation of SafeDict that comparing SafeDict against dict would misrepresent the two stages of fallback-dict performance. Certainly the performance cost of SafeDict would be an upper bound on the performance cost of fallback-dict once conversion takes place, but it would not measure the conversion cost itself. The performance of fallback-dict does have to be significantly better than the performance of a dict with collisions to be beneficial, and if the conversion cost is significant, triggering conversions could itself become an attack vector.

On 23.01.2012 19:25, Glenn Linderman wrote:
That's not exactly what it does: it calls the randomized hash function only for those keys that didn't find a free slot after 20 collisions. And it uses this value only for the further collision resolution. So the value of hash() is used for the first 20 slots, and randomized_hash() is used after that.

    1st try:  slot[i = perturb = hash]
    2nd try:  slot[i = i * 5 + 1 + (perturb >>= 5)]
    3rd try:  slot[i = i * 5 + 1 + (perturb >>= 5)]
    ....
    20th try: slot[i = i * 5 + 1 + (perturb >>= 5)]
    21st try: slot[i = perturb = randomized_hash(key)]   <---- HERE
    22nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]

This is also why there is no conversion needed. It's a per-key/per-lookup rule.

Frank
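[A pure-Python rendering of the probe sequence listed above, under the assumption that the key type provides a separate, uncached randomized hash, spelled `randomized_hash(key)` here for illustration.]

    PERTURB_SHIFT = 5

    def probe_sequence(key, mask, randomized_hash, max_probes=30):
        """Yield the slot indices a lookup for `key` would visit."""
        perturb = i = hash(key)
        for attempt in range(1, max_probes + 1):
            yield i & mask                    # the slot visited on this attempt
            if attempt == 20:
                # after 20 collisions, restart from the randomized hash (21st try)
                perturb = i = randomized_hash(key)
            else:
                perturb >>= PERTURB_SHIFT
                i = 5 * i + perturb + 1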

On 1/23/2012 10:58 AM, Frank Sievertsen wrote:
Interesting idea, and I see it would avoid conversions. What happens if entries are also removed from the dict? Say you enter 20 colliding keys, then 20 more that get randomized, then delete 18 of the first 20. Can you still find the second 20 keys? Does it take two sets of probes, somehow?

Frank Sievertsen wrote:
This sounds a lot like what I'm referring to as universal hash function in the discussion on the ticket: http://bugs.python.org/issue13703#msg150724 http://bugs.python.org/issue13703#msg150795 http://bugs.python.org/issue13703#msg151813 However, I don't like the term "random" in there. It's better to make the approach deterministic to avoid issues with not being able to easily reproduce Python application runs for debugging purposes. If you find that the data is manipulated, simply incrementing the universal hash parameter and rehashing the dict with that parameter should be enough to solve the issue (if not, which is highly unlikely, the dict will simply reapply the fix). No randomness needed. BTW: I attached a demo script to the ticket which demonstrates both types of collisions using integers. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 23 2012)

On 20.01.2012 19:15, Guido van Rossum wrote:
I agree. This applies to 3.2 and 2.7, but even more to 3.1 and 2.6, which are in security-fix mode. Even if relying on dict order is a bug right now, I believe it happens many times more often in code bases out there than dicts that are filled with many many colliding keys. So even if we can honestly blame the programmer in the former case, the users applying the security fix will have the same bad experience and won't likely care if we claim "undefined behavior". This means that it seems preferable to go with the situation where you have less breakages in total. Not to mention that changing dict order is likely to lead to much more subtle bugs than a straight MemoryError on a dict insert. Also, another advantage of collision counting I haven't seen in the discussion yet is that it's quite trivial to provide an API in sys to turn it on or off -- while turning on or off randomized hashes has to be done before Python starts up, i.e. at build time or with an environment variable or flag. Georg

On Fri, Jan 20, 2012 at 1:34 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
It would be a pretty lousy app that tried to load the contents of an entire database into a dict. It seems that this would require much more knowledge of what the app is trying to do before a successful attack can be mounted. So I don't think this is worse than the original attack -- I think it requires much more ingenuity of an attacker. (I'm thinking that the original attack is trivial once the set of 65000 colliding keys is public knowledge, which must be only a matter of time.) -- --Guido van Rossum (python.org/~guido)

Hello, I still see at least two ways to create a DoS attack even with the collision-counting patch. I assumed that it's possible to send ~500KB of payload to the application.

1. It's fully deterministic which slots the dict will look up. Since we don't count slot collisions, but only hash-value collisions, this can be exploited easily by creating strings with the hash values along the lookup path of an arbitrary (short) string. So first we pick an arbitrary string. Then we calculate which slots it will visit on the way to the first empty slot. Then we create strings with hash values for these slots. This attack first injects the strings to fill all the slots that the one short string will want to visit. Then it adds THE SAME string again and again. Since the entry is already there, nothing is added and no additional collisions happen, so no exception is raised.

    $ ls -l super.txt
    -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt
    $ tail -n3 super.txt
    FX5
    FX5
    FX5
    $ wc -l super.txt
    90000 super.txt
    $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))'
    real    0m52.724s
    user    0m51.543s
    sys     0m0.028s

2. The second attack exploits the fact that 1000 allowed string comparisons are still a lot of work. First I added 999 strings that collide with a one-byte string "a". In some applications a zero-byte string might work even better. Then I can add many thousands of the "a"s, just like in the first attack.

    $ ls -l 1000.txt
    -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt
    $ head -n 3 1000.txt
    7hLci00
    4wVFm10
    _rZJU50
    $ wc -l 1000.txt
    247000 1000.txt
    $ tail -n 3 1000.txt
    a
    a
    a
    $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))'
    real    0m17.408s
    user    0m15.897s
    sys     0m0.008s

Of course the first attack is far more efficient. One could argue that 16 seconds is not enough for an attack. But maybe it's possible to send 1MB, use zero-byte strings, and since for example Django does 5 lookups per query string, this will keep it busy for ~80 seconds on my PC.

What to do now? I think it's not smart to reduce the number of allowed collisions dramatically AND count all slot collisions at the same time.

Frank
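[A hedged pure-Python model of the essence of the second attack: with 999 colliding entries the dict stays just under a 1000 limit, yet every further lookup of an absent colliding key still walks and compares against the whole chain. Collisions are simulated with a constant __hash__ rather than real colliding strings, and the limit of 1000 is the figure from the patch under discussion.]

    class Colliding:
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return 0                      # every instance collides
        def __eq__(self, other):
            return isinstance(other, Colliding) and self.value == other.value

    d = {Colliding(i): None for i in range(999)}   # stays under a 1000 limit

    missing = Colliding(10**6)            # collides, but was never inserted
    for _ in range(1000):
        _ = (missing in d)                # ~999 comparisons per lookup, never an exception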

On 1/20/2012 10:55 AM, Frank Sievertsen wrote:
If 1000 were replaced by, for instance, random.randint(700, 1000), the dict could not be set up so that one more entry triggers the exception (which I believe was Martin's idea). But I suppose you would say that 699 entries would still make for much work. The obvious defense against this particular attack is to reject duplicate keys. Perhaps there should be write-once string sets and dicts available. This gets to the point that there is no best blind defense against all possible attacks. -- Terry Jan Reedy

On Fri, 2012-01-20 at 16:55 +0100, Frank Sievertsen wrote:
[snip description of two types of attack on the collision counting approach]
Frank: did you see the new approach I proposed in: http://bugs.python.org/issue13703#msg151735 http://bugs.python.org/file24289/amortized-probe-counting-dmalcolm-2012-01-2... (repurposes the ma_smalltable region of large dictionaries to add tracking of each such dict's average iterations taken per modification, and raise an exception when it exceeds a particular ratio) I'm interested in hearing how it holds up against your various test cases, or what flaws there are in it. Thanks! Dave

On 20.01.2012 16:33, Guido van Rossum wrote:
I think it's very likely that this will happen soon. For ASP and PHP there are attack payloads publicly available. PHP and ASP have patches to limit the number of query variables. We're very lucky that there's no public payload for Python yet, and all the non-public software and payloads I'm aware of are based on my software. But this could change any moment. It's not really difficult to write software to create 32-bit collisions. Frank

On Fri, Jan 20, 2012 at 2:35 PM, Frank Sievertsen <pydev@sievertsen.de> wrote:
While we're debating the best fix, could we allow people to at least protect themselves against script-kiddies by offering fixes to cgi.py, django, webob and a few other popular frameworks that limits forms to 1000 keys? (I suppose it's really only POST requests that are vulnerable to script kiddies, because of the length restriction on URLs.) -- --Guido van Rossum (python.org/~guido)
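[A hedged sketch of the framework-level mitigation suggested here: cap the number of fields accepted in a POST body before building any dict from them. The `MAX_FIELDS` value and the helper name are made up; a real framework would hook this into its own form-parsing layer.]

    from urllib.parse import parse_qsl

    MAX_FIELDS = 1000

    def parse_post_body(body: str):
        pairs = parse_qsl(body, keep_blank_values=True)   # list of (name, value) tuples
        if len(pairs) > MAX_FIELDS:
            raise ValueError("too many form fields (possible hash-collision DoS)")
        return dict(pairs)                                # dict built only after the cap check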

Victor Stinner <victor.stinner@haypocalc.com> wrote:
I propose to solve the hash collision vulnerability by counting collisions [...]
For web frameworks, forcing an exception is less harmful than forcing a many-second delay, but I think it's hard to be confident that there aren't other vulnerable applications where it's the other way round. Web frameworks like the exception because they already have backstop exception handlers, and anyway they use short-lived processes and keep valuable data in databases rather than process memory. Web frameworks don't like the delay because they allow unauthenticated users to submit many requests (including multiple requests in parallel), and they normally expect each response to take little cpu time. But many programs are not like this. What about a log analyser or a mailing list archiver or a web crawler or a game server or some other kind of program we haven't considered? -M-

On Sat, Jan 21, 2012 at 7:50 AM, Matthew Woodcraft <matthew@woodcraft.me.uk> wrote:
If my log crawler ended up taking minutes per log entry instead of milliseconds I'd have to kill it anyway. Web crawlers are huge multi-process systems that are as robust as web servers, or more. Game servers are just web apps. -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 20, 2012 at 00:48, Victor Stinner <victor.stinner@haypocalc.com> wrote:
I'd like to point out that an attacker is not limited to sending just one dict full of colliding keys. Given a 22ms stall for a dict full of 1000 colliding keys, and 100 such objects inside a parent object (perhaps JSON), you can stall a server for 2.2+ seconds. Going with the raise-at-1000 approach doesn't solve the problem for everyone. In addition, because the raise-at-N-collisions approach raises an exception, everyone who wants to handle this error condition properly has to change their code to catch a previously-unexpected exception. (I know they're usually still better off with the fix, but why force many people to change code when you can actually fix the hashing problem?) Another issue is that even with a configurable limit, different modules can't have their own limits. One module might want a relatively safe raise-at-100, and another module creating massive dicts might want raise-at-1000. How does a developer know whether they can raise or lower the limit, given that they use a bunch of different modules? I actually went with this stop-at-N-collisions approach by patching my CPython a few years ago, where I limiting dictobject and setobject's critical `for` loop to 100 iterations (I realize this might handle fewer than 100 collisions.) This worked fine until I tried to compile PyPy, where the translator blew up due to a massive dict. This, combined with the second problem (needing to catch an exception), led me to abandon this approach and write Securetypes, which has a securedict that uses SHA-1. Not that I like this either; I think I'm happy with the randomize-hash() approach. Ivan

On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik <ivan@ludios.org> wrote:
It's "just" a DoS attack. Those won't go away. We just need to raise the effort needed for the attacker. The original attack would cause something like 5 minutes of CPU usage per request (with a set of colliding keys that could be computed once and used to attack every Python-run website in the world). That's at least 2 orders of magnitude worse. In addition, because the raise-at-N-collisions approach raises an
Why would anybody need to change their code? Every web framework worth its salt has a top-level error catcher that logs the error, serves a 500 response, and possibly does other things like email the admin.
I don't think it needs to be configurable. There just needs to be a way to turn it off.
I think that's because your collision-counting algorithm was much more primitive than MAL's.
Why did you need to catch the exception? Were you not happy with the program simply terminating with a traceback when it got attacked? -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 20, 2012 at 03:48, Guido van Rossum <guido@python.org> wrote:
I think that's because your collision-counting algorithm was much more primitive than MAL's.
Conceded.
No, I wasn't happy with termination. I wanted to treat it just like a JSON decoding error, and send the appropriate response. I actually forgot to mention the main reason I abandoned the stop-at-N-collisions approach. I had a server with a dict that stayed in memory, across many requests. It was being populated with identifiers chosen by clients. I couldn't have my server stay broken if this dict filled up with a bunch of colliding keys. (I don't think I could have done another thing either, like nuke the dict or evict some keys.) Ivan

On Thu, Jan 19, 2012 at 8:06 PM, Ivan Kozik <ivan@ludios.org> wrote:
No, I wasn't happy with termination. I wanted to treat it just like a JSON decoding error, and send the appropriate response.
So was this attack actually being mounted on your service regularly? I'd think it would be sufficient to treat it as a MemoryError -- unavoidable, if it happens things are really bad, and hopefully you'll crash quickly and some monitor process restarts your service. That's a mechanism that you should have anyway.
What would your service do if it ran out of memory? Maybe one tweak to the collision counting would be that the exception needs to inherit from BaseException (like MemoryError) so most generic exception handlers don't actually handle it. (Style note: never use "except:", always use "except Exception:".) -- --Guido van Rossum (python.org/~guido)

2012/1/20 Ivan Kozik <ivan@ludios.org>:
I'd like to point out that an attacker is not limited to sending just one dict full of colliding keys. Given a 22ms stall ...
The presented attack produces a stall of at least 30 seconds (5 minutes or more if there is no time limit in the application), not 0.022 second. You have to send a lot of requests to produce a DoS if a single requests just eat 22 ms. I suppose that there are a lot of other kinds of request than takes much longer than 22 ms, even valid requests.
Python becomes really slow when you have more than N collisions (an O(n^2) problem). If an application hits this limit with valid data, it is time to use another data structure or a different hash function. We have to do more tests to choose N correctly, but according to my first tests, it looks like N=1000 is a safe limit. Marc Andre's patch doesn't count all "collisions", but only "collisions" that require comparing objects. When two objects have the same hash value, the open addressing algorithm searches for a free bucket. If a bucket is not free but holds a different hash value, the objects are not compared and the collision counter is not incremented. The limit is only reached when you have N objects whose hash values are equal modulo the size of the table (hash(str) & DICT_MASK). When there are not enough empty buckets (which happens before all buckets are full), Python resizes the dictionary (it does something like size = size * 2), so it uses at least one more bit of the hash each time the dictionary is resized. Collisions are very likely with a small dictionary, but become rarer each time the dictionary is resized. It means that the number of potential collisions (with valid data) decreases as the dictionary grows. Tell me if I am wrong.
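A rough pure-Python model of that counting rule (only an illustration of the idea, not Marc Andre's actual C patch; resizing is omitted and the limit and exception type are placeholders):

    COLLISION_LIMIT = 1000

    class CountingTable:
        """Open addressing table that counts only collisions needing a compare."""
        def __init__(self):
            self.mask = 7                              # 8 slots, like a fresh dict
            self.slots = [None] * (self.mask + 1)      # entries: (hash, key, value)

        def _find_slot(self, key):
            h = hash(key) & 0xFFFFFFFFFFFFFFFF         # treat the hash as unsigned
            i = h
            perturb = h
            collisions = 0
            while True:
                entry = self.slots[i & self.mask]
                if entry is None:
                    return i & self.mask               # free slot found
                if entry[0] == h:                      # same hash value: a real
                    if entry[1] == key:                # key comparison is needed
                        return i & self.mask
                    collisions += 1                    # counted collision
                    if collisions > COLLISION_LIMIT:
                        raise RuntimeError("too many hash collisions")
                # slots holding a different hash value are skipped without
                # being counted, exactly as described above
                i = i * 5 + perturb + 1
                perturb >>= 5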

Victor Stinner wrote:
You might think that 10 million keys is a lot of data, but that's only about 100 MB worth. I already see hardware vendors advertising computers with 6 GB RAM as "entry level", e.g. the HP Pavilion starts with 6GB expandable to 16GB. I expect that there are already people using Python who will unpredictably hit that limit by accident, and the number will only grow as computers get more memory. With a limit of 35 collisions, it only takes 35 keys to force a dict to raise an exception, if you are an attacker able to select colliding keys. We're trying to defend against an attacker who is able to force collisions, not one who is waiting for accidental collisions. I don't see that causing the dict to raise an exception helps matters: it just changes the attack from "keep the dict busy indefinitely" to "cause an exception and crash the application". This moves the responsibility for dealing with collisions out of the dict and into the application code. Instead of solving the problem in one place (the built-in dict), now every application that uses dicts has to identify which dicts can be attacked, and deal with the exception. That pushes the responsibility for security onto people who are the least willing or able to deal with it: the average developer, who neither understands nor cares about security, or if they do care, they can't convince their manager to care. I suppose an exception is an improvement over the application hanging indefinitely, but I'd hardly call it a fix. Ruby uses randomized hashes. Are there any other languages with a dict or mapping class that raises on too many collisions? -- Steven

On Fri, Jan 20, 2012 at 2:00 PM, Steven D'Aprano <steve@pearwood.info> wrote:
No, that's fundamentally misunderstanding the nature of the attack. The reason the hash collision attack is a problem is because it allows you to DoS a web service in a way that requires minimal client side resources but can have a massive effect on the server. The attacker is making a single request that takes the server an inordinately long time to process, consuming CPU resources all the while, and likely preventing the handling of any other requests (especially for an event-based server, since the attack is CPU based, bypassing all use of asynchronous IO). With the 1000 collision limit in place, the attacker sends their massive request, the affected dict quickly hits the limit, throws an unhandled exception which is then caught by the web framework and turned into a 500 Error response (or whatever's appropriate for the protocol being attacked). If a given web service doesn't *already* have a catch all handler to keep an unexpected exception from bringing the entire service down, then DoS attacks like this one are the least of its worries. As for why other languages haven't gone this way, I have no idea. There are lots of details relating to a language's hash and hash map design that will drive how suitable randomisation is as an answer, and it also depends greatly on how you decide to characterise the threat. FWIW, Victor's analysis in the opening post of this thread matches the conclusions I came to a few days ago, although he's been over the alternatives far more thoroughly than I have. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Victor, On 01/19/2012 05:48 PM, Victor Stinner wrote: [snip]
I'm a Django core developer, and if it is true that our test-suite has a dictionary-ordering dependency that is expressed via HTML attribute ordering, I consider that a bug and would like to fix it. I'd be grateful for, not resentful of, a change in CPython that revealed the bug and prompted us to fix it. (I presume that it is true, as it sounds like you experienced it directly; I don't have time to play around at the moment, but I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago). I can't speak for the core team, but I doubt there would be much disagreement on this point: ideally Django would run equally well on any implementation of Python, and as far as I know none of the alternative implementations guarantee hash or dict-ordering compatibility with CPython. I don't have the expertise to speak otherwise to the alternatives for fixing the collisions vulnerability, but I don't believe it's accurate to presume that Django would not want to fix a dict-ordering dependency, and use that as a justification for one approach over another. Carl

On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer <carl@oddbird.net> wrote:
It's more a matter of wanting deployment of a security fix to be as painless as possible - a security fix that system administrators can't deploy because it breaks critical applications may as well not exist. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jan 20, 2012, at 03:18 PM, Nick Coghlan wrote:
True, but collision counting is worse IMO. It's just as likely (maybe) that an application would start getting new exceptions on dictionary insertion as it is that it would see failures due to dictionary order changes. Unfortunately, in the former case it's because Python just added a new public API in a security release (the new exception *is* public API). In the latter case, no new API was added, but something exposed an already existing bug in the application. That's still a bug in the application even if counting were added instead. It's also a bug that any number of changes in the environment, or in an OS vendor's deployment, could have triggered. -1 for collision counting. -Barry

On 1/19/2012 8:54 PM, Carl Meyer wrote:
It might be a good idea to have a way to seed the hash with some value to allow testing with different dict orderings -- this would allow tests to be developed using one Python implementation that would be immune to the different orderings on different implementations. However, randomizing the hash not only doesn't solve the problem for long-running applications, it also causes non-deterministic performance from one run to the next, even with the exact same data: a different (random) seed could sporadically cause collisions with data that usually gives good performance, with little explanation for it and little way to reproduce the problem in order to report or understand it.

I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago
A Python dictionary only uses the lower bits of a hash value. If your dictionary has less than 2**32 items, the dictionary order is exactly the same on 32 and 64 bits system: hash32(str) & mask == hash64(str) & mask for mask <= 2**32-1.
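A quick way to see the "only the lower bits matter" point (illustrative only; the exact hash values differ between builds):

    # The initial probe slot of a small dict depends only on hash(key) & mask,
    # so two hash functions that agree in their low bits start probing at the
    # same slot, whatever their high bits look like.
    mask = 8 - 1                      # a freshly created dict has 8 slots
    for key in ("spam", "eggs", "ham"):
        print(key, "-> initial slot", hash(key) & mask)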

In http://mail.python.org/pipermail/python-dev/2012-January/115715.html Frank Sievertsen wrote: On 20.01.2012 13:08, Victor Stinner wrote:
No, that's not true. Whenever a collision happens, other bits are mixed in very fast.
Frank
Bits are mixed in quickly from a denial-of-service standpoint, but Victor is correct from a "Why don't the tests already fail?" standpoint. A dict with 2**12 slots, holding over 2700 entries, will be far larger than most test cases -- particularly those with visible output. In a dict that size, 32-bit and 64-bit machines will still probe the same first, second, third, fourth, fifth, and sixth slots. Even in the rare cases when there are at least 6 collisions, the next slots may well be either the same, or close enough that it doesn't show up in a changed iteration order. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ

The main issue with that approach is that it allows a new kind of attack. An attacker now needs to find 1000 colliding keys, and submit them one-by-one into a database. The limit will not trigger, as those are just database insertions. Now, if the application also has a need to read the entire database table into a dictionary, that will suddenly break, and not for the attacker (which would be ok), but for the regular user of the application or the site administrator. So it may be that this approach actually simplifies the attack, making the cure worse than the disease. Regards, Martin

The main issue with that approach is that it allows a new kind of attack.
Indeed, I posted another example: http://bugs.python.org/msg151677 This kind of fix can be used in a specific application or maybe in a special-purpose framework, but not on the level of a general-purpose language. Frank

On Fri, Jan 20, 2012 at 1:57 AM, Frank Sievertsen <pydev@sievertsen.de>wrote:
Right. We have been discussing this issue (for weeks now...) because it makes pretty much any Python app that takes untrusted data vulnerable, especially web apps, and after extensive analysis we came to the conclusion that defenses in the framework or in the app are really hard to do and very disruptive for developers, whereas preventing the attack by a modification of the dict or hash algorithms would fix it for everybody. Moreover, the attack would work against pretty much any Python web app using a set of evil strings computed once (hence encouraging script kiddies to just fire their fully-automatic weapon at random websites). The new attacks that are now being considered require analysis of how the website is implemented, how it uses and stores data, etc. So an attacker has to sit down and come up with an attack tailored to a specific website. That can be dealt with on an ad-hoc basis. -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 20, 2012 at 7:34 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Ouch, I think you're right. So hash randomisation may be the best option, and admins will need to test for themselves to see if it breaks things... Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Oh, good catch. But I would not call it a new kind of attack, it is just a particular case of the hash collision vulnerability. Counting collisions doesn't solve this case, but it doesn't make the situation worse than before. Quickly raising an exception is better than stalling for minutes, even if I agree that it is not the best behaviour. The best would be to answer quickly with the expected result :-) (using a different data structure or a different hash function?) Right now, I don't see any counter-measure against this case. Victor

On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:
ISTM that adding the possibility of raising a new exception on dictionary insertion is *more* backward incompatible than changing dictionary order, which for a very long time has been known to not be guaranteed. You're running some application, you upgrade Python because you apply all security fixes, and suddenly you're starting to get exceptions in places you can't really do anything about. Yet those exceptions are now part of the documented public API for dictionaries. This is asking for trouble. Bugs will suddenly start appearing in that application's tracker and they will seem to the application developer like Python just added a new public API in a security release. OTOH, if you change dictionary order and *that* breaks the application, then the bugs submitted to the application's tracker will be legitimate bugs that have to be fixed even if nothing else changed. So I still think we should ditch the paranoia about dictionary order changing, and fix this without counting. A little bit of paranoia could creep back in by disabling the hash fix by default in stable releases, but I think it would be fine to make that a compile-time option. -Barry

So I still think we should ditch the paranoia about dictionary order changing, and fix this without counting.
The randomized hash has other issues: - its security is based on its secret, whereas it looks easy to compute it (see more details in the issue) - my patch only changes hash(str), whereas other developers asked me to also patch bytes, int and other types. hash(bytes) can be changed, but changing hash(int) may easily leak the secret. We may use a different secret for each type, but if it is easy to compute the int hash secret, dictionaries using int keys are still vulnerable. -- There are no perfect solutions; the drawbacks of each solution should be compared. Victor

On Fri, 20 Jan 2012 17:17:24 +0100 Victor Stinner <victor.stinner@haypocalc.com> wrote:
How do you compute the secret? I see two possibilities: - the application leaks the hash() values: this sounds unlikely since I don't see the use case for it; - the application shows the dict iteration order (e.g. order of HTML attributes): then we could add a second per-dictionary secret so that the iteration order of a single dict doesn't give any useful information about the hash function. But the bottom line for me is the following: - randomized hashes eliminate the possibility to use a single exploit for all Python-powered applications: for each application, the attacker now has to find a way to extract the secret; - collision counting doesn't eliminate the possibility of generic exploits, as Frank Sievertsen has just shown in http://mail.python.org/pipermail/python-dev/2012-January/115726.html Regards Antoine.

On 1/20/2012 11:17 AM, Victor Stinner wrote:
There are no perfect solutions; the drawbacks of each solution should be compared.
Amen. One possible attack that has been described for a collision-counting dict depends on knowing precisely the trigger point. So let MAXCOLLISIONS either be configurable or just choose a random count between M and N, say 700 and 999. It would not hurt to have alternate patches available in case a particular Python-powered site comes under prolonged attack. Though given our minuscule share of the market, that is much less likely than an attack on a PHP- or MS-powered site. -- Terry Jan Reedy

On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw <barry@python.org> wrote:
Dict insertion can already raise an exception: MemoryError. I think we should be safe if the new exception also derives from BaseException. We should actually seriously consider just raising MemoryException, since introducing a new built-in exception in a bugfix release is also very questionable: code explicitly catching or raising it would not work on previous bugfix releases of the same feature release.
OTOH, if you change dictionary order and *that* breaks the application, then the bugs submitted to the application's tracker will be legitimate bugs that have to be fixed even if nothing else changed.
There are lots of things that are undefined according to the language spec (and quite possibly known to vary between versions or platforms or implementations like PyPy or Jython) but which we would never change in a bugfix release.
So I still think we should ditch the paranoia about dictionary order changing, and fix this without counting.
I'm sorry, but I don't want to break a user's app with a bugfix release and say "haha your code was already broken you just didn't know it". Sure, the dict order already varies across Python implementations, possibly across 32/64 bits or operating systems. But many organizations (I know a few :-) have a very large installed software base, created over many years by many people with varying skills, that is kept working in part by very carefully keeping the environment as constant as possible. This means that the target environment is much more predictable than it is for the typical piece of open source software. Sure, a good Python developer doesn't write apps or tests that depend on dict order. But time and again we see that not everybody writes perfect code every time. Especially users writing "in-house" apps (as opposed to frameworks shared as open source) are less likely to always use the most robust, portable algorithms in existence, because they may know with much more certainty that their code will never be used on certain combinations of platforms. For example, I rarely think about whether code I write might not work on IronPython or Jython, or even CPython on Windows. And if something I wrote suddenly needs to be ported to one of those, well, that's considered a port and I'll just accept that it might mean changing a few things. The time to break a dependency on dict order is not with a bugfix release but with a feature release: those are more likely to break other things as well anyway, and users are well aware that they have to test everything and anticipate having to fix some fraction of their code for each feature release. OTOH we have established a long and successful track record of conservative bugfix releases that don't break anything. (I am aware of exactly one thing that was broken by a bugfix release in application code I am familiar with.) -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 20, 2012 at 13:15, Guido van Rossum <guido@python.org> wrote:
Why can't we have our cake and eat it too? Can we do hash randomization in 3.3 and use the hash count solution for bugfix releases? That way we get a basic fix into the bugfix releases that won't break people's code (hopefully) but we go with a more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this since it's a feature release anyway where we change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will make merging commits harder for those files affected by the patches, but that will most likely be isolated and not a common collision (and less of an issue once 3.3 is released later this year). I understand the desire to keep backwards compatibility, but collision counting could cause an error on some random input that someone didn't expect to cause issues, whether they were under a DoS attack or just had some unfortunate input from private data. The hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data.

Even if a MemoryException is raised I believe that is still a fundamental change in the documented contract of dictionary API. I don't believe there is a way to fix this without breaking someone's application. The major difference I see between the two solutions is that counting will break the applications of people who are otherwise following the documented API contract of dictionaries, and randomization will break the applications of people who are violating it. Personally I feel that the lesser of two evils is to reward those who followed the documentation, and not reward those who didn't. So +1 for randomization as the only option in 3.3, and off by default with a flag or environment variable in bug fixes. I think it's the only way to proceed that won't hurt people who have followed the documented behavior.

On 01/20/2012 02:04 PM, Donald Stufft wrote:
Even if a MemoryException is raised I believe that is still a fundamental change in the documented contract of dictionary API.
How so? Dictionary inserts can *already* raise that error.
Do you have a case in mind where legitimate user data (not crafted as part of a DoS attack) would trip the 1000-collision limit? How likely is it that such cases exist in already-deployed applications, compared to the known breakage in existing applications due to hash randomization?
Except that I think your set is purely hypothetical, while the second set is *lots* of deployed applications. Tres. -- Tres Seaver tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com

On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:
I don't, but as there's never been a limit on how many collisions a dictionary can have, this would be a fundamental change in the documented (and undocumented) abilities of a dictionary. Dictionary key order has never been guaranteed, is documented to not be relied on, and already changes depending on whether you are using 32-bit, 64-bit, Jython, PyPy, etc., or, as someone else pointed out, because of any number of possible improvements to dict. The counting solution violates the existing contract in order to serve people who are themselves violating the contract. Even with their violation, the method that I +1'd still serves to not break existing applications by default.
Which is why I believe that it should be off by default in the bugfix releases, but easily enabled (flag, env var, whatever). That allows people to upgrade to a bugfix without breaking their application, and if this vulnerability affects them, they can enable it. I think collision counting is at best a band-aid and not a proper fix; it stems from a desire to not break existing applications on a bugfix release, which can be better addressed by implementing the real fix and letting people control whether it is enabled (only on the bugfix releases; on 3.3+ it should always be forced on).

Guido van Rossum wrote:
Whether you have run out of total memory, or a single contiguous block, it is still a memory error. An arbitrary limit on collisions is not a memory error. If we were designing this API from scratch, would anyone propose using MemoryError for "you have reached a limit on collisions"? It has nothing to do with memory. Using MemoryError for something which isn't a memory error is ugly. How about RuntimeError? -- Steven

On Fri, Jan 20, 2012 at 6:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Guido van Rossum wrote:
It should derive from BaseException.
Sorry, I was ambiguous. I meant it should not derive from Exception. It goes RuntimeError -> StandardError -> Exception -> BaseException. -- --Guido van Rossum (python.org/~guido)

On 1/20/2012 2:51 PM, Donald Stufft wrote:
My opinion of counting is better than yours, but even conceding the theoretical, purity argument, our release process is practical as well. There have been a few occasions when fixes to bugs in our code have been delayed from a bugfix release to the next feature release -- because the fix would break too much code depending on the bug. Some years ago there was a proposal that we should deliberately tweak hash() to break 'buggy' code that depended on it not changing. This never happened. So it has been left de facto constant, to the extent it is, for some years. -- Terry Jan Reedy

On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy <tjreedy@udel.edu> wrote:
AFAICT Brett's suggestion (which had occurred to me as well, but I'm not a core developer by any stretch) seemed to get lost in the debate: would it be possible to go with collision counting for bugfix releases and hash randomization for new feature releases? (Brett made it here: <http://mail.python.org/pipermail/python-dev/2012-January/115740.html>.) -- Ben Wolfson "Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry]

I believe that either solution has the potential to break existing applications, so to ensure that no applications are broken there will need to be a method of disabling it. I also believe that, to maintain the backwards compatibility that Python has traditionally had in bugfix releases, either solution will need to default to off. Given those two things, I don't think the argument should be which solution will break less, because I believe both will need to be off by default, but rather which solution more adequately solves the underlying problem.

On 20 Jan 2012, at 10:49, Brett Cannon wrote:
I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash:
* this would solve the security issue that someone mentioned about being able to deduce the hash, because if they keep being mean it'll change anyway
* for bugfix releases, start off without randomization (seed==0) and start to use it only when the collision count hits the threshold
* for the feature release, reseeding when you hit a certain threshold still seems like a good idea as it will make lookups/insertions better in the long run
AFAIUI, Python already doesn't guarantee order stability when you insert something into a dictionary, as in the worst case the dictionary has to resize its hash table, and then the order is freshly jumbled again. Just my two cents. Jared

On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb <jared.grubb@gmail.com> wrote:
I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash
This is nonsense. You have to determine the random seed at startup, and it has to be uniform for the entire life of the process. You can't change it after Python has started. -Paul

You should read the several other threads, the bug, as well as the implementation and patch under discussion. Briefly, Python string hashes are calculated once per string, and then used in many places. You can't change the hash value for a string during program execution without breaking everything. The proposed change modifies the starting value of the hash function to include a process-wide randomly generated seed. This seed is chosen randomly at runtime, but cannot change once chosen. Using the seed changes the final output of the hash to be unpredictable to an attacker, solving the underlying problem. Salt could also be an appropriate term here, but since salt is generally changed on a per-use basis (a single process may use many different salts), seed is more correct, since this value is only chosen once per process. -Paul
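A toy model of what such a seeded string hash looks like (this mimics the general shape of the proposed change, it is not the actual patch; the prefix/suffix names are illustrative):

    import os
    import struct

    # Chosen once at startup and never changed, so equal strings always hash
    # equal within one process, but the values differ between processes.
    _PREFIX = struct.unpack("<Q", os.urandom(8))[0]
    _SUFFIX = struct.unpack("<Q", os.urandom(8))[0]

    def seeded_hash(s):
        if not s:
            return 0
        x = _PREFIX ^ (ord(s[0]) << 7)
        for ch in s:
            x = ((1000003 * x) ^ ord(ch)) & (2 ** 64 - 1)
        x ^= len(s)
        x ^= _SUFFIX
        return x

    # An attacker's precomputed colliding strings are built against one fixed
    # hash function; without knowing _PREFIX/_SUFFIX they no longer collide.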

This seed is chosen randomly at runtime, but cannot change once chosen.
The hash is used to compare objects: if hash(obj1) != hash(obj2), objects are considered different. So two strings must have the same hash if their value is the same.
We may use a different salt per dictionary. Victor

On Sun, Jan 22, 2012 at 11:11, Victor Stinner <victor.stinner@haypocalc.com> wrote:
Can we do that? I was thinking of ways to not raise errors when we get over a collision count, but instead somehow change the way the dictionary behaves when we get over the collision count, but I couldn't come up with something. Somehow adding a salt would be one possibility. But I don't see how it's doable except for the string-keys only case mentioned before. But I might just be lacking imagination. :-) //Lennart

On 23 January 2012 16:49, Lennart Regebro <regebro@gmail.com> wrote:
Actually, this looks like it has the seed of a solution in it. I haven't scrutinised the following beyond "it sounds like it could work" - it could well contain nasty flaws.
Assumption: We only get an excessive number of collisions during an attack (directly or indirectly).
Assumption: Introducing a salt into hashes will change those hashes sufficiently to mitigate the attack (all discussion of randomising hashes makes this assumption).
1. Keep the current hashing (for all dictionaries), i.e. just using hash(key).
2. Count collisions.
3. If any key hits X collisions, change that dictionary to use a random salt for hashes (at least for str and unicode keys). This salt would be remembered for the dictionary.
Consequence: The dictionary would need to be rebuilt when an attack was detected.
Consequence: Hash caching would no longer occur for this dictionary, making most operations more expensive.
Consequence: Anything relying on the iteration order of a dictionary which has suffered excessive conflicts would fail.
4. (Optional) In 3.3, provide a way to get a dictionary with a random salt (i.e. not wait for an attack).
Tim Delaney
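A toy illustration of the "random salt for hashes" part of step 3 (detection, counting and rebuilding are omitted; the class and its behaviour are assumptions made for illustration, and it only handles str keys):

    import binascii
    import os

    class SaltedStrDict(object):
        """Illustrative mapping for str keys that mixes in a per-instance salt."""
        def __init__(self):
            # The salt is remembered for this dictionary only.
            self._salt = binascii.hexlify(os.urandom(16)).decode("ascii")
            self._data = {}

        def _salted(self, key):
            # Prepending the salt changes the string hash non-trivially, so a
            # set of keys crafted to collide under plain hash() no longer does.
            return self._salt + key

        def __setitem__(self, key, value):
            self._data[self._salted(key)] = value

        def __getitem__(self, key):
            return self._data[self._salted(key)]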

Hello, I'd still prefer to see a randomized hash() function (at least for 3.3). But to protect against the attacks it would be sufficient to use randomization for collision resolution in dicts (and sets). What if we use a second (randomized) hash function when there are many collisions in ONE lookup? This hash function is used only for collision resolution and is not cached.
The benefits:
* protection against the known attacks
* hash(X) stays stable and the same
* dict order is only changed when there are many collisions
* doctests will not break
* enhanced collision resolution
* RNG doesn't have to be initialized in smaller programs
* nearly no slowdown for most dicts
* the second hash function is only used for keys with a higher collision rate
* lower probability of leaking secrets
* possibility to use different secrets for each dict
The drawbacks:
* need to add a second hash function
* slower than using one hash function only, when > 20 collisions
* need to add this to container types? (if used for py3.3)
* need to expose this to the user? (if used for py3.3)
* works only for datatypes with this new function
* possible to implement without breaking the ABI?
The following code is meant for explanation purposes only:

    for (perturb = hash; ; perturb >>= 5) {
        i = (i << 2) + i + perturb + 1;
        if ((collisions++) == 20) {
            // perturb is already zero after 13 rounds.
            // 20 collisions are rare.
            // you can add && (ma_mask > 256) to make 100% sure
            // that it's not used for smaller dicts.
            if (Py_TYPE(key)->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH) {
                // If the type has a randomized hash, use it for the rest of this lookup
                i = perturb = PyObject_RandomizedHash(key);
            }
        .....

If I got this right we could add a new function "tp_randomized_hash" in the 3.3 release. But can we also add functions in older releases, without breaking the ABI? If not, can we implement this somehow using a flag?
FOR OLDER RELEASES < 3.3:

    Py_hash_t
    PyObject_RandomizedHash(PyObject *v)
    {
        PyTypeObject *tp = Py_TYPE(v);
        if (!(tp->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH))
            return -1;
        global_flags_somewhere->USE_RANDOMIZED_HASH = 1;
        return (*tp->tp_hash)(v);
    }

.... and in unicodeobject.c (and wherever we need randomization):

    static Py_hash_t
    unicode_hash(PyUnicodeObject *self)
    {
        Py_ssize_t len;
        Py_UNICODE *p;
        Py_hash_t x;
        Py_hash_t prefix = 0;
        Py_hash_t suffix = 0;
        if (global_flags_somewhere->USE_RANDOMIZED_HASH) {
            global_flags_somewhere->USE_RANDOMIZED_HASH = 0;
            initialize_rng_if_not_already_done_and_return_seed(&prefix, &suffix);
        .....
        (and don't cache the hash in this case)
        .....

It's ugly, but if I understand this correctly, the GIL will protect us against race conditions, right? Hello, internals experts: would this work, or is there a better way to do this without breaking the ABI? Frank

On 1/23/2012 12:53 AM, Frank Sievertsen wrote:
So this sounds like SafeDict, but putting it under the covers and automatically converting from dict to SafeDict after a collision threshold has been reached. Let's call it fallback-dict.
Compared to SafeDict as a programmer tool, fallback-dict has these benefits:
* No need to change program (or library) source to respond to an attack
* Order is preserved until the collision threshold has been reached
* Performance is preserved until the collision threshold has been reached
and these costs:
* converting the dict from one hash to the other by rehashing all the keys.
Compared to always randomizing the hash, fallback-dict has these benefits:
* hash (and performance) is deterministic: programs running on the same data set will have the same performance characteristics, unless the collision threshold is reached for that data set.
* lower probability to leak secrets, because each attacked set/dict can have its own secret, randomized hash seed
* the patch would not need to include RNG initialization during startup, lowering the impact on short-running programs.
What is not clear is how much SafeDict degrades performance when it is used; non-cached hashes will definitely have an impact. I'm not sure whether an implementation of fallback-dict in C code would be significantly faster than the implementation of SafeDict in Python, so I don't know whether comparing the performance of SafeDict and dict would be representative of the two stages of fallback-dict performance; but certainly the performance cost of SafeDict would be an upper bound on the performance cost of fallback-dict once conversion takes place, although it would not measure the conversion cost. The performance of fallback-dict does have to be significantly better than the performance of dict with collisions to be beneficial, and if the conversion cost is significant, triggering conversions could be an attack vector.

On 23.01.2012 19:25, Glenn Linderman wrote:
That's not exactly what it does. It calls the randomized hash function only for those keys that didn't find a free slot after 20 collisions, and it uses this value only for the further collision resolution. So the value of hash() is used for the first 20 slots, randomized_hash() is used after that.
1st try: slot[i = perturb = hash]
2nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
3rd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
....
20th try: slot[i = i * 5 + 1 + (perturb >>= 5)]
21st try: slot[i = perturb = randomized_hash(key)] <---- HERE
22nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
This is also why there is no conversion needed. It's a per-key/per-lookup rule. Frank
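In other words, a small Python model of that probe schedule (illustrative only, not the real C code; randomized_hash stands in for the proposed second hash function):

    def probe_sequence(key, mask, randomized_hash, max_tries=1000):
        """Yield the slots visited for key: the first 20 probes follow the
        normal perturb schedule, then the sequence is re-seeded from the
        second, randomized hash of the key."""
        i = perturb = hash(key) & 0xFFFFFFFFFFFFFFFF   # treat hash as unsigned
        for attempt in range(1, max_tries + 1):
            yield i & mask
            if attempt == 20:
                i = perturb = randomized_hash(key)     # the "<---- HERE" step
            else:
                perturb >>= 5
                i = i * 5 + 1 + perturb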

On 1/23/2012 10:58 AM, Frank Sievertsen wrote:
Interesting idea, and I see it would avoid conversions. What happens if data are also removed from the hash table? Say you enter 20 colliding keys, then 20 more that get randomized, then delete 18 of the first 20. Can you still find the second 20 keys? Does it take two sets of probes, somehow?

Frank Sievertsen wrote:
This sounds a lot like what I'm referring to as a universal hash function in the discussion on the ticket: http://bugs.python.org/issue13703#msg150724 http://bugs.python.org/issue13703#msg150795 http://bugs.python.org/issue13703#msg151813 However, I don't like the term "random" in there. It's better to make the approach deterministic to avoid issues with not being able to easily reproduce Python application runs for debugging purposes. If you find that the data is manipulated, simply incrementing the universal hash parameter and rehashing the dict with that parameter should be enough to solve the issue (if not, which is highly unlikely, the dict will simply reapply the fix). No randomness needed. BTW: I attached a demo script to the ticket which demonstrates both types of collisions using integers. -- Marc-Andre Lemburg eGenix.com
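For illustration, a toy parameterized string hash in that spirit (not the function or parameters from the ticket; the constants here are arbitrary):

    def parametrized_hash(s, k=0):
        """Same hash family, selected by an integer parameter k; bumping k and
        rehashing gives a new, deterministic mapping with no randomness."""
        x = (0x345678 + k) & 0xFFFFFFFF
        multiplier = 1000003 + 2 * k          # kept odd so the bits keep mixing
        for ch in s:
            x = ((x * multiplier) ^ ord(ch)) & 0xFFFFFFFF
        return x ^ len(s)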

On 20.01.2012 19:15, Guido van Rossum wrote:
I agree. This applies to 3.2 and 2.7, but even more to 3.1 and 2.6, which are in security-fix mode. Even if relying on dict order is a bug right now, I believe it happens many times more often in code bases out there than dicts that are filled with many many colliding keys. So even if we can honestly blame the programmer in the former case, the users applying the security fix will have the same bad experience and won't likely care if we claim "undefined behavior". This means that it seems preferable to go with the situation where you have less breakages in total. Not to mention that changing dict order is likely to lead to much more subtle bugs than a straight MemoryError on a dict insert. Also, another advantage of collision counting I haven't seen in the discussion yet is that it's quite trivial to provide an API in sys to turn it on or off -- while turning on or off randomized hashes has to be done before Python starts up, i.e. at build time or with an environment variable or flag. Georg

On Fri, Jan 20, 2012 at 1:34 AM, "Martin v. Löwis" <martin@v.loewis.de>wrote:
It would be a pretty lousy app that tried to load the contents of an entire database into a dict. It seems that this would require much more knowledge of what the app is trying to do before a successful attack can be mounted. So I don't think this is worse than the original attack -- I think it requires much more ingenuity of an attacker. (I'm thinking that the original attack is trivial once the set of 65000 colliding keys is public knowledge, which must be only a matter of time.) -- --Guido van Rossum (python.org/~guido)

Hello, I still see at least two ways to create a DoS attack even with the collision-counting patch. I assumed that it's possible to send ~500KB of payload to the application.
1. It's fully deterministic which slots the dict will look up. Since we don't count slot collisions, but only hash-value collisions, this can be exploited easily by creating strings with the hash values along the lookup path of an arbitrary (short) string. So first we pick an arbitrary string. Then we calculate which slots it will visit on the way to the first empty slot. Then we create strings with hash values for these slots. This attack first injects the strings to fill all the slots that the one short string will want to visit. Then it adds THE SAME string again and again. Since the entry is already there, nothing will be added, no additional collisions happen, and no exception is raised.

    $ ls -l super.txt
    -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt
    $ tail -n3 super.txt
    FX5
    FX5
    FX5
    $ wc -l super.txt
    90000 super.txt
    $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))'
    real 0m52.724s
    user 0m51.543s
    sys 0m0.028s

2. The second attack exploits the fact that 1000 allowed string comparisons are still a lot of work. First I added 999 strings that collide with a one-byte string "a". In some applications a zero-byte string might work even better. Then I can add many thousands of the "a"'s, just like in the first attack.

    $ ls -l 1000.txt
    -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt
    $ head -n 3 1000.txt
    7hLci00
    4wVFm10
    _rZJU50
    $ wc -l 1000.txt
    247000 1000.txt
    $ tail -n 3 1000.txt
    a
    a
    a
    $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))'
    real 0m17.408s
    user 0m15.897s
    sys 0m0.008s

Of course the first attack is far more efficient. One could argue that 16 seconds is not enough for an attack. But maybe it's possible to send 1MB, use zero-byte strings, and since for example Django does 5 lookups per query string, this will keep it busy for ~80 seconds on my PC.
What to do now? I think it's not smart to reduce the number of allowed collisions dramatically AND count all slot collisions at the same time. Frank

On 1/20/2012 10:55 AM, Frank Sievertsen wrote:
If 1000 were replaced by, for instance, random.randint(700,1000) the dict could not be set to have an exception triggered with one other entry (which I believe was Martin's idea). But I suppose you would say that 699 entries would still make for much work. The obvious defense for this particular attack is to reject duplicate keys. Perhaps there should be write-once string sets and dicts available. This gets to the point that there is no best blind defense to all possible attacks. -- Terry Jan Reedy

On Fri, 2012-01-20 at 16:55 +0100, Frank Sievertsen wrote:
[snip description of two types of attack on the collision counting approach]
Frank: did you see the new approach I proposed in: http://bugs.python.org/issue13703#msg151735 http://bugs.python.org/file24289/amortized-probe-counting-dmalcolm-2012-01-2... (repurposes the ma_smalltable region of large dictionaries to add tracking of each such dict's average iterations taken per modification, and raise an exception when it exceeds a particular ratio) I'm interested in hearing how it holds up against your various test cases, or what flaws there are in it. Thanks! Dave

On 20.01.2012 16:33, Guido van Rossum wrote:
I think it's very likely that this will happen soon. For ASP and PHP there is attack payload publicly available. PHP and ASP have patches to limit the number of query variables. We're very lucky that there's no public payload for Python yet, and all non-public software and payload I'm aware of is based upon my software. But this can change any moment. It's not really difficult to write software to create 32-bit collisions. Frank

On Fri, Jan 20, 2012 at 2:35 PM, Frank Sievertsen <pydev@sievertsen.de> wrote:
While we're debating the best fix, could we allow people to at least protect themselves against script kiddies by offering fixes to cgi.py, django, webob and a few other popular frameworks that limit forms to 1000 keys? (I suppose it's really only POST requests that are vulnerable to script kiddies, because of the length restriction on URLs.) -- --Guido van Rossum (python.org/~guido)
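A sketch of what such a stop-gap could look like in a form parser (the helper name and the limit are made up; a real framework would hook this into its own query/POST parsing and also unquote the values):

    MAX_FIELDS = 1000

    def parse_qs_limited(query, max_fields=MAX_FIELDS):
        """Parse application/x-www-form-urlencoded data, refusing huge forms."""
        pairs = [p for p in query.split("&") if p]
        if len(pairs) > max_fields:
            raise ValueError("too many form fields (limit is %d)" % max_fields)
        result = {}
        for pair in pairs:
            name, _, value = pair.partition("=")   # percent-decoding omitted here
            result.setdefault(name, []).append(value)
        return result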

Victor Stinner <victor.stinner@haypocalc.com> wrote:
I propose to solve the hash collision vulnerability by counting collisions [...]
For web frameworks, forcing an exception is less harmful than forcing a many-second delay, but I think it's hard to be confident that there aren't other vulnerable applications where it's the other way round. Web frameworks like the exception because they already have backstop exception handlers, and anyway they use short-lived processes and keep valuable data in databases rather than process memory. Web frameworks don't like the delay because they allow unauthenticated users to submit many requests (including multiple requests in parallel), and they normally expect each response to take little cpu time. But many programs are not like this. What about a log analyser or a mailing list archiver or a web crawler or a game server or some other kind of program we haven't considered? -M-

On Sat, Jan 21, 2012 at 7:50 AM, Matthew Woodcraft <matthew@woodcraft.me.uk> wrote:
If my log crawler ended up taking minutes per log entry instead of milliseconds I'd have to kill it anyway. Web crawlers are huge multi-process systems that are as robust as web servers, or more. Game servers are just web apps. -- --Guido van Rossum (python.org/~guido)
participants (31)
- "Martin v. Löwis"
- Antoine Pitrou
- Barry Warsaw
- Ben Wolfson
- Benjamin Peterson
- Brett Cannon
- Carl Meyer
- Case Van Horsen
- David Malcolm
- Donald Stufft
- Ethan Furman
- Frank Sievertsen
- Frank Sievertsen
- Georg Brandl
- Glenn Linderman
- Guido van Rossum
- Ivan Kozik
- Janzert
- Jared Grubb
- Jim J. Jewett
- Lennart Regebro
- M.-A. Lemburg
- Matthew Woodcraft
- Nick Coghlan
- Paul McMillan
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Tim Delaney
- Tres Seaver
- Victor Stinner