[Python-Dev] Hash collision security issue (now public)

Mon Jan 2 01:21:11 CET 2012

Victor Stinner wrote in
<http://mail.python.org/pipermail/python-dev/2012-January/115198.html>

> If we want to protect a website against this attack for example, we must
> suppose that the attacker can inject arbitrary data and can get
> (indirectly) the result of hash(str) (e.g. with the representation of a
> dict in a traceback, with a JSON output, etc.).

(1)  Is it common to hash non-string input?  Because generating integers
that collide for certain dict sizes is pretty easy...

(2)  Would it make sense for traceback printing to sort dict keys?  (Any site
worried about this issue should already be hiding tracebacks from untrusted
clients, but the cost of this extra protection may be pretty small, given that
tracebacks shouldn't be printed all that often in the first place.)

(3)  Should the docs for json.encoder.JSONEncoder suggest sort_keys=True?

-jJ