[Python-Dev] Hash collision security issue (now public)
jimjjewett at gmail.com
Sun Jan 8 23:33:32 CET 2012
Stefan Behnel wrote:
> Admittedly, this may require some adaptation for the PEP393 unicode memory
> layout in order to produce identical hashes for all three representations
> if they represent the same content.
They SHOULD NOT represent the same content; comparing two strings
currently requires converting them to canonical form, which means the
smallest format (of those three) that works.
If it can be represented in PyUnicode_1BYTE_KIND, then representations
using PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND don't count as
canonical, won't be created by Python itself, and already compare
unequal according to both PyUnicode_RichCompare and stringlib/eq.h (a
shortcut used by dicts).
That said, I don't think smallest-format is actually enforced with
anything stronger than comments (such as in unicodeobject.h struct
PyASCIIObject) and asserts (mostly calling
_PyUnicode_CheckConsistency). I don't have any insight on how
prevalent non-conforming strings will be in practice, or whether
supporting their equality will be required as a bugfix.
More information about the Python-Dev