[Python-Dev] PEP 456

Serhiy Storchaka storchaka at gmail.com
Thu Oct 3 21:53:43 CEST 2013


Just some comments.

 > the first time time with a bit shift of 7

Double "time".

 > with a 128bit seed and 64-bit output

Inconsistent hyphenation ("128bit" vs. "64-bit"). The same issue occurs in other places.

 > bytes_hash provides the tp_hash slot function for unicode.

Typo. Should be "unicode_hash".

 > len = PyUnicode_GET_LENGTH(self);
 > switch (PyUnicode_KIND(self)) {
 > case PyUnicode_1BYTE_KIND: {
 >     const Py_UCS1 *c = PyUnicode_1BYTE_DATA(self);
 >     x = _PyHash_Func->hashfunc(c, len * sizeof(Py_UCS1));
 >     break;
 > }
 > case PyUnicode_2BYTE_KIND: {
...

The whole switch can be replaced with a single call:

x = _PyHash_Func->hashfunc(PyUnicode_DATA(self),
                           PyUnicode_GET_LENGTH(self) * PyUnicode_KIND(self));
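
The one-liner relies on the kind value (1, 2 or 4) being the same as the number of bytes per code point, so length times kind is the size of the raw buffer. A rough Python sketch of that arithmetic follows. The helper names are made up for illustration, and the little-endian codecs only stand in for the native-order buffers CPython actually uses.

```python
# Hypothetical helpers, not CPython API: they only model the PEP 393 layout.
KIND_CODECS = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}

def kind_of(s):
    """Smallest width (1, 2 or 4 bytes) that holds every code point of s."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4

def raw_size(s):
    """Byte count handed to the hash function: length * kind."""
    return len(s) * kind_of(s)

def raw_buffer(s):
    """Fixed-width encoding of s that matches its kind (byte order assumed)."""
    return s.encode(KIND_CODECS[kind_of(s)])

for sample in ("abc", "\u20ac5", "\U0001f600x"):
    # The buffer length always equals length * kind.
    print(repr(sample), kind_of(sample), raw_size(sample), len(raw_buffer(sample)))
```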

 > Equal hash values result in a hash collision and therefore cause a
 > minor speed penalty for dicts and sets with mixed keys. The cause of the
 > collision could be removed

I doubt this. If one collects bytes and strings in one dictionary,
this equality will at most double the number of collisions (a DoS attack
needs to increase it by a factor of thousands or millions), so it doesn't
matter. On the other hand, if one deliberately uses bytes and str
subclasses with overridden equality, the same hash for ASCII bytes and
strings may be needed.
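
To make the subclass case concrete, here is a minimal sketch with a hypothetical AsciiBytes class (the name and behaviour are invented for illustration). Once equality is overridden so that the object compares equal to a str, it has to hash like that str too, otherwise dict lookups miss. The sketch defines __hash__ explicitly rather than relying on bytes and str happening to hash alike.

```python
class AsciiBytes(bytes):
    """Hypothetical bytes subclass that compares equal to the matching ASCII str."""

    def __eq__(self, other):
        if isinstance(other, str):
            try:
                return self.decode("ascii") == other
            except UnicodeDecodeError:
                return False
        return super().__eq__(other)

    def __ne__(self, other):
        # Keep != consistent with the overridden ==.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equal, so hash the way the matching str does.
        try:
            return hash(self.decode("ascii"))
        except UnicodeDecodeError:
            return super().__hash__()

table = {AsciiBytes(b"spam"): 1}
print(table["spam"])  # the str key finds the entry stored under the bytes key
```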

 > For very short strings the setup costs for SipHash dominates its
 > speed but it is still in the same order of magnitude as the current FNV
 > code.

We could use another algorithm for very short strings if it matters.
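
A rough sketch of what such a dispatch could look like, in Python for brevity. Everything here is an assumption made for illustration: the cutoff of 8 bytes, the all-zero placeholder key, FNV-1a as the cheap hash, and keyed BLAKE2b standing in for SipHash (which the standard library does not expose).

```python
import hashlib

SHORT_CUTOFF = 8        # hypothetical threshold in bytes, not taken from the PEP
KEY = b"\x00" * 16      # placeholder 128-bit seed; a real seed would be random

def fnv1a_64(data):
    """64-bit FNV-1a: cheap, with no setup cost."""
    h = 0xcbf29ce484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

def keyed_hash(data):
    """Stand-in for the keyed hash with a 64-bit output (BLAKE2b, not SipHash)."""
    digest = hashlib.blake2b(data, digest_size=8, key=KEY).digest()
    return int.from_bytes(digest, "little")

def hash_bytes(data):
    """Use the cheap hash below the cutoff and the keyed hash otherwise."""
    if len(data) < SHORT_CUTOFF:
        return fnv1a_64(data)
    return keyed_hash(data)

print(hex(hash_bytes(b"abc")), hex(hash_bytes(b"a much longer key")))
```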

 > The summarized total runtime of the benchmark is within 1% of the
 > runtime of an unmodified Python 3.4 binary.

What about deviations of individual tests?
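
A small sketch of why the per-test numbers matter: a summed runtime can stay flat while individual tests move a lot in opposite directions. The timings below are invented purely to illustrate the arithmetic. They are not measurements, and the function name is made up.

```python
def relative_deviations(baseline, patched):
    """Per-test relative change of patched versus baseline (0.1 means +10%)."""
    return {name: patched[name] / baseline[name] - 1.0 for name in baseline}

# Invented numbers, not benchmark results.
baseline = {"test_a": 10.0, "test_b": 10.0}
patched = {"test_a": 12.0, "test_b": 8.0}

total_change = sum(patched.values()) / sum(baseline.values()) - 1.0
per_test = relative_deviations(baseline, patched)

print("total: {:+.1%}".format(total_change))
for name, change in sorted(per_test.items()):
    print("{}: {:+.1%}".format(name, change))
```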



