[Python-ideas] Add adaptive-load salt-mandatory hashing functions?

Sun Jun 10 15:05:53 CEST 2012

The standard library already provides for cryptographic hashes (hashlib)
and MACs (hmac).

One issue which exists, and has been repeatedly outlined after several
breaches of straight-hashed databases (salted and unsalted) last week,
is that many developers do not know:

1. straight hashes are not sufficient to store passwords securely in
   case of a database breach
2. salted hashes, while mitigating rainbow-table attacks, aren't
   enough to mitigate brute-force attacks.

(In case of a database breach, the goal is to protect password
plaintexts from being recovered and matched to a user identity: when
users re-use passwords across services, a recovered password gives
attackers access to all services used by that user.)

The best current solution to these is *mandatory* salting (of
specified minimum strength) and an adaptive workload which can be tuned
higher to keep up with Moore's law (especially as most hashing functions
tend to be very fast and embarrassingly parallelizable, two undesirable
properties in the face of brute-forcing of the plaintext).
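
To make the three levels concrete, here is a minimal sketch contrasting
a straight hash, a salted hash, and a salted derivation with a tunable
workload. It assumes `hashlib.pbkdf2_hmac`, which only entered the
standard library in Python 3.4 (after this message was written); the
password, the 16-byte salt and the choice of SHA-256 are illustrative
assumptions, not part of the proposal.

```python
import hashlib
import os

password = b"correct horse battery staple"  # illustrative value

# 1. Straight hash: identical passwords always give identical digests,
#    so precomputed (rainbow) tables apply directly.
straight = hashlib.sha256(password).hexdigest()

# 2. Salted hash: a random per-user salt defeats precomputed tables,
#    but each guess still costs only one fast hash invocation.
salt = os.urandom(16)
salted = hashlib.sha256(salt + password).hexdigest()

# 3. Salted derivation with an adaptive workload: each guess now costs
#    `iterations` HMAC invocations, and the count can be raised over time.
iterations = 64000
stretched = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
```

The salt and the iteration count are not secret; they would be stored
next to the derived value so that it can be recomputed at login.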

Therefore, I would suggest either adding a new module (name tbd) or
adding new constructors to hashlib.

* All password-hashing functions listed below should insist on a strong
  salt (the PBKDF2 specification recommends 64 bits, we could go further)
  by raising ValueError if the condition is not met, unless a
  `weak_salt=True` parameter is provided. I think this would be sufficient
  to hint at the importance of salt to users, and to drive them to "the
  right thing".

  The salt should also be mandated non-empty: providing an empty salt
  should generate an error in all cases.

* All password-hashing functions should require a `workload` parameter
  with a documented recommended value. A default value might make sense
  in the short run (it ensures the functions are used with an acceptably
  high workload), but that default would be set in stone for users *not*
  setting their own load factor.
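
The two requirements above could be sketched roughly as follows. The
function name `hash_password`, its signature, the `MIN_SALT_BYTES`
constant and the use of PBKDF2-HMAC-SHA256 underneath are all
hypothetical, chosen only to illustrate the checks; `hashlib.pbkdf2_hmac`
is assumed to be available (Python 3.4+).

```python
import hashlib

# 64 bits, the figure cited above from the PBKDF2 specification.
MIN_SALT_BYTES = 8


def hash_password(password, salt, workload, weak_salt=False):
    """Hypothetical constructor illustrating the proposed checks.

    `workload` has no default on purpose: the caller must choose it.
    """
    # An empty salt is rejected in all cases, even with weak_salt=True.
    if not salt:
        raise ValueError("salt must not be empty")
    # A short salt is rejected unless the caller explicitly opts out.
    if len(salt) < MIN_SALT_BYTES and not weak_salt:
        raise ValueError(
            "salt shorter than %d bytes; pass weak_salt=True to override"
            % MIN_SALT_BYTES
        )
    if workload < 1:
        raise ValueError("workload must be a positive integer")
    return hashlib.pbkdf2_hmac("sha256", password, salt, workload)
```

Calling `hash_password(b"secret", b"abc", 64000)` would raise ValueError,
while the same call with `weak_salt=True` would go through.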

This module (or addition) should provide, if possible:

* PBKDF2, recommending a load factor above 10000. The recommended
  load factor in RFC 2898 (PKCS #5) is 1000, but the specification
  is 12 years old. Extrapolating from that original load factor using
  Moore's law, taken as one doubling every two years (the load factor
  has a linear relation to the amount of computation in PBKDF2, as it
  is the number of hashing iterations), the stdlib could recommend a
  load factor of 64000 (6 doublings).

  As with hmac, it should be possible to configure the digest
  constructor (PKCS #5 specifies HMAC-SHA1 as the default PRF).

* bcrypt: the bcrypt C library is BSD-licensed and open-source, so it
  could be added pretty directly. There is already a wrapper called
  "py-bcrypt" (under ISC/BSD licence)[1]

* scrypt is younger and has been looked at less than the previous
  two[0], but from my reading (of articles about it; I am no
  cryptographer) it seems to have no overt issue and combines
  load-adaptive CPU-hardness with load-adaptive memory-hardness (PBKDF2
  and bcrypt both work in constant space), making it significantly more
  resistant to massively parallel brute-forcing arrays (GPGPU or custom
  ASIC).

  It is available under a 2-clause BSD license as are the existing Python
  bindings I could find[2], but has a hard dependency on OpenSSL which may
  prevent its usage.
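
For the PBKDF2 item above, the extrapolation and the configurable
digest can be sketched as follows. The two-year doubling period is the
assumption behind the "6 doublings" figure; the `pbkdf2` wrapper and its
parameter names are hypothetical, and it relies on `hashlib.pbkdf2_hmac`
(Python 3.4+).

```python
import hashlib

RFC_2898_ITERATIONS = 1000  # recommendation cited from RFC 2898
YEARS_ELAPSED = 12          # age of the specification in 2012
YEARS_PER_DOUBLING = 2      # assumption: Moore's law as one doubling per 2 years

doublings = YEARS_ELAPSED // YEARS_PER_DOUBLING      # 6
recommended = RFC_2898_ITERATIONS * 2 ** doublings   # 64000


def pbkdf2(password, salt, workload, digest="sha1"):
    """Hypothetical wrapper: `workload` is the PBKDF2 iteration count,
    `digest` names the hash used by the HMAC PRF (HMAC-SHA1 by default,
    as in PKCS #5)."""
    return hashlib.pbkdf2_hmac(digest, password, salt, workload)


# Same password and salt, different PRF digests.
key_sha1 = pbkdf2(b"password", b"saltsalt", recommended)
key_sha256 = pbkdf2(b"password", b"saltsalt", recommended, digest="sha256")
```

Since the work is linear in the iteration count, doubling `workload`
doubles the cost of every guess for an attacker, and of every login for
the server.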

I think these would make Python users safer by lowering the
cost of using these functions and by demonstrating ways to safely
store passwords up-front. They could be complemented by a note in
hashlib indicating that they are to be preferred for password hashing.

[0] especially PBKDF2, still the most conservatively safe choice
[1] http://code.google.com/p/py-bcrypt/
[2] http://pypi.python.org/pypi/scrypt/