[Python-ideas] Add adaptive-load salt-mandatory hashing functions?

Nick Coghlan ncoghlan at gmail.com
Mon Jun 11 22:54:43 CEST 2012


On Tue, Jun 12, 2012 at 6:21 AM, Guido van Rossum <guido at python.org> wrote:
> On Mon, Jun 11, 2012 at 1:08 PM, Masklinn <masklinn at masklinn.net> wrote:
>> The issue with this idea is that people are *not* driven to
>> state-of-the-art alternatives because they don't understand or know the
>> issue. And as a result, as we've seen last week, they'll use
>> cryptographic hashes (with or without salts) even though those are
>> insufficient, because that's available and they read on the internet
>> that it was what they needed.
>
> Is there any indication that Python was involved in last week's
> incidents? (I'm only aware of the Linkedin one -- were there others?)

eHarmony and last.fm were the other two prominent sites I saw mentioned.

We're not aware of any specific Python connection; the incidents just
prompted the current discussion of whether there was anything CPython
could do to nudge developers in the right direction.

Even a native PBKDF2 would be an awful lot better than nothing.

>> And how are you going to make people understand there's a difference
>> between a cryptographic hash and a password hash by doing nothing,
>> giving them cryptographic hashes and leaving them to their own devices?
>
> Do you really think that including some API in the stdlib is going to
> make a difference in education? And what would we do if in 2 years
> time the stdlib's "basic functionality" were somehow compromised (not
> due to a bug in Python's implementation but simply through some
> advance in the crypto world) -- how would we get everyone who relied
> on the stdlib to switch to a different algorithm? I really think that
> the right approach here is to get *everyone* who needs this to use a
> 3rd party library. Diversity is very good here!

I think it's similar to the situation with hmac: for backwards
compatibility reasons, the default hash in hmac is still MD5. That
doesn't mean hmac is useless, and using MD5 is still better than doing
nothing. It's all about raising the bar for attackers, and the fact
that attackers are continually inventing better ladders and grappling
hooks doesn't mean the older walls become completely useless.
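For anyone who would rather not rely on the MD5 default, a minimal sketch
of naming the digest explicitly follows. The key and message are made-up
values, and sign() and verify() are hypothetical helper names rather than
anything in the hmac module itself.

```python
import hashlib
import hmac

# Illustrative values only; a real key would come from secure storage.
key = b"shared-secret-key"
message = b"amount=100&to=alice"


def sign(key, message, digestmod=hashlib.sha256):
    """Return a hex HMAC tag, naming the digest instead of using the default."""
    return hmac.new(key, message, digestmod).hexdigest()


def verify(key, message, tag, digestmod=hashlib.sha256):
    """Recompute the tag and compare it without leaking timing information."""
    expected = sign(key, message, digestmod)
    return hmac.compare_digest(expected, tag)


tag = sign(key, message)
```

Passing the digest as an argument means a later switch to a stronger
hash is a one-line change at the call site.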

However, I also think that, with the right API design, we could allow
the key derivation algorithms to be retuned in security releases,
*because* the state of the art evolves (and because computers get
faster). The passlib core APIs and hash formats are designed with
precisely that problem in mind.
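As a rough sketch of what "retunable" could mean in practice: if the
stored string records the scheme and cost alongside the salt and hash,
old entries stay verifiable after the defaults change. Everything here is
an assumption for illustration. The "scheme$iterations$salt$hash" layout,
the function names and the iteration count are hypothetical and are not
passlib's actual format, and the code assumes a Python that provides
hashlib.pbkdf2_hmac.

```python
import hashlib
import hmac
import os

# Hypothetical storage layout: scheme$iterations$salt_hex$hash_hex
SCHEME = "pbkdf2-sha256"
DEFAULT_ITERATIONS = 200_000  # assumed value; meant to be raised over time


def hash_password(password, iterations=DEFAULT_ITERATIONS):
    """Hash a password, embedding the parameters needed to verify it later."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations)
    return "$".join([SCHEME, str(iterations), salt.hex(), derived.hex()])


def verify_password(password, stored):
    """Verify using the parameters recorded in the stored string itself."""
    scheme, iterations, salt_hex, hash_hex = stored.split("$")
    if scheme != SCHEME:
        raise ValueError("unsupported scheme: %s" % scheme)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"),
        bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(derived.hex(), hash_hex)


def needs_rehash(stored, iterations=DEFAULT_ITERATIONS):
    """Report whether a stored hash is weaker than the current settings."""
    scheme, stored_iterations = stored.split("$")[:2]
    return scheme != SCHEME or int(stored_iterations) < iterations
```

A login handler could call needs_rehash() after a successful
verify_password() and store a fresh hash while it still has the
plaintext in hand, so the cost is raised gradually without forcing
password resets.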

>> [0] and beyond the bleeding edge lies ubiquitous 2-factor auth,
>>    probably.
>> [1] MD5crypt can not use adaptive load factors and injects constant
>>    data at some points, it also allows longer salts.
>> [2] http://packages.python.org/passlib/new_app_quickstart.html#recommended-hashes
>
> TBH it's possible that I'm not sufficiently familiar with the issue to
> have a valid opinion here -- I would never dream of taking on the
> responsibility of password security for anything, since I don't have
> the right crypto hacker mindset. But I do worry about having
> attractive suboptimal solutions to common security problems in the
> stdlib.

The trick is that even a suboptimal solution is a whole lot better
than the next-to-nothing that many people do currently. At the moment,
the available approaches are:

1. store plaintext passwords (eek)
2. store hashed unsalted passwords (vulnerable to rainbow tables)
3. store hashed salted passwords (vulnerable to massively parallel
brute force attacks)
4. store tunable cost hashed salted passwords (reduces vulnerability
to brute force, currently requires a third party library)

Option 4 *is* the state of the art; it's just a matter of tinkering
with the key derivation algorithm in response to advances in
cryptography, as well as ramping up the tuning parameters over time to
account for Moore's law.
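To make the difference between options 2 to 4 concrete, here is a small
sketch. The helper names and the sample password are made up, SHA-256
stands in for "a cryptographic hash", and PBKDF2 via hashlib.pbkdf2_hmac
(assumed to be available) stands in for "a tunable cost function".

```python
import hashlib
import os

password = b"correct horse"  # sample value for illustration

# Option 2: unsalted. Equal passwords always give equal digests, so one
# precomputed table works against every user at once.
unsalted_a = hashlib.sha256(password).hexdigest()
unsalted_b = hashlib.sha256(password).hexdigest()


# Option 3: salted. A per-user salt defeats precomputed tables, but each
# guess still costs only a single fast hash.
def salted_hash(password, salt):
    return hashlib.sha256(salt + password).hexdigest()


salt_a, salt_b = os.urandom(16), os.urandom(16)
salted_a = salted_hash(password, salt_a)
salted_b = salted_hash(password, salt_b)


# Option 4: salted with a tunable cost. The iterations argument sets how
# much work each guess takes, and can be raised as hardware gets faster.
def stretched_hash(password, salt, iterations):
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations).hex()


cheap = stretched_hash(password, salt_a, 1_000)
costly = stretched_hash(password, salt_a, 100_000)
```

The iteration counts above are arbitrary; the point is only that the
cost is a parameter rather than a fixed property of the hash.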

By making it as easy as possible for people to use Option 4 instead of
one of the first 3, we increase the odds of people doing the right
thing. A third party library like passlib can then focus on more
dynamic things like:

1. Providing API-compatible interfaces to 3rd party key derivation
algorithms (e.g. bcrypt, or the accelerated PBKDF2 implementation in
M2Crypto), as well as to newer ones like scrypt
2. Providing convenient interfaces for reading and writing 3rd party
hash storage formats (e.g. LDAP)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


