Add adaptive-load salt-mandatory hashing functions?

The standard library already provides cryptographic hashes (hashlib) and MACs (hmac). One issue, which has been repeatedly highlighted after several breaches of straight-hashed databases (salted and unsalted) last week, is that many developers do not know that:
1. straight hashes are not sufficient to store passwords securely in case of a database breach;
2. salted passwords, while mitigating rainbow-table attacks, are not enough to mitigate brute-force attacks.
(In case of a database breach, the goal is to protect password plaintexts from being found and matched to a user identity: if users re-use passwords across services, that would allow attackers to access all services used by the user.)
The best solution to these currently is *mandatory* salting (of a specified minimum strength) and an adaptive workload which can be tuned higher to keep up with Moore's law (especially as most hashing functions tend to be very fast and embarrassingly parallelizable, two undesirable properties in the face of brute-forcing of the plaintext). Therefore, I would suggest either adding a new module (name tbd) or adding new constructors to hashlib.
* All password-hashing functions listed below should enforce a strong salt (the PBKDF2 specification recommends 64 bits; we could go further) by erroring out (ValueError) if the conditions are not met, unless a `weak_salt=True` parameter is provided. I think this would be sufficient to hint at the importance of salt to users, and to drive them to "the right thing". The salt should also be mandated non-empty: providing an empty salt should generate an error in all cases.
* All password-hashing functions should require a `workload` parameter, with a documented recommendation. A default value might make sense in the short run (ensuring the functions are used with an acceptably high workload), but those defaults would be set in stone for users *not* setting their own load factor.
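A minimal sketch of the checks proposed above. `hash_password` is a hypothetical name, not an existing API. It wraps `hashlib.pbkdf2_hmac`, which only appeared later, in Python 3.4. The 8-byte minimum corresponds to the 64 bits recommended for PBKDF2 salts. `workload` is deliberately a required argument with no default.

```python
import hashlib

MIN_SALT_BYTES = 8  # 64 bits, the salt length recommended for PBKDF2


def hash_password(password: bytes, salt: bytes, workload: int,
                  *, weak_salt: bool = False) -> bytes:
    """Hypothetical salt-mandatory password hash (a sketch, not a stdlib API)."""
    if not salt:
        # An empty salt is always rejected, even with weak_salt=True.
        raise ValueError("salt must not be empty")
    if len(salt) < MIN_SALT_BYTES and not weak_salt:
        raise ValueError(f"salt must be at least {MIN_SALT_BYTES} bytes "
                         "(pass weak_salt=True to override)")
    if workload < 1:
        raise ValueError("workload must be a positive iteration count")
    # PBKDF2-HMAC-SHA256 with the caller-chosen iteration count.
    return hashlib.pbkdf2_hmac("sha256", password, salt, workload)
```

With these checks, `hash_password(b"hunter2", b"", 64000)` always raises ValueError. A 3-byte salt is accepted only when the caller opts in explicitly with `weak_salt=True`.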
This module (or addition) should provide, if possible:
* PBKDF2, recommending a load factor above 10000. The recommended load factor in RFC 2898 (PKCS #5) is 1000, but the specification is 12 years old. Extrapolating from that original load factor using Moore's law (the load factor has a linear relation to the amount of computation in PBKDF2, as it is the number of hashing iterations), the stdlib could recommend a load factor of 64000 (6 doublings). As with hmac, it should be possible to configure the digest constructor (PKCS #5 specifies HMAC-SHA1 as the default PRF).
* bcrypt: the bcrypt C library is BSD-licensed and open-source, so it could be added pretty directly; there is already a wrapper called "py-bcrypt" (under an ISC/BSD licence)[1].
* scrypt is younger and has been looked at less than the previous two[0], but from my readings (of articles on it; I am no cryptographer) it seems to have no overt issue. It combines load-adaptive CPU-hardness with load-adaptive memory-hardness (PBKDF2 and bcrypt both work in constant space), making it significantly more resistant to massively parallel brute-forcing arrays (GPGPU or custom ASIC). It is available under a 2-clause BSD license, as are the existing Python bindings I could find[2], but it has a hard dependency on OpenSSL, which may prevent its usage.
I think these would make Python users safe by lowering the cost of using these functions and by demonstrating ways to safely store passwords up-front. They could be augmented with a note in hashlib indicating that they are to be preferred for password hashing.
[0] especially PBKDF2, still the most conservatively safe choice
[1] http://code.google.com/p/py-bcrypt/
[2] http://pypi.python.org/pypi/scrypt/
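The 64000 figure is simple arithmetic. The sketch below assumes one doubling of attacker speed every two years. It takes RFC 2898's publication in 2000, with its 1000-iteration recommendation, as the baseline. The helper name is illustrative.

```python
def extrapolated_iterations(base_iterations: int, base_year: int,
                            target_year: int,
                            years_per_doubling: int = 2) -> int:
    """Scale an iteration count to offset hardware speedups since base_year."""
    doublings = (target_year - base_year) // years_per_doubling
    # PBKDF2's cost grows linearly with the iteration count, so each
    # doubling of attacker speed is offset by doubling the iterations.
    return base_iterations * 2 ** doublings


print(extrapolated_iterations(1000, 2000, 2012))  # 64000 (6 doublings)
```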

On 10/06/2012 15:05, Masklinn wrote:
PBKDF2 can be implemented in 15 lines of code based on the hmac and hashlib modules: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py Although the code is short, it is easy to get wrong. So I think it would be nice to have in the stdlib, tested once and for all. Also, PBKDF2 is a well-defined spec that will not change (or it will be called PBKDF3 or something), which I think makes it a good fit for the stdlib. I would suggest including Armin’s implementation (linked above) as-is, but it’s probably too late for 3.3. -- Simon Sapin
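To show what those few lines involve, here is an independent sketch of PBKDF2 following RFC 2898 section 5.2, built only on `hmac` and `hashlib`. It is not Armin's code. As pure Python, it is far slower than a C implementation. Two details are easy to get wrong: the 4-byte big-endian block index appended to the salt, and XOR-ing every intermediate HMAC output rather than only the last one.

```python
import hashlib
import hmac
from typing import Optional


def pbkdf2(digest_name: str, password: bytes, salt: bytes,
           iterations: int, dklen: Optional[int] = None) -> bytes:
    """Derive a key with PBKDF2, using HMAC-<digest_name> as the PRF."""
    prf = hmac.new(password, digestmod=digest_name)  # keyed once, copied per call
    hlen = prf.digest_size
    if dklen is None:
        dklen = hlen
    n_blocks = -(-dklen // hlen)  # ceil(dklen / hlen)
    output = []
    for i in range(1, n_blocks + 1):
        # U_1 = PRF(P, S || INT_32_BE(i))
        mac = prf.copy()
        mac.update(salt + i.to_bytes(4, "big"))
        u = mac.digest()
        acc = int.from_bytes(u, "big")
        # U_j = PRF(P, U_{j-1});  T_i = U_1 xor U_2 xor ... xor U_c
        for _ in range(iterations - 1):
            mac = prf.copy()
            mac.update(u)
            u = mac.digest()
            acc ^= int.from_bytes(u, "big")
        output.append(acc.to_bytes(hlen, "big"))
    return b"".join(output)[:dklen]


# Cross-check against the C implementation available since Python 3.4.
assert pbkdf2("sha256", b"password", b"salt", 1000, 64) == \
    hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000, 64)
```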

On Mon, Jun 11, 2012 at 12:17 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
It's cutting it very fine relative to the beta feature freeze (which is in a couple of weeks), but it could still make it in as a very reasonable addition to the standard library. The hmac module has already been enhanced with a "secure_compare" function for 3.3 to perform string and byte sequence comparisons that don't leak as much information about the expected result under timing attacks (it still leaks the expected length, but beyond that the running time of the comparison should be constant for a given digest length). Since the PBKDF2 key derivation requires hmac, and hmac depends on hashlib (to provide the default hash algorithm for hmac.HMAC), I believe the best way to expedite this would be to:
1. Create an issue on bugs.python.org proposing just the binary version of pbkdf2 as an enhancement to hmac
2. Attach a patch that updates Lib/hmac.py, Lib/test/test_hmac.py and Doc/library/hmac.rst accordingly (this will likely require changes to work with bytes rather than 2.x strings)
3. Add a "min_salt_len" parameter to discourage short salt values (rather than the "weak_salt" boolean flag suggested by Masklinn)
4. Post to python-dev proposing the addition of that function for Python 3
Having needed a key derivation function myself not that long ago, and with the recent high-profile password database breaches Masklinn noted, this seems like a very reasonable addition to me. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
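The comparison function mentioned above shipped in Python 3.3 as `hmac.compare_digest`. The sketch below uses it to check a login attempt. `derive_key`, `verify_password` and the iteration count are illustrative. The KDF is `hashlib.pbkdf2_hmac` (Python 3.4+).

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative cost, not a recommendation from this thread


def derive_key(password: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)


def verify_password(password: bytes, salt: bytes, expected_key: bytes) -> bool:
    candidate = derive_key(password, salt)
    # ``candidate == expected_key`` may stop at the first mismatching byte;
    # compare_digest is designed so its timing reveals far less.
    return hmac.compare_digest(candidate, expected_key)


salt = os.urandom(16)
stored_key = derive_key(b"correct horse", salt)
print(verify_password(b"correct horse", salt, stored_key))  # True
print(verify_password(b"wrong guess", salt, stored_key))    # False
```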

On 2012-06-10, at 17:28 , Nick Coghlan wrote:
Although it makes sense from a dependency POV, I'm not sure it's the best place to put it, as people in need of knowing about PBKDF2 would be more likely to be browsing hashlib and, more importantly, PBKDF2 isn't a MAC, the usage of hmac underlying it being mostly incidental. If PBKDF2 alone is added, I think putting it in its own module (parallel to hmac) would be cleaner; *that* can be deprecated if more cryptographic hashes of that style (e.g. bcrypt, scrypt) are added later on, in the style of md5 -> hashlib.

On Mon, Jun 11, 2012 at 1:52 AM, Masklinn <masklinn@masklinn.net> wrote:
Yeah, you're probably right. Either a new module, or else in "getpass" (either way, with a cross-reference from hashlib). Wherever it ends up, it should also reference hmac.secure_compare for a comparison function that doesn't allow timing attacks to progressively discover the expected hash. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jun 10, 2012 at 9:11 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'd just stick it in hmac myself, but getpass was also a good suggestion. Cross-reference to it from the docs of all three, as the real goal of adding pbkdf2 is to advertise it to users so that they might use it rather than something more naive. hashlib itself should be kept pure, as it is, for standard low-level hash algorithms. It can't have a dependency on anything else. Even if this doesn't make it into the stdlib in time for 3.3, feel free to update the getpass, hmac and/or hashlib docs to point to the pbkdf2 module externally as a suggestion for passphrase/secret hashing. -gps

On 2012-06-10, at 20:04 , Simon Sapin wrote:
It seems there are as many opinions on the subject as there are people (which was to be expected) when there's no code yet. I'll try to get something done first (unless somebody else wants to), and discussion of its exact location in the stdlib can be bikeshedded in -dev if and when that point/paint is reached.

On 10/06/2012 20:11, Masklinn wrote:
[...] when there's no code yet I'll try to get something done first
There is code, with tests. Here is the link I posted earlier in this thread: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py -- Simon Sapin

On 2012-06-10, at 20:24 , Simon Sapin wrote:
Yes, I've seen it, but:
1. I'll need to talk to Armin about using that code (which is why I CC'd him to the list when I responded to Nick's response to your comment), or have him do it. I don't think anybody is going to take his code and try to push it into the stdlib without even asking for consent.
2. The interface is simple, but painful. Just look at the comment at the top:
    3. Store ``algorithm$salt:costfactor$hash`` in the database so that you can upgrade later easily to a different algorithm if you need one. For instance ``PBKDF2-256$thesalt:10000$deadbeef...``.
If we know what's supposed to be done, how about just doing it and returning *that*? If it goes into the stdlib, I'd like to have something non-cryptographers can use easily, correctly and without making mistakes. Then there's the issue of implementing the equality test, extracting stuff from that storage string on subsequent auths to test for matches. It should be possible to do all that in a single user-facing operation, with no munging about in users' code.
3. The test suite needs to be converted to the stdlib's format.
4. The documentation needs to be written.
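The sketch below follows the shape Masklinn describes: one call produces the storage string, and one call checks a password against it. The `algorithm$salt:costfactor$hash` layout comes from the comment quoted above. Several details are assumptions for the sketch: the function names, reading `PBKDF2-256` as PBKDF2-HMAC-SHA256, and hex-encoding the salt and hash.

```python
import hashlib
import hmac
import os

ALGORITHM = "PBKDF2-256"  # assumed here to mean PBKDF2-HMAC-SHA256


def make_password_hash(password: str, cost: int = 100_000) -> str:
    """Return an ``algorithm$salt:costfactor$hash`` string ready to store."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, cost)
    return f"{ALGORITHM}${salt.hex()}:{cost}${key.hex()}"


def check_password(password: str, stored: str) -> bool:
    """Parse the stored string, re-derive the key, compare in constant time."""
    algorithm, params, key_hex = stored.split("$")
    if algorithm != ALGORITHM:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    salt_hex, cost = params.split(":")
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              bytes.fromhex(salt_hex), int(cost))
    return hmac.compare_digest(key, bytes.fromhex(key_hex))
```

The caller never handles the salt, the cost or the comparison directly. It stores whatever `make_password_hash` returns and later passes that string back to `check_password`.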

On Mon, Jun 11, 2012 at 4:35 AM, Masklinn <masklinn@masklinn.net> wrote:
Right. Given the time frames involved, it's probably best to target this at 3.4 as a simple way to do rainbow-table-and-brute-force-resistant password hashing and comparisons, defaulting to PBKDF2, but accepting alternative key derivation functions so people can plug in bcrypt, scrypt, etc (similar to the way hmac defaults to md5, but lets you specify any hash function with the appropriate API). I think Armin's already created a good foundation for that, but there'll be quite a bit of work in getting a PEP written, etc. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
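The sketch below shows the pluggable shape Nick describes, by analogy with hmac's `digestmod`. The hypothetical `PasswordHasher` defaults to PBKDF2. It accepts any callable with an assumed `kdf(password, salt, cost) -> bytes` signature, so a bcrypt or scrypt binding could be plugged in through a small adapter.

```python
import hashlib
import hmac
import os
from typing import Callable, Tuple

# Assumed KDF signature: kdf(password, salt, cost) -> derived key bytes.
KDF = Callable[[bytes, bytes, int], bytes]


def pbkdf2_sha256(password: bytes, salt: bytes, cost: int) -> bytes:
    """Default KDF: PBKDF2-HMAC-SHA256 with `cost` iterations."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, cost)


class PasswordHasher:
    """Hypothetical pluggable hasher; not a real stdlib or passlib class."""

    def __init__(self, kdf: KDF = pbkdf2_sha256, cost: int = 100_000,
                 salt_len: int = 16) -> None:
        self.kdf = kdf
        self.cost = cost
        self.salt_len = salt_len

    def hash_password(self, password: bytes) -> Tuple[bytes, int, bytes]:
        """Return (salt, cost, key); all three must be stored."""
        salt = os.urandom(self.salt_len)
        return salt, self.cost, self.kdf(password, salt, self.cost)

    def verify(self, password: bytes, salt: bytes, cost: int,
               key: bytes) -> bool:
        # Re-derive with the *stored* cost, not the current default.
        return hmac.compare_digest(self.kdf(password, salt, cost), key)
```

Swapping algorithms then becomes a constructor argument, for example `PasswordHasher(kdf=lambda p, s, c: hashlib.pbkdf2_hmac("sha512", p, s, c))`.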

On 11.06.2012 08:09, Nick Coghlan wrote:
Python already has an excellent library for password hashing: passlib [1]. It's well written and documented, and contains more than 30 password hashing algorithms and schemes used by major platforms and applications like Unix, LDAP and databases. The library even contains a policy framework for handling, recognizing and migrating passwords, as well as countermeasures against side channel attacks. IMHO it's not enough to just provide the basic algorithm for PBKDF2 and friends. There is still too much room for error. Passlib hides the complex parts and has a user-friendly API, for example http://packages.python.org/passlib/lib/passlib.context-tutorial.html#depreca... . Christian [1] http://packages.python.org/passlib/

On Mon, Jun 11, 2012 at 6:42 PM, Christian Heimes <lists@cheimes.de> wrote:
Thanks for the link Christian, it does appear this particular wheel has already been thoroughly invented. I'll be recommending passlib for use by others in the future and look into adopting it for my own projects. However, password hashing is an important and common enough problem that it would be good to have some basic level of support in the standard library, with a clear migration path to a more feature complete approach like passlib. It would be good if someone was willing to do the work of raising this discussion with the passlib authors, and looking to see if a suitably stable core could be extracted that is API compatible with passlib, and could be proposed as a standard library addition for 3.4. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 11.06.2012 12:03, Nick Coghlan wrote:
You are welcome! I've been using passlib for about two years and really like its API. PyPI surprises now and then with its hidden gems. I wish we had a way to draw more attention to good solutions, something like "officially endorsed projects" or so.
That's a nice idea, Nick! I've added one of the two core developers of passlib to the CC list. The other one doesn't have his/her email address exposed on Google Code. A stripped down and API compatible version of passlib would make a good addition for Python's standard library. IMHO the complete passlib package is too big for the core. The context API and handlers for bcrypt, pbkdf2 and sha*_crypt are sufficient. Developers can still install passlib if they need all features. We need to come up with a different name (passhash ?) for the stdlib variant. Christian

On Mon, Jun 11, 2012 at 3:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I usually like this approach, but here I am hesitant, because of the cost if the basic approach is found inadequate. The stdlib support should either be state-of-the-art or so poor that people are naturally driven to a state-of-the-art alternative on PyPI that is maintained regularly. In this case I think our only option is the latter. I do think it is another example of a situation where the stdlib docs ought to contain some hints about where to go instead for this functionality. -- --Guido van Rossum (python.org/~guido)

On 2012-06-11, at 18:49 , Guido van Rossum wrote:
Well, it depends what you mean by "state of the art". PBKDF2 is still the "tried and true" trusted password-hashing algorithm (it's the one used in TrueCrypt, 1Password, WPA2, DPAPI and many others). bcrypt is the "old newness", working on the same principle as PBKDF2 (do lots of work) but with a different underlying algorithm. scrypt is the "new newness", as it is memory-hard on top of being processing-hard, but it is significantly less trusted as it's only a few years old. So as far as I know, PBKDF2 is indeed "state of the art", scrypt is "bleeding edge" and bcrypt is somewhere in between[0] (but if PBKDF2 is found to be insufficient, bcrypt will fall for similar reasons: it's only binding on CPU power and is easy to parallelize). Ulrich Drepper also built an MD5crypt-inspired crypt based on SHA2 (and fixed a few weak ideas of MD5crypt[1]) a few years ago. As a matter of fact, passlib lists PBKDF2/SHA512 as one of its three recommendations (alongside bcrypt and sha512_crypt) and notes it is the most portable of three roughly equivalent choices[2] (and that sha512_crypt is somewhat baroque and harder to analyze for flaws than the alternatives).
The issue with this idea is that people are *not* driven to state-of-the-art alternatives, because they don't understand or know about the issue. And as a result, as we've seen last week, they'll use cryptographic hashes (with or without salts) even though those are insufficient, because that's what is available and they read on the internet that it was what they needed. And how are you going to make people understand there's a difference between a cryptographic hash and a password hash by doing nothing, giving them cryptographic hashes and leaving them to their own devices?
[0] and beyond the bleeding edge lies ubiquitous 2-factor auth, probably.
[1] MD5crypt cannot use adaptive load factors and injects constant data at some points; the SHA2-based crypt also allows longer salts.
[2] http://packages.python.org/passlib/new_app_quickstart.html#recommended-hashe...

On Mon, Jun 11, 2012 at 1:08 PM, Masklinn <masklinn@masklinn.net> wrote:
Is there any indication that Python was involved in last week's incidents? (I'm only aware of the Linkedin one -- were there others?)
Do you really think that including some API in the stdlib is going to make a difference in education? And what would we do if in 2 years time the stdlib's "basic functionality" were somehow compromised (not due to a bug in Python's implementation but simply through some advance in the crypto world) -- how would we get everyone who relied on the stdlib to switch to a different algorithm? I really think that the right approach here is to get *everyone* who needs this to use a 3rd party library. Diversity is very good here!
TBH it's possible that I'm not sufficiently familiar with the issue to have a valid opinion here -- I would never dream of taking on the responsibility of password security for anything, since I don't have the right crypto hacker mindset. But I do worry about having attractive suboptimal solutions to common security problems in the stdlib. -- --Guido van Rossum (python.org/~guido)

On 11.06.2012 22:21, Guido van Rossum wrote:
Is there any indication that Python was involved in last week's incidents? (I'm only aware of the Linkedin one -- were there others?)
No, zero Pythons were harmed. The other victims were last.fm and eHarmony. Surprisingly, Sony wasn't hacked last week! *scnr*
+1 I'm against adding just the password hashing algorithms. Developers can easily screw up the right algorithm with an erroneous approach. That's the beauty of passlib: the framework hides all the complex and easy-to-get-wrong stuff behind a minimal API. Christian

On Tue, Jun 12, 2012 at 6:39 AM, Christian Heimes <lists@cheimes.de> wrote:
Right, when I suggested looking for an "API compatible stable core" that could be added for 3.4, I was specifically thinking of:
1. The core CryptContext API
2. The PBKDF2 and sha512_crypt derivation functions
Based on a brief look at the module documentation, those parts seem like they're sufficiently mature to be suitable for the stdlib, whereas the rest of passlib is more suited to development as a 3rd party library with its own release schedule. However, I could be completely wrong, thus the suggestion that it be looked into, rather than "we should definitely do this". At the very least, we should be directing people towards passlib for password storage and comparison purposes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Christian Heimes <lists@...> writes:
I know I'm a little late to this thread, but as the primary Passlib author, I wanted to throw in my two cents. I wholeheartedly agree with the idea of not having a high-level password hashing library in the stdlib. I'd be honored and happy to help in extracting a subset of passlib for inclusion in the standard library. However, for all the reasons GvR pointed out, I'm scared at the thought of how slowly end deployments would get needed security updates (for one thing, I update the adaptive cost of the hashes in passlib about once a year just as a matter of course). I'm reminded of how the Debian project has had to create a "security" repository to supplement the "stable" repository, just so the slow-moving "stable" release gets timely security updates. All that said, I wouldn't mind seeing a pbkdf2() primitive added to the stdlib, along the lines of M2Crypto's pbkdf2 function [1]. I agree such a function might mislead developers into rolling their own password hashing routines, but a word of warning and redirection in the documentation might help with that. The reason I see a need for such a function is that all existing password hashing libraries (passlib, cryptacular, flufl.password, django.contrib.auth.hashers, etc.) have had to roll their own pure-Python pbkdf2 implementations, to varying degrees of speed. And speed is paramount for pbkdf2 usage, since security depends on squeezing as many rounds per second out of the implementation as possible. Having a single C-accelerated primitive would be great for all of the above libraries, and for all the other uses pbkdf2 has. Furthermore, it wouldn't need frequent security updates, since the hash storage format, default cost, default digest, etc. would all be handled by the higher-level libraries. Not that I'm advocating such a thing is *needed*, but that's what I'd love to see, were anything to be added in this direction. Hope all that helps in your decision making.
Thanks, Eli [1] http://www.heikkitoivonen.net/m2crypto/api/M2Crypto.EVP-module.html#pbkdf2 - Eli Collins
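Eli's point that security depends on rounds per second can be made concrete. A higher-level library can pick the iteration count by timing the primitive on the deployment hardware. The rough sketch below uses `hashlib.pbkdf2_hmac`, the C-accelerated primitive later added in Python 3.4. The helper name `calibrate_iterations` and its 0.25 s default target are assumptions, not recommendations from this thread.

```python
import hashlib
import time


def calibrate_iterations(target_seconds: float = 0.25,
                         sample_iterations: int = 10_000,
                         digest: str = "sha256") -> int:
    """Estimate the PBKDF2 iteration count that takes about target_seconds."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac(digest, b"benchmark password", b"benchmark salt",
                        sample_iterations)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    # PBKDF2's cost is linear in the iteration count, so scale the sample.
    return max(1, int(sample_iterations * target_seconds / elapsed))
```

A single timing sample is noisy, so a real library would likely take several samples and round the result. Re-running the calibration as hardware changes is one way to get the periodic cost increases Eli describes.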

On Jun 15, 2012, at 07:07 PM, Eli Collins wrote:
To be honest, if I'd known about passlib I probably would never have written flufl.password. Extra +1 goodness for passlib's Python 3 support! I'm going to migrate my own applications to passlib and if that goes well, I'll start the process of deprecating flufl.password. Cheers, -Barry

On Tue, Jun 12, 2012 at 6:21 AM, Guido van Rossum <guido@python.org> wrote:
eHarmony and last.fm were the other two prominent sites I saw mentioned. We're not aware of any specific Python connection, it just prompted the current discussion of whether or not there was anything CPython could do to nudge developers in the right direction. Even a native PBKDF2 would be an awful lot better than nothing.
I think it's similar to the situation with hmac: for backwards compatibility reasons, the default hash in hmac is still MD5. That doesn't mean hmac is useless, and using MD5 is still better than doing nothing. It's all about raising the bar for attackers, and the fact that attackers are continually inventing better ladders and grappling hooks doesn't mean the older walls become completely useless. However, I also think that, with the right API design, we could allow for the key derivation algorithms to be retuned in security releases, *because* the state of the art evolves (and because computers get faster). The passlib core APIs and hash formats are designed with precisely that problem in mind.
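One way to make retuning possible is for each stored record to carry its own parameters. Raising the default in a later release then does not break existing records. On a successful login, records below the current policy are re-derived while the plaintext is briefly available. Passlib's CryptContext is built around this idea. The minimal sketch below does not use passlib. Its record format (`iterations$salt$key`, hex-encoded) and its function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple

CURRENT_ITERATIONS = 100_000  # policy value; a later release might raise it


def encode(password: bytes, iterations: int) -> str:
    """Store the parameters next to the key: iterations$salt$key (hex)."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    return f"{iterations}${salt.hex()}${key.hex()}"


def verify(password: bytes, record: str) -> bool:
    iterations, salt_hex, key_hex = record.split("$")
    key = hashlib.pbkdf2_hmac("sha256", password, bytes.fromhex(salt_hex),
                              int(iterations))
    return hmac.compare_digest(key, bytes.fromhex(key_hex))


def needs_rehash(record: str) -> bool:
    """True if the record was created under a weaker (older) policy."""
    return int(record.split("$", 1)[0]) < CURRENT_ITERATIONS


def login(password: bytes, record: str) -> Tuple[bool, str]:
    """Return (ok, record to store); the record is upgraded if stale."""
    if not verify(password, record):
        return False, record
    if needs_rehash(record):
        # The plaintext is only available now, at login time.
        record = encode(password, CURRENT_ITERATIONS)
    return True, record
```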
The trick is that even a suboptimal solution is a whole lot better than the next-to-nothing that many people do currently. At the moment, the available approaches are:
1. store plaintext passwords (eek)
2. store hashed unsalted passwords (vulnerable to rainbow tables)
3. store hashed salted passwords (vulnerable to massively parallel brute force attacks)
4. store tunable cost hashed salted passwords (reduces vulnerability to brute force, currently requires a third party library)
Option 4 *is* the state of the art; it's just a matter of tinkering with the key derivation algorithm in response to advances in cryptography, as well as ramping up the tuning parameters over time to account for Moore's law. By making it as easy as possible for people to use Option 4 instead of one of the first 3, we increase the odds of people doing the right thing. A third party library like passlib can then focus on more dynamic things like:
1. Providing API compatible interfaces to 3rd party key derivation algorithms (e.g. bcrypt, or the accelerated PBKDF2 implementation in M2Crypto), as well as to newer ones like scrypt
2. Providing convenient interfaces for reading and writing 3rd party hash storage formats (e.g. LDAP)
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
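The gap between options 2, 3 and 4 shows up in a few lines:
* Without a salt, identical passwords produce identical records, which is what precomputed tables and cross-user matching exploit.
* A per-user random salt removes that, but each guess still costs only a single fast hash.
* Option 4 keeps the salt and makes each guess cost a tunable number of HMAC computations.

The illustration below uses `hashlib` (`pbkdf2_hmac` requires Python 3.4+).

```python
import hashlib
import os

password = b"letmein"

# Option 2: unsalted hash; identical passwords give identical records.
unsalted_a = hashlib.sha256(password).hexdigest()
unsalted_b = hashlib.sha256(password).hexdigest()

# Option 3: salted hash; records differ, but a guess is still one fast hash.
salt_a, salt_b = os.urandom(16), os.urandom(16)
salted_a = hashlib.sha256(salt_a + password).hexdigest()
salted_b = hashlib.sha256(salt_b + password).hexdigest()

# Option 4: salted and tunable; each guess costs `iterations` HMAC calls.
iterations = 100_000
stretched_a = hashlib.pbkdf2_hmac("sha256", password, salt_a, iterations)

print(unsalted_a == unsalted_b)  # True
print(salted_a == salted_b)      # False (with overwhelming probability)
```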

On Sun, Jun 10, 2012 at 2:36 PM, Paul Moore <p.f.moore@gmail.com> wrote:
To use that would need Armin's approval and support. So far he's not commented here.
Only if you want a different license than 3-clause BSD. P.S. I love this thread. Great suggestion. :) -- Devin

On Tue, Jun 12, 2012 at 2:34 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I wrote that quickly. I don't want a circular dependency or things that aren't well established standards in hashlib. I see hashlib as being for low level algorithms only (FIPS standards, etc) where fast implementations are available in most VM runtimes. hmac depends on hashlib, therefore nothing in hashlib should ever depend on hmac. That doesn't prevent someone from deciding hmac shouldn't be a module of its own and moving it to live within hashlib some day, but that would seem like needless API churn outside of a major language version change. -gps

I'd love to have a PBKDF2 implementation in the stdlib. My flufl.password module has an implementation donated by security expert Bob Fleck. Any insecure implementation bugs are mine alone, though. ;) http://bazaar.launchpad.net/~barry/flufl.password/trunk/view/head:/flufl/pas... The API is a little odd because it fits into the larger API for flufl.password, but if it's useful, I'd happily clean up and donate the code for the stdlib. OTOH, I'd be just as happy (maybe more) to get rid of it in favor of a stdlib implementation. Cheers, -Barry

On 13.06.2012 22:38, Barry Warsaw wrote:
At first glance your implementation is vulnerable to side channel attacks because you aren't using a constant time equality function. Also you are using the least secure variant of PBKDF2 (SHA-1 instead of SHA-256 or SHA-512). At least you are using os.urandom() as source for the salt, which is usually fine. Passlib supports the LDAP variants, too. [1] Outside of LDAP the established notation is $pbkdf2-digest$rounds$salt$checksum. Christian [1] http://packages.python.org/passlib/lib/passlib.hash.ldap_pbkdf2_digest.html

Le 10/06/2012 15:05, Masklinn a écrit :
PBKDF2 can be implemented in 15 lines of code based on the hmac and hashlib modules: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py Although the code is short, it is easy to get wrong. So I think it would be nice to have in the stdlib, tested once and for all. Also, PBKDF2 is a well-defined spec that will not change (or it will be called PBKDF3 or something) which I think makes it a good fit for the stdlib. I would suggest to have Armin’s implementation (linked above) included as-is, but it’s probably too late for 3.3. -- Simon Sapin

On Mon, Jun 11, 2012 at 12:17 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
It's cutting it very fine relative to the beta feature freeze (which is in a couple of weeks), but it could still make it in as a very reasonable addition to the standard library. The hmac module has already been enhanced with a "secure_compare" function for 3.3 to perform string and byte sequence comparisons that don't leak as much information about the expected result under timing attacks (it still leaks the expected length, but beyond that the running time of the comparison should be constant for a given digest length). Since the PBKDF2 key derivation requires hmac, and hmac depends on hashlib (to provide the default hash algorithm for hmac.HMAC), I believe the best way to expedite this would be to: 1. Create an issue on bugs.python.org proposing just the binary version of pbkdf2 as an enhancement to hmac 2. Attach a patch that updates Lib/hmac.py, Lib/test/test_hmac.py and Doc/library/hmac.rst accordingly (this will likely require changes to work with bytes rather than 2.x strings) 3. Adds a "min_salt_len" parameter to discourage short salt values (rather than the "weak_salt" boolean flag suggested by Masklinn) 4. Post to python-dev proposing the addition of that function for Python 3 Having needed a key derivation function myself not that long ago, and with the recent high profile password database breaches Masklinn noted, this seems like a very reasonable addition to me. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2012-06-10, at 17:28 , Nick Coghlan wrote:
Although it makes sense from a dependency POV, I'm not sure it's the best place to put it as people in need of knowing about PBKDF2 would be more likely to be browsing hashlib, and — more importantly — PBKDF2 isn't a MAC, the usage of hmac underlying it being mostly incidental. If PBKDF2 alone is added, I think putting it in its own module (parallel to hmac) would be cleaner, *that* can be deprecated if more cryptographic hashes of that style (e.g. bcrypt, scrypt) are added later on in the style of md5 -> hashlib.

On Mon, Jun 11, 2012 at 1:52 AM, Masklinn <masklinn@masklinn.net> wrote:
Yeah, you're probably right. Either a new module, or else in "getpass" (either way, with a cross-reference from hashlib). Wherever it ends up, it should also reference hmac.secure_compare for a comparison function that doesn't allowing timing attacks to progressively discover the expected hash. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jun 10, 2012 at 9:11 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'd just stick it in hmac myself but getpass was also a good suggestion. Cross reference to it from the docs of all three as the real goal of adding pbkdf2 is to advertise it to users so that they might use it rather than something more naive. hashlib itself should be kept pure as is for standard low level hash algorithms. It can't have a dependency on anything else. Even if this doesn't make it into the stdlib in time for 3.3, feel free to update the getpass, hmac and/or hashlib docs to point to the pbkdf2 module externally as a suggestion for passphrase/secret hashing. -gps

On 2012-06-10, at 20:04 , Simon Sapin wrote:
It seems there's as many opinions on the subject as there are people (which was to be expected) when there's no code yet, I'll try to get something done first (unless somebody else wants to) and discussion of its exact location in the stdlib can be bikeshed in -dev if and when that point/paint is reached.

Le 10/06/2012 20:11, Masklinn a écrit :
[...] when there's no code yet I'll try to get something done first
There is code, with tests. Here is the link I posted earlier in this thread: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py -- Simon Sapin

On 2012-06-10, at 20:24 , Simon Sapin wrote:
Yes, I've seen it, but 1. I'll need to talk to Armin about using that code (which is why I CC'd him to the list when I responded to Nick's response to your comment), or have him do it, I don't think anybody is going to take his code without even asking for consent and try to push it into the stdlib 2. The interface is simple, but painful. Just look at the comment at the top: 3. Store ``algorithm$salt:costfactor$hash`` in the database so that you can upgrade later easily to a different algorithm if you need one. For instance ``PBKDF2-256$thesalt:10000$deadbeef...``. if we know what's supposed to be done, how about just doing it and returning *that*? If it goes into the stdlib, I'd like to have something non-cryptographers can use easily, correctly and without making mistakes. Then there's the issue of implementing the equality test, extracting stuff from that storage string on subsequent auths to test for matches. It should be possible to do all that in a single user-facing operations, no munging about in user's code. 3. The test suite needs to be converted to the stdlib's format 4. The documentation needs to be written

On Mon, Jun 11, 2012 at 4:35 AM, Masklinn <masklinn@masklinn.net> wrote:
Right. Given the time frames involved, it's probably best to target this at 3.4 as a simple way to do rainbow-table-and-brute-force-resistant password hashing and comparisons, defaulting to PBKDF2, but accepting alternative key derivation functions so people can plug in bcrypt, scrypt, etc (similar to the way hmac defaults to md5, but lets you specify any hash function with the appropriate API). I think Armin's already created a good foundation for that, but there'll be quite a bit of work in getting a PEP written, etc. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Am 11.06.2012 08:09, schrieb Nick Coghlan:
Python already has an excellent library for password hashing: passlib [1]. It's well written and documented, contains more than 30 password hashing algorithms and schemas used by major platforms and applications like Unix, LDAP and databases. The library even contains a policy framework for handling, recognizing and migrating passwords as well as counteractive measures against side channel attacks. IMHO it's not enough to just provide the basic algorithm for PBKDF2 and friends. There is still too much space for error. Passlib hides the complex parts and has a user friendly API, for example http://packages.python.org/passlib/lib/passlib.context-tutorial.html#depreca... . Christian [1] http://packages.python.org/passlib/

On Mon, Jun 11, 2012 at 6:42 PM, Christian Heimes <lists@cheimes.de> wrote:
Thanks for the link Christian, it does appear this particular wheel has already been thoroughly invented. I'll be recommending passlib for use by others in the future and look into adopting it for my own projects. However, password hashing is an important and common enough problem that it would be good to have some basic level of support in the standard library, with a clear migration path to a more feature complete approach like passlib. It would be good if someone was willing to do the work of raising this discussion with the passlib authors, and looking to see if a suitably stable core could be extracted that is API compatible with passlib, and could be proposed as a standard library addition for 3.4. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 11.06.2012 12:03, Nick Coghlan wrote:
You are welcome! I've been using passlib for about two years and really like its API. PyPI surprises now and then with its hidden gems. I wish we had a way to draw more attention to good solutions, something like "officially endorsed projects" or so.
That's a nice idea, Nick! I've added one of the two core developers of passlib to the CC list. The other one doesn't have his/her email address exposed on Google Code. A stripped down and API compatible version of passlib would make a good addition to Python's standard library. IMHO the complete passlib package is too big for the core. The context API and handlers for bcrypt, pbkdf2 and sha*_crypt are sufficient. Developers can still install passlib if they need all features. We need to come up with a different name (passhash?) for the stdlib variant. Christian

On Mon, Jun 11, 2012 at 3:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I usually like this approach, but here I am hesitant, because of the cost if the basic approach is found inadequate. The stdlib support should either be state-of-the-art or so poor that people are naturally driven to a state-of-the-art alternative on PyPI that is maintained regularly. In this case I think our only option is the latter. I do think it is another example of a situation where the stdlib docs ought to contain some hints about where to go instead for this functionality. -- --Guido van Rossum (python.org/~guido)

On 2012-06-11, at 18:49 , Guido van Rossum wrote:
Well, it depends on what you mean by "state of the art": PBKDF2 is still the "tried and true" trusted password-hashing algorithm (it's the one used in TrueCrypt, 1Password, WPA2, DPAPI and many others). bcrypt is the "old newness", working on the same principle as PBKDF2 (do lots of work) but with a different underlying algorithm, and scrypt is the "new newness" as it is memory-hard on top of being processing-hard, but it is significantly less trusted as it's only a few years old. So as far as I know, PBKDF2 is indeed "state of the art", scrypt is "bleeding edge" and bcrypt is somewhere in-between[0] (but if PBKDF2 is found to be insufficient, bcrypt will fall for similar reasons: it's only binding on CPU power and is easy to parallelize). Ulrich Drepper also built an MD5crypt-inspired crypt based on SHA2 (and fixed a few weak ideas of MD5crypt[1]) a few years ago. As a matter of fact, passlib lists PBKDF2/SHA512 as one of its three recommendations (alongside bcrypt and sha512_crypt) and notes it is the most portable of three roughly equivalent choices[2] (and that sha512_crypt is somewhat baroque and harder to analyze for flaws than the alternatives).
The issue with this idea is that people are *not* driven to state-of-the-art alternatives because they don't understand or know the issue. And as a result, as we've seen last week, they'll use cryptographic hashes (with or without salts) even though those are insufficient, because that's available and they read on the internet that it was what they needed. And how are you going to make people understand there's a difference between a cryptographic hash and a password hash by doing nothing, giving them cryptographic hashes and leaving them to their own devices? [0] and beyond the bleeding edge lies ubiquitous 2-factor auth, probably. [1] MD5crypt cannot use adaptive load factors and injects constant data at some points; the SHA2-based crypt also allows longer salts. [2] http://packages.python.org/passlib/new_app_quickstart.html#recommended-hashe...

On Mon, Jun 11, 2012 at 1:08 PM, Masklinn <masklinn@masklinn.net> wrote:
Is there any indication that Python was involved in last week's incidents? (I'm only aware of the Linkedin one -- were there others?)
Do you really think that including some API in the stdlib is going to make a difference in education? And what would we do if in 2 years time the stdlib's "basic functionality" were somehow compromised (not due to a bug in Python's implementation but simply through some advance in the crypto world) -- how would we get everyone who relied on the stdlib to switch to a different algorithm? I really think that the right approach here is to get *everyone* who needs this to use a 3rd party library. Diversity is very good here!
TBH it's possible that I'm not sufficiently familiar with the issue to have a valid opinion here -- I would never dream of taking on the responsibility of password security for anything, since I don't have the right crypto hacker mindset. But I do worry about having attractive suboptimal solutions to common security problems in the stdlib. -- --Guido van Rossum (python.org/~guido)

On 11.06.2012 22:21, Guido van Rossum wrote:
Is there any indication that Python was involved in last week's incidents? (I'm only aware of the Linkedin one -- were there others?)
No, zero Pythons were harmed. The other victims were last.fm and eHarmony. Surprisingly, Sony wasn't hacked last week! *scnr*
+1 I'm against adding just the password hashing algorithms. Developers can easily screw up the right algorithm with an erroneous approach. That's the beauty of passlib: the framework hides all the complex and easy-to-get-wrong stuff behind a minimal API. Christian

On Tue, Jun 12, 2012 at 6:39 AM, Christian Heimes <lists@cheimes.de> wrote:
Right, when I suggested looking for an "API compatible stable core" that could be added for 3.4, I was specifically thinking of: 1. The core CryptContext API 2. The PBKDF2 and sha512_crypt derivation functions Based on a brief look at the module documentation, those parts seem like they're sufficiently mature to be suitable for the stdlib, whereas the rest of passlib is more suited to development as a 3rd party library with its own release schedule. However, I could be completely wrong, thus the suggestion that it be looked into, rather than "we should definitely do this". At the very least, we should be directing people towards passlib for password storage and comparison purposes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Christian Heimes <lists@...> writes:
I know I'm a little late to this thread, but as the primary Passlib author, I wanted to throw in my two cents. I wholeheartedly agree with the idea of not having a high-level password hashing library in the stdlib. I'd be honored and happy to help in extracting a subset of passlib for inclusion in the standard library. However, for all the reasons GvR pointed out, I'm scared at the thought of how slowly end deployments would get needed security updates (for one thing, I update the adaptive cost of the hashes in passlib about once a year just as a matter of course). I'm reminded of how the Debian project has had to create a "security" repository to supplement the "stable" repository, just so the slow-moving "stable" release gets timely security updates. All that said, I wouldn't mind seeing a pbkdf2() primitive added to the stdlib, along the lines of M2Crypto's pbkdf2 function [1]. I agree such a function might mislead developers into rolling their own password hashing routines, but a word of warning and redirection in the documentation might help with that. The reason I see a need for such a function is that all existing password hashing libraries (passlib, cryptacular, flufl.password, django.contrib.auth.hashers, etc.) have had to roll their own pure-Python PBKDF2 implementations, to varying degrees of speed. And speed is paramount for PBKDF2 usage, since security depends on squeezing as many rounds per second out of the implementation as possible. Having a single C-accelerated primitive would be great for all of the above libraries, and for all the other uses PBKDF2 has. Furthermore, it wouldn't need frequent security updates, since the hash storage format, default cost, default digest, etc., would all be handled by the higher-level libraries. Not that I'm advocating such a thing is *needed*, but that's what I'd love to see, were anything to be added in this direction. Hope all that helps in your decision-making.
Thanks, Eli [1] http://www.heikkitoivonen.net/m2crypto/api/M2Crypto.EVP-module.html#pbkdf2 - Eli Collins
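For reference, the kind of pure-Python PBKDF2 Eli describes can be sketched as follows: a straightforward reading of RFC 2898, section 5.2, with HMAC as the PRF (`pbkdf2_pure` is a hypothetical name). The first test vector is the PBKDF2-HMAC-SHA1 case from RFC 6070, and `hashlib.pbkdf2_hmac` (Python 3.4+) is used as a cross-check.

```python
import hashlib
import hmac
import struct
from typing import Optional


def pbkdf2_pure(digest: str, password: bytes, salt: bytes, rounds: int,
                dklen: Optional[int] = None) -> bytes:
    """PBKDF2 per RFC 2898, section 5.2, with HMAC-<digest> as the PRF."""
    keyed = hmac.new(password, digestmod=digest)  # key the HMAC once, copy per call
    hlen = keyed.digest_size
    dklen = dklen or hlen

    def prf(data: bytes) -> bytes:
        mac = keyed.copy()
        mac.update(data)
        return mac.digest()

    blocks = []
    for i in range(1, -(-dklen // hlen) + 1):   # ceil(dklen / hlen) blocks
        u = prf(salt + struct.pack(">I", i))    # U_1 = PRF(P, S || INT(i))
        acc = int.from_bytes(u, "big")
        for _ in range(rounds - 1):             # U_j = PRF(P, U_{j-1})
            u = prf(u)
            acc ^= int.from_bytes(u, "big")     # T_i = U_1 xor ... xor U_c
        blocks.append(acc.to_bytes(hlen, "big"))
    return b"".join(blocks)[:dklen]
```

Every round here is a Python-level loop iteration, which is why such implementations are much slower than a C primitive; since the defender can only afford as many rounds as the implementation computes in a tolerable time, that slowness directly translates into a lower work factor.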

On Jun 15, 2012, at 07:07 PM, Eli Collins wrote:
To be honest, if I'd known about passlib I probably would never have written flufl.password. Extra +1 goodness for passlib's Python 3 support! I'm going to migrate my own applications to passlib and if that goes well, I'll start the process of deprecating flufl.password. Cheers, -Barry

On Tue, Jun 12, 2012 at 6:21 AM, Guido van Rossum <guido@python.org> wrote:
eHarmony and last.fm were the other two prominent sites I saw mentioned. We're not aware of any specific Python connection, it just prompted the current discussion of whether or not there was anything CPython could do to nudge developers in the right direction. Even a native PBKDF2 would be an awful lot better than nothing.
I think it's similar to the situation with hmac: for backwards compatibility reasons, the default hash in hmac is still MD5. That doesn't mean hmac is useless, and using MD5 is still better than doing nothing. It's all about raising the bar for attackers, and the fact that attackers are continually inventing better ladders and grappling hooks doesn't mean the older walls become completely useless. However, I also think, with the right API design, we could allow for the key derivation algorithms to be retuned in security releases, *because* the state of the art evolves (and because computers get faster). The passlib core APIs and hash formats are designed with precisely that problem in mind.
The trick is that even a suboptimal solution is a whole lot better than the next-to-nothing that many people do currently. At the moment, the available approaches are:
1. store plaintext passwords (eek)
2. store hashed unsalted passwords (vulnerable to rainbow tables)
3. store hashed salted passwords (vulnerable to massively parallel brute force attacks)
4. store tunable cost hashed salted passwords (reduces vulnerability to brute force, currently requires a third party library)
Option 4 *is* the state of the art; it's just a matter of tinkering with the key derivation algorithm in response to advances in cryptography, as well as ramping up the tuning parameters over time to account for Moore's law. By making it as easy as possible for people to use Option 4 instead of one of the first 3, we increase the odds of people doing the right thing. A third party library like passlib can then focus on more dynamic things like:
1. Providing API compatible interfaces to 3rd party key derivation algorithms (e.g. bcrypt, or the accelerated PBKDF2 implementation in M2crypto), as well as to newer ones like scrypt
2. Providing convenient interfaces for reading and writing 3rd party hash storage formats (e.g. LDAP)
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jun 10, 2012 at 2:36 PM, Paul Moore <p.f.moore@gmail.com> wrote:
To use that would need Armin's approval and support. So far he's not commented here.
Only if you want a different license than 3-clause BSD. P.S. I love this thread. Great suggestion. :) -- Devin

On Tue, Jun 12, 2012 at 2:34 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I wrote that quickly. I don't want a circular dependency or things that aren't well established standards in hashlib. I see hashlib as being for low level algorithms only (FIPS standards, etc.) where fast implementations are available in most VM runtimes. hmac depends on hashlib; therefore, nothing in hashlib should ever depend on hmac. That doesn't prevent someone from deciding hmac shouldn't be a module of its own and moving it to live within hashlib some day, but that would seem like needless API churn outside of a major language version change. -gps

I'd love to have a PBKDF2 implementation in the stdlib. My flufl.password module has an implementation donated by security expert Bob Fleck. Any insecure implementation bugs should be blamed solely on me, though. ;) http://bazaar.launchpad.net/~barry/flufl.password/trunk/view/head:/flufl/pas... The API is a little odd because it fits into the larger API for flufl.password, but if it's useful, I'd happily clean up and donate the code for the stdlib. OTOH, I'd be just as happy (maybe more) to get rid of it in favor of a stdlib implementation. Cheers, -Barry

On 13.06.2012 22:38, Barry Warsaw wrote:
At first glance your implementation is vulnerable to side channel attacks because you aren't using a constant time equality function. Also you are using the least secure variant of PBKDF2 (SHA-1 instead of SHA-256 or SHA-512). At least you are using os.urandom() as source for the salt, which is usually fine. Passlib supports the LDAP variants, too. [1] Outside of LDAP the established notation is $pbkdf2-digest$rounds$salt$checksum. Christian [1] http://packages.python.org/passlib/lib/passlib.hash.ldap_pbkdf2_digest.html
participants (11)
- Antoine Pitrou
- Barry Warsaw
- Christian Heimes
- Devin Jeanpierre
- Eli Collins
- Gregory P. Smith
- Guido van Rossum
- Masklinn
- Nick Coghlan
- Paul Moore
- Simon Sapin