New hash algorithms: SHA3, SHAKE, BLAKE2, truncated SHA512
Hi everybody, I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256). SHA-3 / SHAKE: https://bugs.python.org/issue16113 BLAKE2: https://bugs.python.org/issue26798 SHA512/224 / SHA512/256: https://bugs.python.org/issue26834 I would like to push the patches during the sprints at PyCon. Please assist with reviews. Regards, Christian
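For readers who want to see what the proposed additions look like in use, here is a minimal sketch. It assumes a Python build (3.6 or later) in which these constructors are exposed by hashlib under the names sha3_*, shake_*, blake2b and blake2s; the input bytes are made up for illustration. The truncated SHA512 variants are left out because their availability depends on the build.

```python
import hashlib

data = b"python-dev"  # arbitrary example input

# SHA-3: four fixed output sizes, selected by constructor name.
sha3_bits = {}
for name in ("sha3_224", "sha3_256", "sha3_384", "sha3_512"):
    h = getattr(hashlib, name)(data)
    sha3_bits[name] = h.digest_size * 8

# SHAKE: an extendable-output function (XOF), so the caller chooses the
# output length in bytes when asking for the digest.
shake = hashlib.shake_128(data)
short_out = shake.hexdigest(16)
long_out = shake.hexdigest(32)

# BLAKE2: blake2b is tuned for 64-bit platforms, blake2s for smaller ones.
b2b = hashlib.blake2b(data).hexdigest()
b2s = hashlib.blake2s(data).hexdigest()

print(sha3_bits)
print(short_out)
print(long_out)
print(len(b2b) // 2, len(b2s) // 2)  # default digest sizes in bytes
```

The longer SHAKE output begins with the shorter one, which is what distinguishes an XOF from a family of unrelated fixed-size hashes.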
On May 25, 2016, at 3:29 AM, Christian Heimes <christian@python.org> wrote:
I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
Do we really need ten? I don't think the standard library is the place to offer all variants of hashing. And we should avoid getting in a cycle of "this was just released by NIST" and "nobody uses that one anymore". Is any one of them an emergent best practice (i.e. starting to be commonly used in network protocols because it is better, faster, stronger, etc)? Your last message on https://bugs.python.org/issue16113 suggests that these aren't essential and that there is room for debate about whether some of them are standard-library worthy (i.e. we will have them around forever). Raymond
On May 27, 2016, at 12:54 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On May 25, 2016, at 3:29 AM, Christian Heimes <christian@python.org> wrote:
I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
Do we really need ten? I don't think the standard library is the place to offer all variants of hashing. And we should avoid getting in a cycle of "this was just released by NIST" and "nobody uses that one anymore". Is any one of them an emergent best practice (i.e. starting to be commonly used in network protocols because it is better, faster, stronger, etc)?
Your last message on https://bugs.python.org/issue16113 suggests that these aren't essential and that there is room for debate about whether some of them are standard-library worthy (i.e. we will have them around forever).
I think that adding sha3 here is a net positive. While there aren’t a huge number of things using it today, that’s largely because it’s fairly new. It’s a NIST standard, so it won’t be long until things are using it. It would be surprising to me to be able to use sha1 and sha2 from the standard library, but not sha3. SHAKE is really just SHA3 with some additional tweaks to the parameters. I think if you’re adding SHA3 it’s pretty easy to also add these, though I don’t think that it’s as important as adding SHA3 itself. BLAKE2 is an interesting one, because while SHA3 is a NIST standard (so it’s going to gain adoption because of that), BLAKE2 is at least as strong as SHA3 but is better in many ways, particularly in speed: it’s actually faster than MD5 while being as secure as SHA3. This one I think is a good one to have in the standard library as well because it is all around a really great hash and a lot of things are starting to be built on top of it. In particular, I’d like to use this in PyPI and pip, but I can’t unless it’s in the standard library. -- Donald Stufft
On 27 May 2016 12:05 PM, "Donald Stufft" <donald@stufft.io> wrote:
BLAKE2 is an interesting one, because while SHA3 is a NIST standard (so it’s going to gain adoption because of that), BLAKE2 is at least as strong as SHA3 but is better in many ways, particularly in speed— it’s actually faster than MD5 while being as secure as SHA3.
BLAKE2 was part of the SHA3 competition and was among the finalists. The SHA3 competition is interesting because each algorithm is deeply tested and analyzed by many teams all around the world. Obvious vulnerabilities are quickly found. The advantage of putting SHA3 and BLAKE2 in the stdlib is that they have different designs. I don't expect two different designs to share the same vulnerabilities, but I'm not an expert :-) SHA3 (Keccak) is based on a new sponge construction: https://en.m.wikipedia.org/wiki/SHA-3 BLAKE is based on ChaCha: https://en.m.wikipedia.org/wiki/BLAKE_(hash_function) https://en.m.wikipedia.org/wiki/Salsa20#ChaCha_variant Victor
On 2016-05-27 03:44, Victor Stinner wrote:
On 27 May 2016 12:05 PM, "Donald Stufft" <donald@stufft.io> wrote:
BLAKE2 is an interesting one, because while SHA3 is a NIST standard (so it’s going to gain adoption because of that), BLAKE2 is at least as strong as SHA3 but is better in many ways, particularly in speed— it’s actually faster than MD5 while being as secure as SHA3.
BLAKE2 was part of the SHA3 competition and it was in finalists. The SHA3 competition is interesting because each algorithm is deeply tested and analyzed by many teams all around the world. Obvious vulnerabilities are quickly found.
Thanks Victor. A minor correction: BLAKE was a finalist in the SHA3 competition, not BLAKE2. BLAKE2 is an improved version of BLAKE with additional features. Christian
On 27.05.2016 06:54, Raymond Hettinger wrote:
On May 25, 2016, at 3:29 AM, Christian Heimes <christian@python.org> wrote:
I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
Do we really need ten? I don't think the standard library is the place to offer all variants of hashing. And we should avoid getting in a cycle of "this was just released by NIST" and "nobody uses that one anymore". Is any one of them an emergent best practice (i.e. starting to be commonly used in network protocols because it is better, faster, stronger, etc)?
Your last message on https://bugs.python.org/issue16113 suggests that these aren't essential and that there is room for debate about whether some of them are standard-library worthy (i.e. we will have them around forever).
I can understand your eagerness to get this landed, since it's been 4 years since work started, but I think we should wait with the addition until OpenSSL has them: https://github.com/openssl/openssl/issues/439 The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any widespread use yet and probably won't be for quite a few years ahead. IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package: https://github.com/bjornedstrom/python-sha3 Perhaps you could join forces with Björn to create a standard SHA-3 standalone package on PyPI based on your two variants which we could recommend to people in the docs ?! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 27 2016)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
On May 27, 2016, at 6:54 AM, M.-A. Lemburg <mal@egenix.com> wrote:
IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package:
Even now, hashlib doesn’t rely on OpenSSL if I recall, I mean it will use it if OpenSSL is available but otherwise it has internal implementations too. — Donald Stufft
On 27.05.2016 13:03, Donald Stufft wrote:
On May 27, 2016, at 6:54 AM, M.-A. Lemburg <mal@egenix.com> wrote:
IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package:
Even now, hashlib doesn’t rely on OpenSSL if I recall, I mean it will use it if OpenSSL is available but otherwise it has internal implementations too.
I know, but still don't think that's a good idea. It makes sense in case you don't want to carry around OpenSSL all the time, but how often does that happen nowadays ? BTW: If I recall correctly, those hash implementations predate the deeper support for OpenSSL we now have in Python. -- Marc-Andre Lemburg
OpenSSL sucks. Python would only have to bundle a reference implementation of the new hash algorithm(s), and unlike TLS suites they tend to just work. BLAKE2 is important, since it removes the last objection to replacing MD5 - speed - that has made it hard for cryptography fans to convince MD5 users to upgrade. On Fri, May 27, 2016 at 7:13 AM M.-A. Lemburg <mal@egenix.com> wrote:
On 27.05.2016 13:03, Donald Stufft wrote:
On May 27, 2016, at 6:54 AM, M.-A. Lemburg <mal@egenix.com> wrote:
IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package:
Even now, hashlib doesn’t rely on OpenSSL if I recall, I mean it will use it if OpenSSL is available but otherwise it has internal implementations too.
I know, but still don't think that's a good idea. It makes sense in case you don't want to carry around OpenSSL all the time, but how often does that happen nowadays ?
BTW: If I recall correctly, those hash implementations predate the deeper support for OpenSSL we now have in Python.
On 05/27/2016 11:31 AM, Daniel Holth wrote:
BLAKE2 is important, since it removes the last objection to replacing MD5 - speed - that has made it hard for cryptography fans to convince MD5 users to upgrade.
I have had to stick to MD5 for performance reasons (2 seconds in MD5 or 9.6 seconds in SHA256, IIRC) in scenarios that did not require an SHA*. Having BLAKE2 around wouldn't be a necessity, but if it shipped with newer versions of Python eventually there would be a commit switching the underlying hash function.
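The speed argument is easy to check on one's own machine. The following is a rough timing sketch, not a rigorous benchmark: the payload size, the repeat count and the helper name time_hash are arbitrary choices made for illustration, and the results depend heavily on the CPU and on whether hashlib is backed by OpenSSL.

```python
import hashlib
import time

def time_hash(name, payload, repeat=5):
    """Return the best wall-clock time in seconds to hash payload once."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        hashlib.new(name, payload).digest()
        best = min(best, time.perf_counter() - start)
    return best

payload = b"\0" * (4 * 1024 * 1024)  # 4 MiB of zeros; arbitrary test input

timings = {}
for name in ("md5", "sha256", "blake2b"):
    try:
        timings[name] = time_hash(name, payload)
    except ValueError:
        # Some builds refuse certain algorithms (for example md5 under a
        # restrictive crypto policy); skip those rather than fail.
        pass

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:8s} {seconds * 1000:8.2f} ms")
```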
, which aren't in any wide spread use yet and probably won't be for quite a few years ahead.
Anything added to the stdlib now will be in py3.6+, yes? Which won't be in widespread use for quite a few years yet, either. So if (and that's a big if) it's possible to anticipate what will be in widespread use in a couple years, getting it in now would be a good thing. -CHB
IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package:
https://github.com/bjornedstrom/python-sha3
Perhaps you could join forces with Björn to create a standard SHA-3 standalone package on PyPI based on your two variants which we could recommend to people in the docs ?!
On 27.05.2016 17:44, Chris Barker - NOAA Federal wrote:
, which aren't in any wide spread use yet and probably won't be for quite a few years ahead.
Anything added to the stdlib now will be in py3.6+, yes?
Which won't be in widespread use for quite a few years yet, either.
So if ( and that's a big if) it's possible to anticipate what will be in widespread use in a couple years, getting it in now would be a good thing.
You cut away the important part of what I said: "The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, ..." If people want to use the hashes earlier, this is already possible via a separate package, so we're not delaying their use. It is clear that SHA-3 will get more traction in coming years (*), but I'm pretty sure that OpenSSL will have good implementations by the time people will actively start using the new hash algorithm and then hashlib will automatically make that available (hashlib uses the OpenSSL EVP abstraction, so will be able to use any new algorithms added to OpenSSL). However, if we add the reference implementation now, we'd then be left with 1.2MB unnecessary code in the stdlib. The question is not so much: is SHA-3 useful or not, it's whether we want to maintain this forever going forward or not. (*) People are just now starting to move from SHA-1 to SHA-2 and SHA-2 was standardized in 2001. Python received SHA-2 support in 2006. So there's plenty of time to decide :-)
-CHB
IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package:
https://github.com/bjornedstrom/python-sha3
Perhaps you could join forces with Björn to create a standard SHA-3 standalone package on PyPI based on your two variants which we could recommend to people in the docs ?!
-- Marc-Andre Lemburg
On Fri, May 27, 2016 at 9:35 AM, M.-A. Lemburg <mal@egenix.com> wrote:
So if ( and that's a big if) it's possible to anticipate what will be in widespread use in a couple years, getting it in now would be a good thing.
You cut away the important part of what I said: "The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, ..."
If people want to use the hashes earlier, this is already possible via a separate package, so we're not delaying their use.
That's true for ANY addition to the stdlib -- it could always be made available in a third party lib. (unless you want to use it in another part of the stdlib...)
It is clear that SHA-3 will get more traction in coming years (*), but I'm pretty sure that OpenSSL will have good implementations by the time people will actively start using the new hash algorithm and then hashlib will automatically make that available (hashlib uses the OpenSSL EVP abstraction, so will be able to use any new algorithms added to OpenSSL).
However, if we add the reference implementation now, we'd then be left with 1.2MB unnecessary code in the stdlib.
I'm probably showing my ignorance here, but couldn't we swap in the OpenSSL implementation when that becomes available? -CHB (*) People are just now starting to move from SHA-1 to SHA-2
and SHA-2 was standardized in 2001. Python received SHA-2 support in 2006. So there's plenty of time to decide :-)
can't deny the history, nor the inertia -- but that doesn't make it a good thing... -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On 27.05.2016 18:41, Chris Barker wrote:
On Fri, May 27, 2016 at 9:35 AM, M.-A. Lemburg <mal@egenix.com> wrote:
So if ( and that's a big if) it's possible to anticipate what will be in widespread use in a couple years, getting it in now would be a good thing.
You cut away the important part of what I said: "The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, ..."
If people want to use the hashes earlier, this is already possible via a separate package, so we're not delaying their use.
That's true for ANY addition to the stdlib -- it could always be made available in a third party lib. (unless you want to use it in another part of the stdlib...)
Well, any addition for which someone already wrote a package, but yes...
It is clear that SHA-3 will get more traction in coming years (*), but I'm pretty sure that OpenSSL will have good implementations by the time people will actively start using the new hash algorithm and then hashlib will automatically make that available (hashlib uses the OpenSSL EVP abstraction, so will be able to use any new algorithms added to OpenSSL).
However, if we add the reference implementation now, we'd then be left with 1.2MB unnecessary code in the stdlib.
I'm probably showing my ignorance here, but couldn't we swap in the OpenSSL implementation when that becomes available?
We could, but only if we don't expose separate interfaces for the hashes and don't add them to hashlib.algorithms or hashlib.algorithms_guaranteed.
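The distinction being drawn here is visible from Python itself. As a small sketch, assuming a Python 3 hashlib (which exposes algorithms_guaranteed and algorithms_available), the helper below reports which category a name falls into; the function name describe is made up for this example.

```python
import hashlib

def describe(name):
    """Classify a hash name by how strongly this build promises it."""
    if name in hashlib.algorithms_guaranteed:
        return "guaranteed"        # present on every platform
    if name in hashlib.algorithms_available:
        return "available"         # present in this build only
    return "missing"

for name in ("sha256", "sha3_256", "blake2b", "no-such-hash"):
    print(f"{name:14s} {describe(name)}")

# Construction by name only works for the first two categories.
try:
    hashlib.new("no-such-hash")
except ValueError as exc:
    print("rejected:", exc)
```

Code that relies on a name from the guaranteed set never needs a fallback; code that relies on the available set has to be prepared for ValueError.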
-CHB
(*) People are just now starting to move from SHA-1 to SHA-2
and SHA-2 was standardized in 2001. Python received SHA-2 support in 2006. So there's plenty of time to decide :-)
can't deny the history, nor the inertia -- but that doesn't make it a good thing...
-- Marc-Andre Lemburg
On 2016-05-27 09:41, Chris Barker wrote:
I'm probably showing my ignorance here, but couldn't we swap in the OpenSSL implementation when that becomes available?
No, not any time soon. As soon as we guarantee SHA3 support we have to keep our own implementation for a couple of additional releases. We can drop our own SHA3 code as soon as all supported OpenSSL versions have SHA3. For example, if OpenSSL 1.2.0 is the first release with SHA3 support, we must wait until OpenSSL 1.1 and 1.0.2 are no longer supported by OpenSSL. Christian
On May 28, 2016, at 5:01 PM, Christian Heimes <christian@python.org> wrote:
No, not any time soon. As soon as we guarantee SHA3 support we have to keep our own implementation for a couple of additional releases. We can drop our own SHA3 code as soon as all supported OpenSSL versions have SHA3.
It still will be needed for as long as it’s possible to build Python without OpenSSL. — Donald Stufft
But you could choose which implementation to use at compile time based on the autoconf output, right? On Sat, May 28, 2016 at 2:01 PM, Christian Heimes <christian@python.org> wrote:
On 2016-05-27 09:41, Chris Barker wrote:
I'm probably showing my ignorance here, but couldn't we swap in the OpenSSL implementation when that becomes available?
No, not any time soon. As soon as we guarantee SHA3 support we have to keep our own implementation for a couple of additional releases. We can drop our own SHA3 code as soon as all supported OpenSSL versions have SHA3.
For example when OpenSSL 1.2.0 is going to have SHA3 support, we must wait until OpenSSL 1.1 and 1.0.2 are no longer supported by OpenSSL.
Christian
-- --Guido van Rossum (python.org/~guido)
On May 28, 2016, at 5:06 PM, Guido van Rossum <guido@python.org> wrote:
But you could choose which implementation to use at compile time based on the autoconf output, right?
I think we should follow what hashlib already does. If we want to change the way it works that's fine but these hashes shouldn't be special. They should work the way that all the other standard hashes in hashlib work.
On 2016-05-28 14:06, Guido van Rossum wrote:
But you could choose which implementation to use at compile time based on the autoconf output, right?
We compile all modules and then let hashlib decide which implementation is used. hashlib prefers OpenSSL but falls back to our builtin modules. For MD5, SHA1 and SHA2 OpenSSL's implementation has better performance (up to twice the speed).
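The "prefer one backend, fall back to another" behaviour described above can be illustrated with a toy dispatcher. This is only a sketch of the idea, not hashlib's actual code: the two dictionaries are hypothetical stand-ins for the OpenSSL-backed and built-in constructors, and the first one deliberately pretends that OpenSSL lacks SHA-3.

```python
import hashlib

# Hypothetical backend tables, for illustration only. Real hashlib keeps
# this logic internal and discovers the OpenSSL algorithms at import time.
openssl_backend = {"md5": hashlib.md5, "sha256": hashlib.sha256}
builtin_backend = {
    "md5": hashlib.md5,
    "sha256": hashlib.sha256,
    "sha3_256": hashlib.sha3_256,
}

def new_hash(name, data=b""):
    """Return (backend_name, hash_object), preferring the first backend."""
    for backend_name, table in (("openssl", openssl_backend),
                                ("builtin", builtin_backend)):
        constructor = table.get(name)
        if constructor is not None:
            return backend_name, constructor(data)
    raise ValueError(f"unsupported hash type {name}")

print(new_hash("sha256", b"abc")[0])    # served by the preferred backend
print(new_hash("sha3_256", b"abc")[0])  # falls back to the built-in table
```

With this arrangement a caller never sees which backend was used, which is why the built-in implementation has to stay around for as long as some supported configuration lacks the preferred one.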
On Friday, 27 May 2016, M.-A. Lemburg <mal@egenix.com> wrote:
The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any wide spread use yet and probably won't be for quite a few years ahead.
Oh wow, it's so fat? Why is it so big? Can't we use a lighter version? Victor
On May 27, 2016 3:04 PM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
On Friday, 27 May 2016, M.-A. Lemburg <mal@egenix.com> wrote:
The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any wide spread use yet and probably won't be for quite a few years ahead.
Oh wow, it's so fat? Why is it so big? Can't we use a lighter version?
The stark majority of the patch is Lib/test/vectors/sha3_224.txt, which seems to be (as the file path implies) just test data. A whopping >1k LOC of really long hashes.
Victor
-- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/
On 27.05.2016 22:58, Ryan Gonzalez wrote:
On May 27, 2016 3:04 PM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
On Friday, 27 May 2016, M.-A. Lemburg <mal@egenix.com> wrote:
The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any wide spread use yet and probably won't be for quite a few years ahead.
Oh wow, it's so fat? Why is it so big? Can't we use a lighter version?
The stark majority of the patch is Lib/test/vectors/sha3_224.txt, which seems to be (as the file path implies) just test data. A whopping >1k LOC of really long hashes.
Right. There's about 1MB test data in the patch, but even without that data, the patch adds more than 6400 lines of code. If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO. Aside: BLAKE2 has already landed in OpenSSL 1.1.0: https://github.com/openssl/openssl/tree/master/crypto/blake2 -- Marc-Andre Lemburg
On May 27, 2016, at 5:41 PM, M.-A. Lemburg <mal@egenix.com> wrote:
If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO.
I think it is a clear win to have the fallback implementations in cases where people either don’t have OpenSSL or don’t have a new enough OpenSSL for those implementations. Not having the fallback just makes it more difficult for people to rely on those hash functions. — Donald Stufft
On 27.05.2016 23:46, Donald Stufft wrote:
On May 27, 2016, at 5:41 PM, M.-A. Lemburg <mal@egenix.com> wrote:
If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO.
I think it is a clear win to have the fallback implementations in cases where people either don’t have OpenSSL or don’t have a new enough OpenSSL for those implementations. Not having the fallback just makes it more difficult for people to rely on those hash functions.
This will only be needed once the stdlib itself starts requiring support for some of these hashes and for that we could add a pure Python implementation, eg. https://github.com/coruus/py-keccak In all other cases, you can simply add the support via a package such as Björn's or Christian's. -- Marc-Andre Lemburg
On Fri, May 27, 2016 at 3:08 PM, M.-A. Lemburg <mal@egenix.com> wrote:
On 27.05.2016 23:46, Donald Stufft wrote:
On May 27, 2016, at 5:41 PM, M.-A. Lemburg <mal@egenix.com> wrote:
If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO.
I think it is a clear win to have the fallback implementations in cases where people either don’t have OpenSSL or don’t have a new enough OpenSSL for those implementations. Not having the fallback just makes it more difficult for people to rely on those hash functions.
This will only be needed once the stdlib itself starts requiring support for some of these hashes and for that we could add a pure Python implementation, eg.
https://github.com/coruus/py-keccak
In all other cases, you can simply add the support via a package such as Björn's or Christian's.
SHA-3 and BLAKE are extremely widely accepted standards, our users will expect them, and they're significant improvements over all the current hashes in the algorithms_guaranteed list. If we demote them to second-class support (by making them only available in some builds, or using a slow pure Python implementation), then we'll be encouraging users to use inferior hashes. We shouldn't do this without a very good reason, and I don't see anything very convincing here... by all means drop the megabyte of test data, but why does it matter how many lines of code the algorithm is? No Python developer will ever have to look at it -- hash code by its nature is *very* low maintenance (it either computes the right function or it doesn't, and the right answer never changes). And in the unlikely case where some terrible unexpected bug is discovered, the only maintenance needed will be to delete the current impl and drop in whatever the new fixed one is. So +1 to adding SHA-3 and BLAKE to algorithms_guaranteed. -n -- Nathaniel J. Smith -- https://vorpus.org
On 05/27/2016 07:52 PM, Nathaniel Smith wrote:
If we demote them to second-class support (by making them only available in some builds, or using a slow pure Python implementation), then we'll be encouraging users to use inferior hashes. We shouldn't do this without a very good reason.
I agree. And I really think we shouldn't even ship pure Python implementations of these hashing algorithms. I am fairly confident that these algorithms would be prohibitively slow if written in pure Python.
On 2016-05-27 15:52, Nathaniel Smith wrote:
On Fri, May 27, 2016 at 3:08 PM, M.-A. Lemburg <mal@egenix.com> wrote:
On 27.05.2016 23:46, Donald Stufft wrote:
On May 27, 2016, at 5:41 PM, M.-A. Lemburg <mal@egenix.com> wrote:
If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO.
I think it is a clear win to have the fallback implementations in cases where people either don’t have OpenSSL or don’t have a new enough OpenSSL for those implementations. Not having the fallback just makes it more difficult for people to rely on those hash functions.
This will only be needed once the stdlib itself starts requiring support for some of these hashes and for that we could add a pure Python implementation, eg.
https://github.com/coruus/py-keccak
In all other cases, you can simply add the support via a package such as Björn's or Christian's.
SHA-3 and BLAKE are extremely widely accepted standards, our users will expect them, and they're significant improvements over all the current hashes in the algorithms_guaranteed list. If we demote them to second-class support (by making them only available in some builds, or using a slow pure Python implementation), then we'll be encouraging users to use inferior hashes. We shouldn't do this without a very good reason, and I don't see anything very convincing here... by all means drop the megabyte of test data, but why does it matter how many lines of code the algorithm is? No python developer will ever have to look at it -- hash code by its nature is *very* low maintenance (it either computes the right function or it doesn't, and the right answer never changes). And in unlikely case where some terrible unexpected bug is discovered then the only maintenance needed will be to delete the current impl and drop-in whatever the new fixed one is.
So +1 to adding SHA-3 and BLAKE to algorithms_guaranteed.
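A minimal sketch of what membership in algorithms_guaranteed means for calling code. It assumes a Python whose hashlib exposes the names `sha3_256` and `blake2b`. The helper `pick_hash` and its preference order are hypothetical and only illustrate the point; nobody in this thread proposed them.

```python
import hashlib

# Preference order is an assumption for illustration, not a recommendation
# made anywhere in this thread.
PREFERRED = ("sha3_256", "blake2b", "sha256")


def pick_hash(preferred=PREFERRED):
    """Return the first preferred algorithm name that this build guarantees."""
    for name in preferred:
        if name in hashlib.algorithms_guaranteed:
            return name
    raise ValueError("none of the preferred algorithms is guaranteed")


name = pick_hash()
digest = hashlib.new(name, b"hello").hexdigest()
print(name, digest)
```

If a name is only in `hashlib.algorithms_available` (for example because the linked OpenSSL happens to provide it), code like this cannot rely on it across builds, which is the "second-class support" concern raised above.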
Thanks Nathaniel, my patches don't add SHA3 and BLAKE2 to algorithms_guaranteed because Python still supports C89 platforms without a 64 bit integer type. Theoretically 64bit ints are not required except for BLAKE2b. Since Trent's snakebite.org is dead I don't have access to these old platforms any more. Christian
Python 3.5 requires a 64 bit signed integer to build. Search for _PyTime type in pytime.h ;-)
On 2016-05-27 14:41, M.-A. Lemburg wrote:
On 27.05.2016 22:58, Ryan Gonzalez wrote:
On May 27, 2016 3:04 PM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
Le vendredi 27 mai 2016, M.-A. Lemburg <mal@egenix.com> a écrit :
The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any widespread use yet and probably won't be for quite a few years ahead.
Oh wow, it's so fat? Why is it so big? Can't we use a lighter version?
The vast majority of the patch is Lib/test/vectors/sha3_224.txt, which seems to be (as the file path implies) just test data. A whopping >1k LOC of really long hashes.
Right. There's about 1MB test data in the patch, but even without that data, the patch adds more than 6400 lines of code.
The KeccakCodePackage is rather large. I already removed all unnecessary files and modified some files so more code is shared between 32 and 64bit optimized variants. Please keep in mind that the KCP contains multiple implementations with different optimizations for CPU architectures. I already removed the ARM NEON optimization. I also don't get your obsession with lines of code. The gzip and expat are far bigger than the KeccakCodePackage.
If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO.
Aside: BLAKE2 has already landed in OpenSSL 1.1.0:
https://github.com/openssl/openssl/tree/master/crypto/blake2
Except BLAKE2 in OpenSSL is severely castrated and tailored towards a very limited use case. The implementation does not support any of the useful advanced features like keyed hashing (MAC), salt, personalization, tree hashing and variable hash length.
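A minimal sketch of the features listed above, assuming the `hashlib.blake2b` constructor with the keyword names `digest_size`, `key`, `salt` and `person` as found in Python 3.6 and later. The key, salt and personalization values are placeholders, and tree hashing is left out.

```python
import hashlib

data = b"message"

# Variable digest length: 16 bytes instead of blake2b's default of 64.
short = hashlib.blake2b(data, digest_size=16)

# Keyed hashing (MAC): the key is mixed in by the hash function itself.
mac = hashlib.blake2b(data, key=b"secret key", digest_size=32)

# Salt and personalization: same input, different domains, different digests.
a = hashlib.blake2b(data, salt=b"salt-1", person=b"app-A")
b = hashlib.blake2b(data, salt=b"salt-1", person=b"app-B")

print(short.hexdigest())
print(mac.hexdigest())
print(a.digest() != b.digest())
```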
On 28.05.2016 23:13, Christian Heimes wrote:
On 2016-05-27 14:41, M.-A. Lemburg wrote:
On 27.05.2016 22:58, Ryan Gonzalez wrote:
On May 27, 2016 3:04 PM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
Le vendredi 27 mai 2016, M.-A. Lemburg <mal@egenix.com> a écrit :
The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any widespread use yet and probably won't be for quite a few years ahead.
Oh wow, it's so fat? Why is it so big? Can't we use a lighter version?
The vast majority of the patch is Lib/test/vectors/sha3_224.txt, which seems to be (as the file path implies) just test data. A whopping >1k LOC of really long hashes.
Right. There's about 1MB test data in the patch, but even without that data, the patch adds more than 6400 lines of code.
The KeccakCodePackage is rather large. I already removed all unnecessary files and modified some files so more code is shared between 32 and 64bit optimized variants. Please keep in mind that the KCP contains multiple implementations with different optimizations for CPU architectures. I already removed the ARM NEON optimization. I also don't get your obsession with lines of code. The gzip and expat are far bigger than the KeccakCodePackage.
For a small piece of code, it's fine to have a copy in the stdlib, but for larger chunks such as this one, I think we ought to consider alternative options, since I don't think it's good to have to carry around this baggage forever. OpenSSL will eventually have good enough support for what most Python users will need from these new hash functions. That's why I think it's better to have a discussion of whether we need the full package in the stdlib, or whether we should only provide limited support built into the stdlib and refer people to PyPI packages for things that you don't need every day. Regarding the stories for zlib and expat, I only remember that expat was essentially unmaintained when we added it and the existing version at the time had known bugs (but I could be wrong). For zlib, I have no clue as to why we have a copy in the stdlib. That lib is available on all systems nowadays. Perhaps it wasn't when we added it; I don't remember. If so, it's a good example of why adding copies to the stdlib is not such a good idea :-)
If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO.
Aside: BLAKE2 has already landed in OpenSSL 1.1.0:
https://github.com/openssl/openssl/tree/master/crypto/blake2
Except BLAKE2 in OpenSSL is severely castrated and tailored towards a very limited use case. The implementation does not support any of the useful advanced features like keyed hashing (MAC), salt, personalization, tree hashing and variable hash length.
I bet that the use cases they put into OpenSSL are what most people will eventually use, so essentially the same reasoning we use for putting stuff into the stdlib. Besides, the code just landed in OpenSSL. It's likely they'll continue to optimize it and possibly also add the variants they left out initially. -- Marc-Andre Lemburg, eGenix.com (May 29 2016)
On 2016-05-27 03:54, M.-A. Lemburg wrote:
On 27.05.2016 06:54, Raymond Hettinger wrote:
On May 25, 2016, at 3:29 AM, Christian Heimes <christian@python.org> wrote:
I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
Do we really need ten? I don't think the standard library is the place to offer all variants of hashing. And we should avoid getting in a cycle of "this was just released by NIST" and "nobody uses that one anymore". Is any one of them an emergent best practice (i.e. starting to be commonly used in network protocols because it is better, faster, stronger, etc)?
Your last message on https://bugs.python.org/issue16113 suggests that these aren't essential and that there is room for debate about whether some of them are standard-library worthy (i.e. we will have them around forever).
I can understand your eagerness to get this landed, since it's been 4 years since work started, but I think we should wait with the addition until OpenSSL has them:
https://github.com/openssl/openssl/issues/439
The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any widespread use yet and probably won't be for quite a few years ahead.
About 1 MB of the 1.2 MB are test vectors for SHA3. Strictly speaking the test vectors are not required.
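A minimal sketch of what a test-vector check does, assuming a hashlib that provides `sha3_256`. It uses one well-known vector (the SHA3-256 digest of the empty message) rather than the roughly 1 MB of vectors in the patch. The helper name `check_vectors` is hypothetical and is not taken from the patch.

```python
import hashlib

# (algorithm, message, expected hex digest). A single well-known vector:
# the SHA3-256 digest of the empty message.
VECTORS = [
    ("sha3_256", b"",
     "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"),
]


def check_vectors(vectors):
    """Return the vectors whose computed digest does not match the expected one."""
    failures = []
    for name, message, expected in vectors:
        actual = hashlib.new(name, message).hexdigest()
        if actual != expected:
            failures.append((name, message, expected, actual))
    return failures


print(check_vectors(VECTORS))
```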
IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package:
https://github.com/bjornedstrom/python-sha3
Perhaps you could join forces with Björn to create a standard SHA-3 standalone package on PyPI based on your two variants which we could recommend to people in the docs ?!
I have been maintaining my own SHA3 module for a couple of years. A month ago I moved my code to GitHub and ported it to the new Keccak Code Package. The standalone package uses the same code as my patch but also provides the old Keccak hashes and works on Python 2.7. https://github.com/tiran/pysha3 https://pypi.python.org/pypi/pysha3
On Sat, May 28, 2016, 13:58 Christian Heimes <christian@python.org> wrote:
On 2016-05-27 03:54, M.-A. Lemburg wrote:
On 27.05.2016 06:54, Raymond Hettinger wrote:
On May 25, 2016, at 3:29 AM, Christian Heimes <christian@python.org> wrote:
I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
Do we really need ten? I don't think the standard library is the place to offer all variants of hashing. And we should avoid getting in a cycle of "this was just released by NIST" and "nobody uses that one anymore". Is any one of them an emergent best practice (i.e. starting to be commonly used in network protocols because it is better, faster, stronger, etc)?
Your last message on https://bugs.python.org/issue16113 suggests that these aren't essential and that there is room for debate about whether some of them are standard-library worthy (i.e. we will have them around forever).
I can understand your eagerness to get this landed, since it's been 4 years since work started, but I think we should wait with the addition until OpenSSL has them:
https://github.com/openssl/openssl/issues/439
The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any widespread use yet and probably won't be for quite a few years ahead.
About 1 MB of the 1.2 MB are test vectors for SHA3. Strictly speaking the test vectors are not required.
We can always make the test vector file an external download like we do for some of the codec tests. -brett
IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package:
https://github.com/bjornedstrom/python-sha3
Perhaps you could join forces with Björn to create a standard SHA-3 standalone package on PyPI based on your two variants which we could recommend to people in the docs ?!
I have been maintaining my own SHA3 module for a couple of years. A month ago I moved my code to GitHub and ported it to the new Keccak Code Package. The standalone package uses the same code as my patch but also provides the old Keccak hashes and works on Python 2.7.
https://github.com/tiran/pysha3 https://pypi.python.org/pypi/pysha3
On 2016-05-25 12:29, Christian Heimes wrote:
Hi everybody,
I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
SHA-3 / SHAKE: https://bugs.python.org/issue16113 BLAKE2: https://bugs.python.org/issue26798 SHA512/224 / SHA512/256: https://bugs.python.org/issue26834
I would like to push the patches during the sprints at PyCon. Please assist with reviews.
Hi, I have unassigned myself from the tickets and will no longer pursue the addition of new crypto hash algorithms. I might try again when blake2 and sha3 are more widely adopted and the opposition from other core contributors has diminished. Acceptance is simply not high enough to be worth the trouble. Kind regards, Christian
participants (13)
- Bernardo Sulzbach
- Brett Cannon
- Chris Barker
- Chris Barker - NOAA Federal
- Christian Heimes
- Daniel Holth
- Donald Stufft
- Guido van Rossum
- M.-A. Lemburg
- Nathaniel Smith
- Raymond Hettinger
- Ryan Gonzalez
- Victor Stinner