New hash algorithms: SHA3, SHAKE, BLAKE2, truncated SHA512

Hi everybody, I have three hashing-related patches for Python 3.6 that are waiting for review. Altogether the three patches add ten new hash algorithms to the hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
SHA-3 / SHAKE: https://bugs.python.org/issue16113
BLAKE2: https://bugs.python.org/issue26798
SHA512/224 / SHA512/256: https://bugs.python.org/issue26834
I would like to push the patches during the sprints at PyCon. Please assist with reviews. Regards, Christian

Do we really need ten? I don't think the standard library is the place to offer all variants of hashing. And we should avoid getting in a cycle of "this was just released by NIST" and "nobody uses that one anymore". Is any one of them an emergent best practice (i.e. starting to be commonly used in network protocols because it is better, faster, stronger, etc)? Your last message on https://bugs.python.org/issue16113 suggests that these aren't essential and that there is room for debate about whether some of them are standard-library worthy (i.e. we will have them around forever). Raymond

I think that adding sha3 here is a net positive. While there aren't a huge number of things using it today, that's largely because it's fairly new. It's a NIST standard, so it won't be long until things are using it. It would be surprising to me to be able to use sha1 and sha2 from the standard library, but not sha3. SHAKE is really just SHA3 with some additional tweaks to the parameters. I think if you're adding SHA3 it's pretty easy to also add these, though I don't think that it's as important as adding SHA3 itself. BLAKE2 is an interesting one, because while SHA3 is a NIST standard (so it's going to gain adoption because of that), BLAKE2 is at least as strong as SHA3 but is better in many ways, particularly in speed: it's actually faster than MD5 while being as secure as SHA3. This one I think is a good one to have in the standard library as well, because it is all around a really great hash and a lot of things are starting to be built on top of it. In particular I'd like to use this in PyPI and pip, but I can't unless it's in the standard library. -- Donald Stufft
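To make the "SHAKE is SHA3 with tweaked parameters" remark concrete, here is a minimal sketch. It assumes a Python where the proposed constructors exist under the names `hashlib.sha3_256` and `hashlib.shake_128` (as they do in Python 3.6 and later); the message is an arbitrary placeholder.

```python
import hashlib

message = b"python-dev"  # arbitrary example input

# Fixed-length SHA-3: the digest size is part of the algorithm name.
fixed = hashlib.sha3_256(message).digest()

# SHAKE is an extendable-output function (XOF): the caller chooses
# how many bytes to squeeze out at digest time.
short = hashlib.shake_128(message).digest(16)
longer = hashlib.shake_128(message).digest(32)

print(len(fixed), len(short), len(longer))
```

The shorter SHAKE output is a prefix of the longer one, which is what distinguishes an XOF from a family of unrelated fixed-size hashes.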

Le 27 mai 2016 12:05 PM, "Donald Stufft" <donald@stufft.io> a écrit :
BLAKE2 is derived from BLAKE, which was one of the finalists in the SHA3 competition. The SHA3 competition is interesting because each algorithm is deeply tested and analyzed by many teams all around the world. Obvious vulnerabilities are quickly found. The advantage of putting SHA3 and BLAKE2 in the stdlib is that they have different designs. I don't expect that two designs have the same vulnerabilities, but I'm not an expert :-) SHA3 (Keccak) is based on a new sponge construction: https://en.m.wikipedia.org/wiki/SHA-3 BLAKE is based on ChaCha: https://en.m.wikipedia.org/wiki/BLAKE_(hash_function) https://en.m.wikipedia.org/wiki/Salsa20#ChaCha_variant Victor

On 27.05.2016 06:54, Raymond Hettinger wrote:
I can understand your eagerness to get this landed, since it's been 4 years since work started, but I think we should wait with the addition until OpenSSL has them: https://github.com/openssl/openssl/issues/439 The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, which aren't in any wide spread use yet and probably won't be for quite a few years ahead. IMO, relying on OpenSSL is a better strategy than providing (and maintaining) our own compatibility versions. Until OpenSSL has them, people can use Björn's package: https://github.com/bjornedstrom/python-sha3 Perhaps you could join forces with Björn to create a standard SHA-3 standalone package on PyPI based on your two variants which we could recommend to people in the docs ?! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 27 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On 27.05.2016 13:03, Donald Stufft wrote:
I know, but I still don't think that's a good idea. It makes sense in case you don't want to carry around OpenSSL all the time, but how often does that happen nowadays? BTW: If I recall correctly, those hash implementations predate the deeper support for OpenSSL we now have in Python. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 27 2016)

OpenSSL sucks. Python would only have to bundle a reference implementation of the new hash algorithm(s), and unlike TLS suites they tend to just work. BLAKE2 is important, since it removes the last objection to replacing MD5 - speed - that has made it hard for cryptography fans to convince MD5 users to upgrade. On Fri, May 27, 2016 at 7:13 AM M.-A. Lemburg <mal@egenix.com> wrote:

On 05/27/2016 11:31 AM, Daniel Holth wrote:
I have had to stick to MD5 for performance reasons (2 seconds in MD5 or 9.6 seconds in SHA256, IIRC) in scenarios that did not require an SHA*. Having BLAKE2 around wouldn't be a necessity, but if it shipped with newer versions of Python eventually there would be a commit switching the underlying hash function.
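A rough way to check speed claims like the one above on your own machine is sketched below. It assumes Python 3.6+ so that `blake2b` is present; the 1 MiB buffer and the helper name `time_hash` are illustrative choices, and the numbers will vary with the CPU and with whether OpenSSL backs a given algorithm.

```python
import hashlib
import timeit

data = b"\0" * (1 << 20)  # 1 MiB of input; size chosen arbitrarily


def time_hash(name, repeat=5):
    """Return the best-of-N wall time for hashing `data` once with `name`."""
    return min(
        timeit.repeat(lambda: hashlib.new(name, data).digest(),
                      number=1, repeat=repeat)
    )


timings = {}
for name in ("md5", "sha256", "blake2b"):
    try:
        timings[name] = time_hash(name)
    except ValueError:
        # Algorithm not available in this build (e.g. MD5 disabled).
        continue

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:8s} {seconds * 1000:.3f} ms per MiB")
```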

> [...] which aren't in any widespread use yet and probably won't be for quite a few years ahead.
Anything added to the stdlib now will be in py3.6+, yes? Which won't be in widespread use for quite a few years yet, either. So if ( and that's a big if) it's possible to anticipate what will be in widespread use in a couple years, getting it in now would be a good thing. -CHB

On 27.05.2016 17:44, Chris Barker - NOAA Federal wrote:
You cut away the important part of what I said: "The current patch is 1.2MB for SHA-3 - that's pretty heavy for just a few hash functions, ..." If people want to use the hashes earlier, this is already possible via a separate package, so we're not delaying their use. It is clear that SHA-3 will get more traction in coming years (*), but I'm pretty sure that OpenSSL will have good implementations by the time people actively start using the new hash algorithm, and then hashlib will automatically make that available (hashlib uses the OpenSSL EVP abstraction, so it will be able to use any new algorithms added to OpenSSL). However, if we add the reference implementation now, we'd then be left with 1.2MB of unnecessary code in the stdlib. The question is not so much whether SHA-3 is useful or not; it's whether we want to maintain this forever going forward or not. (*) People are just now starting to move from SHA-1 to SHA-2 and SHA-2 was standardized in 2001. Python received SHA-2 support in 2006. So there's plenty of time to decide :-)
-- Marc-Andre Lemburg

On Fri, May 27, 2016 at 9:35 AM, M.-A. Lemburg <mal@egenix.com> wrote:
That's true for ANY addition to the stdlib -- it could always be made available in a third party lib. (unless you want to use it in another part of the stdlib...)
I'm probably showing my ignorance here, but couldn't we swap in the OpenSSL implementation when that becomes available?
> (*) People are just now starting to move from SHA-1 to SHA-2 and SHA-2 was standardized in 2001. Python received SHA-2 support in 2006. So there's plenty of time to decide :-)
Can't deny the history, nor the inertia -- but that doesn't make it a good thing...
-CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 27.05.2016 18:41, Chris Barker wrote:
Well, any addition for which someone already wrote a package, but yes...
We could, but only if we don't expose separate interfaces for the hashes and don't add them to hashlib.algorithms / hashlib.algorithms_guaranteed.
-- Marc-Andre Lemburg
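For readers unfamiliar with the distinction being argued over, the sketch below shows how the two sets can be inspected. It assumes Python 3, where the runtime-dependent set is exposed as `hashlib.algorithms_available` (Python 2.7 used `hashlib.algorithms` for a fixed tuple); the printed contents depend on the build and the linked OpenSSL.

```python
import hashlib

# Names every build of this Python version promises to support.
guaranteed = hashlib.algorithms_guaranteed

# Names usable in this particular interpreter, including whatever
# the linked OpenSSL happens to provide.
available = hashlib.algorithms_available

openssl_only = sorted(available - guaranteed)

print("guaranteed:", sorted(guaranteed))
print("only via this build / OpenSSL:", openssl_only)
```

Anything promised in the guaranteed set has to work even when OpenSSL is absent or too old, which is why adding a name there commits the stdlib to carrying its own implementation.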

On 2016-05-27 09:41, Chris Barker wrote:
I'm probably showing my ignorance here, but couldn't we swap in the OpenSSL implementation when that becomes available?
No, not any time soon. As soon as we guarantee SHA3 support we have to keep our own implementation for a couple of additional releases. We can drop our own SHA3 code only once all supported OpenSSL versions have SHA3. For example, if OpenSSL 1.2.0 gains SHA3 support, we must wait until OpenSSL 1.1 and 1.0.2 are no longer supported by OpenSSL. Christian

On 2016-05-28 14:06, Guido van Rossum wrote:
But you could choose which implementation to use at compile time based on the autoconf output, right?
We compile all modules and then let hashlib decide which implementation is used. hashlib prefers OpenSSL but falls back to our builtin modules. For MD5, SHA1 and SHA2 OpenSSL's implementation has better performance (up to twice the speed).
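From the caller's side, the same "prefer what is there, fall back otherwise" idea can be sketched as follows. This is not hashlib's internal dispatch code; `new_hash` is a hypothetical helper, and the choice of `sha256` as the fallback is an assumption made for illustration.

```python
import hashlib


def new_hash(preferred, data=b"", fallback="sha256"):
    """Return a hash object for `preferred`, or for `fallback` if this
    build cannot construct it (hashlib.new raises ValueError then)."""
    try:
        return hashlib.new(preferred, data)
    except ValueError:
        return hashlib.new(fallback, data)


h = new_hash("sha3_256", b"abc")
print(h.name, h.hexdigest())
```

Callers that silently fall back should record `h.name` next to the digest, since the two algorithms produce incompatible values.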

On May 27, 2016 3:04 PM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
The vast majority of the patch is Lib/test/vectors/sha3_224.txt, which seems to be (as the file path implies) just test data. A whopping >1k LOC of really long hashes.
-- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/

On 27.05.2016 22:58, Ryan Gonzalez wrote:
Right. There's about 1MB test data in the patch, but even without that data, the patch adds more than 6400 lines of code. If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO. Aside: BLAKE2 has already landed in OpenSSL 1.1.0: https://github.com/openssl/openssl/tree/master/crypto/blake2 -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 27 2016)

I think it is a clear win to have the fallback implementations in cases where people either don’t have OpenSSL or don’t have a new enough OpenSSL for those implementations. Not having the fallback just makes it more difficult for people to rely on those hash functions. — Donald Stufft

On 27.05.2016 23:46, Donald Stufft wrote:
This will only be needed once the stdlib itself starts requiring support for some of these hashes, and for that we could add a pure Python implementation, e.g. https://github.com/coruus/py-keccak In all other cases, you can simply add the support via a package such as Björn's or Christian's. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 27 2016)

On Fri, May 27, 2016 at 3:08 PM, M.-A. Lemburg <mal@egenix.com> wrote:
SHA-3 and BLAKE are extremely widely accepted standards, our users will expect them, and they're significant improvements over all the current hashes in the algorithms_guaranteed list. If we demote them to second-class support (by making them only available in some builds, or using a slow pure Python implementation), then we'll be encouraging users to use inferior hashes. We shouldn't do this without a very good reason, and I don't see anything very convincing here... by all means drop the megabyte of test data, but why does it matter how many lines of code the algorithm is? No Python developer will ever have to look at it -- hash code by its nature is *very* low maintenance (it either computes the right function or it doesn't, and the right answer never changes). And in the unlikely case where some terrible unexpected bug is discovered, the only maintenance needed will be to delete the current impl and drop in whatever the new fixed one is. So +1 to adding SHA-3 and BLAKE to algorithms_guaranteed. -n -- Nathaniel J. Smith -- https://vorpus.org

On 2016-05-27 15:52, Nathaniel Smith wrote:
Thanks Nathaniel, my patches don't add SHA3 and BLAKE2 to algorithms_guaranteed because Python still supports C89 platforms without a 64 bit integer type. Theoretically 64bit ints are not required except for BLAKE2b. Since Trent's snakebite.org is dead I don't have access to these old platforms any more. Christian

Python 3.5 requires a 64 bit signed integer to build. Search for _PyTime type in pytime.h ;-)

On 2016-05-27 14:41, M.-A. Lemburg wrote:
The KeccakCodePackage is rather large. I already removed all unnecessary files and modified some files so more code is shared between the 32 and 64bit optimized variants. Please keep in mind that the KCP contains multiple implementations with different optimizations for CPU architectures. I already removed the ARM NEON optimization. I also don't get your obsession with lines of code. The bundled zlib and expat sources are far bigger than the KeccakCodePackage.
Except BLAKE2 in OpenSSL is severely castrated and tailored towards a very limited use case. The implementation does not support any of the useful advanced features like keyed hashing (MAC), salt, personalization, tree hashing and variable hash length.
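The advanced features listed above can be illustrated with a short sketch. It assumes the constructor and keyword names `hashlib.blake2b(..., key=, salt=, person=, digest_size=)` as they exist in Python 3.6 and later; the key, salt, and personalization strings are made-up placeholders, and `verify` is a hypothetical helper.

```python
import hashlib
import hmac

key = b"hypothetical-secret-key"  # placeholder; use a random key in practice
message = b"payload"

# Keyed hashing (MAC) with a variable digest length of 16 bytes.
tag = hashlib.blake2b(message, key=key, digest_size=16).digest()


def verify(message, tag, key):
    """Recompute the keyed digest and compare in constant time."""
    expected = hashlib.blake2b(message, key=key,
                               digest_size=len(tag)).digest()
    return hmac.compare_digest(expected, tag)


# Personalization: the same input hashes differently per application.
digest_a = hashlib.blake2b(message, person=b"app-A", digest_size=32).hexdigest()
digest_b = hashlib.blake2b(message, person=b"app-B", digest_size=32).hexdigest()

# Salt: at most 16 bytes for blake2b; the default digest size is 64 bytes.
salted = hashlib.blake2b(message, salt=b"0123456789abcdef").digest()

print(verify(message, tag, key), digest_a != digest_b, len(salted))
```

Tree hashing is omitted here; it is configured through further constructor parameters.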

On 28.05.2016 23:13, Christian Heimes wrote:
For a small piece of code, it's fine to have a copy in the stdlib, but for larger chunks such as this one, I think we ought to consider alternative options, since I don't think it's good to have to carry around this baggage forever. OpenSSL will eventually have good enough support for what most Python users will need from these new hash functions. That's why I think it's better to have a discussion of whether we need the full package in the stdlib, or should only provide limited support built into the stdlib and refer people to PyPI packages for things that you don't need every day. Regarding the stories for zlib and expat, I only remember that expat was essentially unmaintained when we added it and the existing version at the time had known bugs (but I could be wrong). For zlib, I have no clue as to why we have a copy in the stdlib. That lib is available on all systems nowadays. Perhaps it wasn't when we added it; I don't remember. If so, it's a good example of why adding copies to the stdlib is not such a good idea :-)
I bet that the use cases they put into OpenSSL are what most people will eventually use, so essentially the same reasoning we use for putting stuff into the stdlib. Besides, the code just landed in OpenSSL. It's likely they'll continue to optimize it and possibly also add the variants they left out initially. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 29 2016)

On 2016-05-27 03:54, M.-A. Lemburg wrote:
About 1 MB of the 1.2 MB are test vectors for SHA3. Strictly speaking the test vectors are not required.
I have been maintaining my own SHA3 module for a couple of years. A month ago I moved my code to GitHub and ported it to the new Keccak Code Package. The standalone package uses the same code as my patch but also provides the old Keccak hashes and works on Python 2.7. https://github.com/tiran/pysha3 https://pypi.python.org/pypi/pysha3

On 2016-05-25 12:29, Christian Heimes wrote:
Hi, I have unassigned myself from the tickets and will no longer pursue the addition of new crypto hash algorithms. I might try again when blake2 and sha3 are more widely adopted and the opposition from other core contributors has diminished. Acceptance is simply not high enough to be worth the trouble. Kind regards, Christian

participants (13)
- Bernardo Sulzbach
- Brett Cannon
- Chris Barker
- Chris Barker - NOAA Federal
- Christian Heimes
- Daniel Holth
- Donald Stufft
- Guido van Rossum
- M.-A. Lemburg
- Nathaniel Smith
- Raymond Hettinger
- Ryan Gonzalez
- Victor Stinner