On 28.05.2016 23:13, Christian Heimes wrote:
On 2016-05-27 14:41, M.-A. Lemburg wrote:
On 27.05.2016 22:58, Ryan Gonzalez wrote:
On May 27, 2016 3:04 PM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
On Friday, May 27, 2016, M.-A. Lemburg <mal@egenix.com> wrote:
The current patch is 1.2MB for SHA-3. That's pretty heavy for just a few hash functions, which aren't in widespread use yet and probably won't be for quite a few years.
Oh wow, it's so fat? Why is it so big? Can't we use a lighter version?
The vast majority of the patch is Lib/test/vectors/sha3_224.txt, which seems to be (as the file path implies) just test data. A whopping >1k lines of really long hashes.
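For context, a test-vector file like this typically holds known-answer tests: hash a fixed input and compare the result with a published digest. The following is a minimal sketch that assumes Python 3.6+ (where `hashlib` ships SHA-3). It uses only the published SHA3-224 and SHA3-256 digests of the empty message and does not parse the actual format of sha3_224.txt, which the thread doesn't show. `KNOWN_ANSWERS` and `check_vectors` are hypothetical names.

```python
import hashlib

# Hypothetical known-answer table: (algorithm, message, expected hex digest).
# Both entries are the published SHA-3 digests of the empty message.
KNOWN_ANSWERS = [
    ("sha3_224", b"", "6b4e03423667dbb73b6e15454f0eb1abd4597f9a1b078e3f5b5a6bc7"),
    ("sha3_256", b"", "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"),
]


def check_vectors(vectors):
    """Return the (algorithm, message) pairs whose computed digest does not match."""
    failures = []
    for name, message, expected in vectors:
        actual = hashlib.new(name, message).hexdigest()
        if actual != expected:
            failures.append((name, message))
    return failures
```

A full vector file just repeats this check over many more (message, digest) pairs, which is why it can get large without adding any code to maintain.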
Right. There's about 1MB test data in the patch, but even without that data, the patch adds more than 6400 lines of code.
The KeccakCodePackage is rather large. I already removed all unnecessary files and modified some files so more code is shared between the 32-bit and 64-bit optimized variants. Please keep in mind that the KCP contains multiple implementations with different optimizations for different CPU architectures. I already removed the ARM NEON optimization. I also don't get your obsession with lines of code. The bundled zlib and expat sources are far bigger than the KeccakCodePackage.
For a small piece of code, it's fine to have a copy in the stdlib, but for larger chunks such as this one, I think we ought to consider alternative options, since I don't think it's good to have to carry around this baggage forever. OpenSSL will eventually have good enough support for what most Python users will need from these new hash functions. That's why I think we should discuss whether we need the full package in the stdlib, or whether it's better to provide only limited support built into the stdlib and refer people to PyPI packages for things that you don't need every day.

Regarding the stories for zlib and expat, I only remember that expat was essentially unmaintained when we added it and the existing version at the time had known bugs (but I could be wrong). For zlib, I have no clue as to why we have a copy in the stdlib. That lib is available on all systems nowadays. Perhaps it wasn't when we added it; I don't remember. If so, it's a good example of why adding copies to the stdlib is not such a good idea :-)
If we add this now, there should at least be an exit strategy to remove the code again, when OpenSSL ships with the same code, IMO.
Aside: BLAKE2 has already landed in OpenSSL 1.1.0:
https://github.com/openssl/openssl/tree/master/crypto/blake2
Except BLAKE2 in OpenSSL is severely castrated and tailored towards a very limited use case. The implementation does not support any of the useful advanced features like keyed hashing (MAC), salt, personalization, tree hashing and variable hash length.
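To make the feature list concrete, here is a minimal sketch of keyed hashing, salt, personalization, variable digest length and tree hashing, assuming the `hashlib.blake2b` interface that CPython exposes from Python 3.6 onward. The key, salt, personalization strings and leaf size are illustrative values only, and `verify` is a hypothetical helper.

```python
import hashlib
import hmac
import os

msg = b"message to authenticate"

# Keyed hashing (MAC): the key goes straight into BLAKE2, no HMAC wrapper needed.
key = b"shared secret"  # illustrative key; blake2b accepts up to 64 bytes
tag = hashlib.blake2b(msg, key=key, digest_size=16).digest()


def verify(message: bytes, key: bytes, tag: bytes) -> bool:
    """Hypothetical helper: recompute the keyed digest, compare in constant time."""
    expected = hashlib.blake2b(message, key=key, digest_size=len(tag)).digest()
    return hmac.compare_digest(expected, tag)


# Salt and personalization: same data, different domains, different digests.
salt = os.urandom(hashlib.blake2b.SALT_SIZE)  # 16 random bytes
h_v1 = hashlib.blake2b(msg, salt=salt, person=b"myapp v1").digest()
h_v2 = hashlib.blake2b(msg, salt=salt, person=b"myapp v2").digest()

# Variable hash length: any digest_size from 1 to 64 bytes.
short = hashlib.blake2b(msg, digest_size=20).digest()

# Tree hashing: two leaves combined under one root node (fanout=2, depth=2).
LEAF_SIZE = 4096  # illustrative leaf size
params = dict(fanout=2, depth=2, leaf_size=LEAF_SIZE, inner_size=64)
data = bytes(6000)
left = hashlib.blake2b(data[:LEAF_SIZE], node_offset=0, node_depth=0, **params)
right = hashlib.blake2b(data[LEAF_SIZE:], node_offset=1, node_depth=0,
                        last_node=True, **params)
root = hashlib.blake2b(node_offset=0, node_depth=1, last_node=True, **params)
root.update(left.digest())
root.update(right.digest())
tree_digest = root.hexdigest()
```

Each of these parameters changes the digest. An implementation that only offers plain, fixed-length hashing cannot reproduce any of these results.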
I bet that the use cases they put into OpenSSL are what most people will eventually use, so it's essentially the same reasoning we use for putting stuff into the stdlib. Besides, the code just landed in OpenSSL. It's likely they'll continue to optimize it and possibly also add the variants they left out initially.

-- 
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Experts (#1, May 29 2016)