Proposal: drop md5 for sha256
On hosts configured for compliance with U.S. Federal Information Processing Standard (FIPS) 140-2 <http://csrc.nist.gov/publications/fips/fips140-2/fips1402.pdf>, like those in some banks and, yes, the U.S. Department of Defense, cryptographic modules (such as OpenSSL, which underlies hashlib) are not allowed to calculate MD5 digests, because MD5 is no longer a FIPS Approved digest algorithm.
I know no one is trying here to lean on MD5 for security, but the standard says nothing about the reason why you're using MD5: just that you can't.
No one expects a digest algorithm to fail, and Python 2.x may not have been fixed to check for that before being frozen <https://bugzilla.redhat.com/show_bug.cgi?id=746118#c3>, so if you run an MD5 checksum on a FIPS-compliant system with an unpatched Python 2.x, the Python interpreter will segfault. (Ruby, too, had this problem and was itself only recently fixed, <http://bugs.ruby-lang.org/issues/4944>.)
I have to configure hosts in accordance with FIPS 140-2, so the more places I can get rid of MD5, the less headaches I have.
-- Jared Jennings, RHCE, Network Admin, SURVICE Engineering Co.
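To illustrate the failure mode described above, here is a minimal sketch of probing for a usable digest before relying on it. The helper names `usable` and `pick_algorithm` are hypothetical, and the sketch assumes an interpreter that reports an unavailable algorithm by raising `ValueError` from `hashlib.new`; an unpatched interpreter that crashes instead, as described above, cannot be protected this way.

```python
import hashlib


def usable(name):
    """Return True if hashlib can construct the named digest here.

    Hypothetical helper. Assumes the interpreter raises ValueError for an
    algorithm it cannot provide instead of crashing.
    """
    try:
        hashlib.new(name)
    except ValueError:
        return False
    return True


def pick_algorithm(candidates=("sha256", "md5")):
    """Return the first candidate that can be constructed on this host."""
    for name in candidates:
        if usable(name):
            return name
    raise RuntimeError("none of %r is available" % (candidates,))
```

Listing `sha256` first means a host that forbids MD5 never touches it, while a host without such restrictions behaves the same way.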
On Tue, Jul 3, 2012 at 7:33 PM, Jennings, Jared L CTR USAF AFMC 46 SK/CCI < jared.jennings.ctr@eglin.af.mil> wrote:
On hosts configured for compliance with U.S. Federal Information Processing Standard (FIPS) 140-2 <http://csrc.nist.gov/publications/fips/fips140-2/fips1402.pdf>, like those in some banks and, yes, the U.S. Department of Defense, cryptographic modules (such as OpenSSL, which underlies hashlib) are not allowed to calculate MD5 digests, because MD5 is no longer a FIPS Approved digest algorithm.
So if it's not a cryptographic module, it's okay? ;-)
I know no one is trying here to lean on MD5 for security, but the standard says nothing about the reason why you're using MD5: just that you can't.
No one expects a digest algorithm to fail, and Python 2.x may not have been fixed to check for that before being frozen <https://bugzilla.redhat.com/show_bug.cgi?id=746118#c3>, so if you run an MD5 checksum on a FIPS-compliant system with an unpatched Python 2.x, the Python interpreter will segfault. (Ruby, too, had this problem and was itself only recently fixed, <http://bugs.ruby-lang.org/issues/4944>.)
I have to configure hosts in accordance with FIPS 140-2, so the more places I can get rid of MD5, the less headaches I have.
If we replace it with something else, then I suggest we replace it with something that's even MORE braindead than md5 so that nobody will mistake it for a secure hash. Otherwise, we will have this exact same problem all over again when the replacement "secure" hash is disabled by a newer version of FIPS.
The other option is simply to forego a checksum altogether and assume same size = same file. Honestly, I don't remember why we cared about detecting such modifications in the first place: neither PEP 376 nor 262 explain why, and 376 doesn't explain why it went with md5 instead of sha1 (as in PEP 262).
On Jul 3, 2012, at 5:50 PM, PJ Eby <pje@telecommunity.com> wrote:
Otherwise, we will have this exact same problem all over again when the replacement "secure" hash is disabled by a newer version of FIPS.
Or, you know, somebody could maintain the dang software and automate the process of producing these hashes. I am slightly baffled by the tone of this thread, like the hash algorithm needs to be set in stone forever. There's a reason that most software treats hashes as pluggable: new algorithms come out every few years, you have to expect that your choice will be obsoleted for some reason (not necessarily just security!) in the future. Granted, there's no real security in this case, but why not use a hash algorithm with less probability of collision?
-glyph
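One way to read "pluggable" is to store the algorithm name next to the digest, so the verifier never hard-codes a choice. The sketch below assumes an `algorithm=hexdigest` string; that format and both function names are illustrative assumptions, not something specified in this thread.

```python
import hashlib


def make_record_hash(data, algorithm="sha256"):
    """Return 'algorithm=hexdigest' for a bytes payload.

    Hypothetical format: the name travels with the digest so it can be
    swapped later without changing the verifier.
    """
    return "%s=%s" % (algorithm, hashlib.new(algorithm, data).hexdigest())


def verify_record_hash(data, recorded):
    """Check a bytes payload against an 'algorithm=hexdigest' string."""
    algorithm, _, expected = recorded.partition("=")
    return hashlib.new(algorithm, data).hexdigest() == expected
```

Because the verifier reads the algorithm from the recorded string, old entries written with one digest and new entries written with another can coexist.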
On Tuesday, July 3, 2012 at 9:29 PM, Glyph wrote:
Or, you know, somebody could maintain the dang software and automate the process of producing these hashes. I am slightly baffled by the tone of this thread, like the hash algorithm needs to be set in stone forever. There's a reason that most software treats hashes as pluggable: new algorithms come out every few years, you have to expect that your choice will be obsoleted for some reason (not necessarily just security!) in the future. Granted, there's no real security in this case, but why not use a hash algorithm with less probability of collision?
I tend to agree wrt hashes, and I have an outstanding pull request against pip to make it treat hashes as pluggable at least ;)
On 7/4/12 2:50 AM, PJ Eby wrote:
The other option is simply to forego a checksum altogether and assume same size = same file. Honestly, I don't remember why we cared about detecting such modifications in the first place: neither PEP 376 nor 262 explain why, and 376 doesn't explain why it went with md5 instead of sha1 (as in PEP 262).
I wanted to be able to offer a way for installers to detect that a file was changed to avoid deleting it for instance, and issue a warning to the user -- or maybe give a chance to the installer to save a copy of the file somewhere.
I picked md5 because I wanted it brain dead and could not imagine that would be an issue somehow. Maybe zlib.crc32 would be a better choice.
If we remove the hash, oh well. no big deal I guess. If an installer wants to add this feature it can maintain hashes itself.
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
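The modified-file check described in the message above could be sketched with `zlib.crc32` roughly as follows. The function names and the chunk size are assumptions made for illustration; this only detects accidental edits and offers no protection against deliberate tampering.

```python
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC-32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    # Mask so the value is unsigned on every Python version.
    return crc & 0xFFFFFFFF


def is_modified(path, recorded_crc):
    """Hypothetical installer check: True if the file no longer matches."""
    return crc32_of_file(path) != recorded_crc
```

An installer could call `is_modified` before removing a file and warn or keep a backup when it returns True.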
On 7/4/12 9:35 AM, Tarek Ziadé wrote:
I wanted to be able to offer a way for installers to detect that a file was changed to avoid deleting it for instance, and issue a warning to the user -- or maybe give a chance to the installer to save a copy of the file somewhere.
I picked md5 because I wanted it brain dead and could not imagine that would be an issue somehow. Maybe zlib.crc32 would be a better choice.
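Oh let's do a fletcher checksum! This one should be universally authorized by any system.
Grabbed on the web:

    def fletcher(path, modulus=255):
        # Read as bytes so the result is the same on every platform.
        with open(path, 'rb') as f:
            data = bytearray(f.read())  # iterating yields ints on Python 2 and 3
        a = b = 0
        for number in data:
            a = (a + number) % modulus  # running sum of the bytes
            b = (b + a) % modulus       # running sum of the running sums
        # Pack both sums into four uppercase hex digits, a in the high byte.
        return '%04X' % ((a << 8) | b)
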
On 7/4/12 9:47 AM, Tarek Ziadé wrote:
Oh let's do a fletcher checksum ! This one should be universally authorized by any system
Better version: http://tarek.pastebin.mozilla.org/1690480 takes 4 seconds on my MBA for a 40 mb file, so it seems fast enough since our PyPI limit is 8 Mb.
On Wed, Jul 4, 2012 at 5:07 AM, Tarek Ziadé <tarek@ziade.org> wrote:
On 7/4/12 9:47 AM, Tarek Ziadé wrote:
Oh let's do a fletcher checksum ! This one should be universally authorized by any system
Better version: http://tarek.pastebin.mozilla.org/1690480 takes 4 seconds on my MBA for a 40 mb file, so it seems fast enough since our PyPI limit is 8 Mb.
It's too bad Python's built-in hash() isn't guaranteed consistent across versions and implementations, otherwise we could just use that! (Or more precisely, a spec for how to combine the Python hashes of specified-size blocks.)
Every hash function is fast enough. Sha256: > 100 megabytes per second on a single core. Size of one of my normal virtualenvs: < 50 megabytes.
The proposal for record just makes the hash pluggable, so if you have a slow machine and a very fast disk and verifying distributions is taking too long then you can do something about it.
I think the skein hash is even faster than md5 while also being modern, but Jared surely can't use it unless it becomes SHA-3.
But if you really want to save some actual time, use binary packages :-)
pip install lxml - 1m 51s
pip install -f file:///temp/wheels lxml - 27s
I am not sure why pip is so slow for me. The lxml binary package install could take as little as 0.1 seconds if pip wasn't consulting the net.
RPM hashes installed files. It is mostly to avoid accidentally deleting edited configs, but you can "rpm verify" for other reasons if you want.
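The throughput figure above is easy to check on one's own machine. The following is a rough sketch only: the function name, buffer size, and list of algorithms are assumptions, and the numbers it prints will vary with hardware and interpreter build.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm, size_mb=8):
    """Hash size_mb megabytes of zero bytes and return megabytes per second."""
    block = b"\0" * (1024 * 1024)
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(size_mb):
        h.update(block)
    h.hexdigest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256"):
        try:
            print("%s: %.0f MB/s" % (name, throughput_mb_per_s(name)))
        except ValueError:
            # The algorithm is not available on this host.
            print("%s: unavailable" % name)
```

Hashing an in-memory buffer measures the digest alone; reading real files adds disk time on top.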
On 07/04/2012 12:51 PM, Daniel Holth wrote:
Pip install lxml - 1m 51s Pip install -f file:///temp/wheels lxml - 27s
I am not sure why pip is so slow for me. The lxml binary package install could take as little as 0.1 seconds if pip wasn't consulting the net.
If you don't want pip to consult the network, use the --no-index flag along with --find-links.
Carl
On Tue, Jul 03, 2012 at 06:33:08PM -0500, Jennings, Jared L CTR USAF AFMC 46 SK/CCI wrote:
On hosts configured for compliance with U.S. Federal Information Processing Standard (FIPS) 140-2 <http://csrc.nist.gov/publications/fips/fips140-2/fips1402.pdf>, like those in some banks and, yes, the U.S. Department of Defense, cryptographic modules (such as OpenSSL, which underlies hashlib) are not allowed to calculate MD5 digests, because MD5 is no longer a FIPS Approved digest algorithm.
I know no one is trying here to lean on MD5 for security, but the standard says nothing about the reason why you're using MD5: just that you can't.
No one expects a digest algorithm to fail, and Python 2.x may not have been fixed to check for that before being frozen <https://bugzilla.redhat.com/show_bug.cgi?id=746118#c3>, so if you run an MD5 checksum on a FIPS-compliant system with an unpatched Python 2.x, the Python interpreter will segfault. (Ruby, too, had this problem and was itself only recently fixed, <http://bugs.ruby-lang.org/issues/4944>.)
I have to configure hosts in accordance with FIPS 140-2, so the more places I can get rid of MD5, the less headaches I have.
I've just had to look into this for a bug in a package on Fedora and it's not all bad but also not all good. I believe that in current python2 and python3 (including soon to be released python-3.3), if it's compiled against openssl, the md5 hash constructor will SIGABRT when in FIPS mode. If it's compiled against the internal md5 code, it will ignore FIPS mode.
Dave Malcolm has a patch in the tracker that hasn't yet been approved and merged that allows one to pass a flag to the hash constructor that says that the call is not being used for cryptographic purposes and then the constructor will work even in FIPS mode. I've seen no indication in the tracker that this would be applied to future python-2.7.x releases, but it could be backported by individual distributors of python2 (for instance, Linux distributions). A version of the patch is presently applied to the Fedora Linux 17 versions of python2 and python3 if someone is curious.
Note that openssl itself allows the use of MD5 in FIPS mode under a similar strategy. So I'm not entirely certain that the standard forbids use of MD5 for non-cryptographic purposes.
-Toshio
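For readers on newer interpreters: CPython 3.9 and later accept a `usedforsecurity` keyword on hashlib constructors, which is in the same spirit as the flag described above. Whether it matches the patch discussed in this message is an assumption, and the helper name `checksum_md5` is hypothetical. A minimal sketch:

```python
import hashlib


def checksum_md5(data):
    """Return an MD5 hex digest, declared as a non-security checksum.

    Hypothetical helper. Whether a given FIPS-mode build honours the
    flag depends on how the interpreter and OpenSSL were built.
    """
    try:
        h = hashlib.new("md5", usedforsecurity=False)
    except TypeError:
        # Assumed fallback for an interpreter that rejects the keyword.
        h = hashlib.new("md5")
    h.update(data)
    return h.hexdigest()
```

The call still fails with `ValueError` on a host where MD5 is unavailable even for non-security use, so callers may want to handle that case as well.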
participants (8)
- Carl Meyer
- Daniel Holth
- Donald Stufft
- Glyph
- Jennings, Jared L CTR USAF AFMC 46 SK/CCI
- PJ Eby
- Tarek Ziadé
- Toshio Kuratomi