Proposal: drop md5 for sha256
I would like to amend the spec. The hash column of RECORD should be 'sha256:' + urlsafe_b64encode(hashlib.sha256(data).digest()) instead of the hopelessly obsolete md5. With a secure hash function, you can digitally sign RECORD. It would also make sense to allow RECORD to be omitted from RECORD.
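As a rough illustration of the proposed hash column, here is a minimal Python 3 sketch. The helper name `record_hash` is hypothetical and does not come from the thread. The sketch encodes the raw digest bytes and decodes the result to text, because `urlsafe_b64encode` returns bytes on Python 3.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Return the proposed hash column value for a file's contents.

    Hypothetical helper: 'sha256:' followed by the URL-safe base64
    encoding of the raw SHA-256 digest.
    """
    digest = hashlib.sha256(data).digest()      # raw 32-byte digest
    encoded = base64.urlsafe_b64encode(digest)  # bytes on Python 3
    return "sha256:" + encoded.decode("ascii")


entry = record_hash(b"print('hello')\n")
print(entry)
```

A 32-byte digest always encodes to 44 base64 characters (including one `=` of padding), so every entry has the same length regardless of file size.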
On 7/3/12 3:16 AM, Daniel Holth wrote:
> I would like to amend the spec. The hash column of RECORD should be
> 'sha256:' + urlsafe_b64encode(hashlib.sha256(data).digest())
> instead of the hopelessly obsolete md5. With a secure hash function, you can digitally sign RECORD.

The goal of the RECORD file is to make sure we know if a file was changed, so that installers are aware of it when, for instance, they want to remove the project. It was not really intended to be some kind of security against an attack, unless you have attack scenarios in mind?

> It would also make sense to allow RECORD to be omitted from RECORD.

Why? This file is part of the installation, and as said here: http://www.python.org/dev/peps/pep-0376/#record "Notice that the RECORD file can't contain a hash of itself and is just mentioned here"

Cheers
Tarek
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
----- Original Message -----
> I would like to amend the spec. The hash column of RECORD should be
> 'sha256:' + urlsafe_b64encode(hashlib.sha256(data).digest())
> instead of the hopelessly obsolete md5. With a secure hash function, you can digitally sign RECORD.

Signing packages does sound interesting, but what authority would sign them? The authors of the packages themselves?

> It would also make sense to allow RECORD to be omitted from RECORD.

-- Regards, Bohuslav "Slavek" Kabrda.
Ideally the authors would sign them with GPG, IMO, which is already possible.

On Tuesday, July 3, 2012 at 3:42 AM, Bohuslav Kabrda wrote:
> Signing packages does sound interesting, but what authority would sign them? The authors of the packages themselves?
On 7/3/12 9:42 AM, Bohuslav Kabrda wrote:
> Signing packages does sound interesting, but what authority would sign them? The authors of the packages themselves?

Notice that there's already a --sign feature in Distutils, using gpg. The hashes in the RECORD file have nothing to do with making sure the package originated from developer X. Their only purpose is to know if a file on the system was changed.

Cheers
Tarek
On Tuesday, July 3, 2012 at 3:45 AM, Tarek Ziadé wrote:
> The hashes in the RECORD file have nothing to do with making sure the package originated from developer X. Their only purpose is to know if a file on the system was changed.

Using sha256 would enable preventing someone from maliciously changing the file, similar to how IDS systems capture hashes of binaries to compare against. Of course, someone using the system like this would need to protect the filesystem storing the RECORD files accordingly.

I also think that switching to sha256 is pretty low cost, with minimal (no?) downsides and some possible upsides. Is there a reason to stay with md5?
On 7/3/12 9:48 AM, Donald Stufft wrote:
> On Tuesday, July 3, 2012 at 3:45 AM, Tarek Ziadé wrote:
>> The hashes in the RECORD file have nothing to do with making sure the package originated from developer X. Their only purpose is to know if a file on the system was changed.
>
> Using sha256 would enable preventing someone from maliciously changing the file.

If someone has access to that file, they can also change the RECORD file, so you have no way of trusting RECORD either.

> Similar to how IDS systems capture hashes of binaries to compare against. Of course someone using the system like this would need to protect the filesystem storing the RECORD files accordingly.

I think that's the main issue: where are you going to put the RECORD file?

> I also think that switching to sha256 is pretty low cost with minimal (no?) downsides with some possible upsides. Is there a reason to stay with md5?

The file is two times smaller and faster to create, and md5 does its job of providing a hash for a file. I still fail to see a use case for stronger hashes.

Cheers
Tarek
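To put the size argument in concrete terms, the following sketch compares the length of the hash column under each encoding. It assumes the existing column holds a hex MD5 digest and the proposed one holds URL-safe base64 of a SHA-256 digest; the sample data and variable names are only illustrative.

```python
import base64
import hashlib

data = b"example file contents\n"

md5_hex = hashlib.md5(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()
sha256_b64 = base64.urlsafe_b64encode(
    hashlib.sha256(data).digest()
).decode("ascii")

# Print the number of characters each encoding adds to a RECORD row.
for label, value in [
    ("md5 (hex)", md5_hex),
    ("sha256 (hex)", sha256_hex),
    ("sha256 (urlsafe base64)", sha256_b64),
]:
    print("%-24s %d characters" % (label, len(value)))
```

In hex, SHA-256 is twice as long as MD5 (64 versus 32 characters). The base64 encoding in the proposal narrows that to 44 versus 32 characters per entry, before counting any algorithm prefix.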
It's embarrassing to see md5 used for any reason. You go to pypi, and every download link has an md5 sum of the package, and you think "what is this archaic system that gives me a useless hash, implicated in such fine situations as the Flame malware and ever-improving attacks against md5?" It is irrelevant that it is "probably good enough for this limited use". You might as well use CRC32; it is much shorter.

By re-using RECORD to include a secure hash of every file in an archive, you can sign all the files in the archive by signing RECORD, similar to how jars are signed. The digital signature is right there inside the archive, and if you decide you would rather have a .tar.xz instead of a .zip the signature is still valid.
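The point about the container format can be sketched as follows: because RECORD hashes file contents rather than the archive itself, the same files packed as a zip or as a tarball produce identical RECORD lines, so a signature over those lines would cover both. This is only a sketch under assumptions: the `path,sha256=<digest>,size` row layout, the helper names, and the sample files are illustrative, gzip stands in for xz, and the signing step itself is left out.

```python
import base64
import hashlib
import io
import tarfile
import zipfile

FILES = {"pkg/__init__.py": b"", "pkg/core.py": b"print('hi')\n"}


def record_lines(members):
    """Return sorted 'path,sha256=<digest>,size' lines for (path, data) pairs."""
    lines = []
    for path, data in members:
        digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
        lines.append("%s,sha256=%s,%d" % (path, digest.decode("ascii"), len(data)))
    return sorted(lines)


def make_zip(files):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, data in files.items():
            zf.writestr(path, data)
    return buf.getvalue()


def make_tar(files):
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for path, data in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_zip(blob):
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return [(name, zf.read(name)) for name in zf.namelist()]


def read_tar(blob):
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tf:
        return [(m.name, tf.extractfile(m).read())
                for m in tf.getmembers() if m.isfile()]


zip_record = record_lines(read_zip(make_zip(FILES)))
tar_record = record_lines(read_tar(make_tar(FILES)))
print(zip_record == tar_record)
```

A signature over the archive file would instead change whenever the container or its compression changed, even if every packaged file stayed the same.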
At Tue, 3 Jul 2012 07:14:43 -0400, Daniel Holth wrote:
> It's embarrassing to see md5 used for any reason. You go to pypi, and every download link has an md5 sum of the package, and you think "what is this archaic system that gives me a useless hash, implicated in such fine situations as the Flame malware and ever-improving attacks against md5?" It is irrelevant that it is "probably good enough for this limited use". You might as well use CRC32; it is much shorter.

Yes, you're right, pypi could as well use CRC32. From a security perspective nothing would change, nor would it if we switched to sha512, because there is no way to know whether the hash is correct. Without a trust path the hash is pretty useless except for verifying that the download isn't corrupted.

And even if we had trust paths, the md5 attacks are collision attacks, not preimage attacks. That means the security threat you're worrying about is that a developer uploads something to pypi with the intention of replacing it with something else with the same hash without anyone noticing. And although it is worthwhile to protect against that kind of thing, you should also ask why you're running code from such a developer.

And yes, attacks on md5 will only get better, so we should migrate to better hashes in the future. But if there is something to be embarrassed about, it's not the use of md5, but the lack of proper code signing and trust paths between developers.

Kind regards,
Jeroen Dekkers
> And yes, attacks on md5 will only get better, so we should migrate to better hashes in the future. But if there is something to be embarrassed about, it's not the use of md5, but the lack of proper code signing and trust paths between developers.

I'm going to implement this, except I will replace the sha256: with a sha256=. There is simply no realistic drawback. Strong hashing is a prerequisite for a trust path, and you avoid the need to even think about why it is OK in this specific circumstance that a weak hash is being used.
On 7/3/12 3:54 PM, Daniel Holth wrote:
> I'm going to implement this, except I will replace the sha256: with a sha256=. There is simply no realistic drawback.

I am -1000 for any change to the RECORD file hashes in PEP 376 unless there's a clear use case.

> Strong hashing is a prerequisite for a trust path, and you avoid the need to even think about why it is OK in this specific circumstance that a weak hash is being used.

Sorry, but I don't understand your use case. What do "strong", "weak" or "trust" mean here? The use case we have is: we need a checksum for every file, that's all.

If you want to build a system where you can verify the origin of the files, you need something like a public/private key system, which is what --sign is for. Otherwise you're just going to make hashes longer for no apparent reason.

Cheers
Tarek
On Tue, Jul 3, 2012 at 8:48 AM, Jeroen Dekkers <jeroen@dekkers.ch> wrote:
> And yes, attacks on md5 will only get better, so we should migrate to better hashes in the future.

No, because that's not what the RECORD hashes are for. It's not an intrusion detection system, it's an installer conflict and "oops I edited the wrong file" checker.

People who are upset because md5 is low security are correctly understanding that this system *provides no security*. We are not promising ANY security, so *not* using a secure hash is actually preferable. The goal is data integrity against accidental overwrite by dumb installer tools (e.g. distutils) and accidental edits, not security against malicious tampering.
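The "oops I edited the wrong file" check described above could look roughly like the sketch below. It assumes each RECORD row is `path,md5-hex,size` and that rows with an empty hash (such as the row for RECORD itself) are skipped; the function name `changed_files` and the temporary-directory demo are hypothetical, not part of any installer.

```python
import csv
import hashlib
import io
import os
import tempfile


def changed_files(record_text, root):
    """Return paths whose current MD5 differs from the one in RECORD."""
    changed = []
    for row in csv.reader(io.StringIO(record_text)):
        if len(row) < 2 or not row[1]:
            continue  # blank line, or an entry recorded without a hash
        path, expected = row[0], row[1]
        with open(os.path.join(root, path), "rb") as f:
            actual = hashlib.md5(f.read()).hexdigest()
        if actual != expected:
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "module.py")
    with open(target, "wb") as f:
        f.write(b"x = 1\n")
    record = "module.py,%s,6\nRECORD,,\n" % hashlib.md5(b"x = 1\n").hexdigest()

    before = changed_files(record, root)
    with open(target, "ab") as f:  # simulate an accidental edit
        f.write(b"x = 2\n")
    after = changed_files(record, root)

print(before, after)
```

Nothing here resists tampering: anyone able to edit `module.py` can usually rewrite RECORD as well, which is the limitation raised earlier in the thread.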
On 7/3/12 4:32 PM, PJ Eby wrote:
> On Tue, Jul 3, 2012 at 8:48 AM, Jeroen Dekkers <jeroen@dekkers.ch> wrote:
>> And yes, attacks on md5 will only get better, so we should migrate to better hashes in the future.
>
> No, because that's not what the RECORD hashes are for. It's not an intrusion detection system, it's an installer conflict and "oops I edited the wrong file" checker.
>
> People who are upset because md5 is low security are correctly understanding that this system *provides no security*. We are not promising ANY security, so *not* using a secure hash is actually preferable. The goal is data integrity against accidental overwrite by dumb installer tools (e.g. distutils) and accidental edits, not security against malicious tampering.

Yeah, I don't really understand this debate over md5 hashes here. I suggest that we emphasize in PEP 376 the fact that the sole purpose is to have a checksum.
On 03/07/2012 10:53, Tarek Ziadé wrote:
> On 7/3/12 4:32 PM, PJ Eby wrote:
>> No, because that's not what the RECORD hashes are for. It's not an intrusion detection system, it's an installer conflict and "oops I edited the wrong file" checker.
>>
>> People who are upset because md5 is low security are correctly understanding that this system *provides no security*. We are not promising ANY security, so *not* using a secure hash is actually preferable. The goal is data integrity against accidental overwrite by dumb installer tools (e.g. distutils) and accidental edits, not security against malicious tampering.

Exactly. Promises of false security do not help users.

> Yeah, I don't really understand this debate over md5 hashes here. I suggest that we emphasize in PEP 376 the fact that the sole purpose is to have a checksum.

Putting that on my list of edits for the PEPs!

Cheers
I am just re-using RECORD in wheel files so I can implement a verify function someday. Pay no attention to this backward-compatible change. You can use the checksum you prefer, and if it does not begin with hashfunc= then you know it's an md5.

No discussion about adding provides-extra and the reserved extra names for python setup.py test? How about the fact that the environment markers spec says you can use == but (naked version number) (4.0) is the only example given for "exactly this version"? And why is pkg-info called metadata now anyway?

Daniel Holth

On Jul 3, 2012, at 11:10 AM, Éric Araujo <merwok@netwok.org> wrote:
> Putting that on my list of edits for the PEPs!
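The backward-compatible rule described in this message (a field beginning with `hashfunc=` names its algorithm, anything else is an md5) might be read as in the sketch below. The function name `parse_hash_field` and the example values are hypothetical, and splitting on the first `=` is an assumption made so that trailing base64 padding stays part of the digest.

```python
def parse_hash_field(field):
    """Split a RECORD hash field into (algorithm, digest).

    Fields of the form 'name=digest' name their algorithm; anything
    else is treated as a legacy hex MD5.
    """
    if "=" in field:
        # Split on the first '=' only: base64 padding may add more.
        name, _, digest = field.partition("=")
        return name, digest
    return "md5", field


prefixed = parse_hash_field("sha256=AAAAAAAA=")
legacy = parse_hash_field("0123456789abcdef0123456789abcdef")
print(prefixed, legacy)
```

This works because a hex md5 digest can never contain `=`, so the two forms cannot be confused.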
> No discussion about adding provides-extra and the reserved extra names for python setup.py test? How about the fact that the environment markers spec says you can use == but (naked version number)

It is a bit hard to comment on a very short proposal which proposes an implementation but not a specification part, and does not summarize the previous discussions on the topic (on distutils-sig and the-packaging-of-fellowship Google group). I know it’s a boring task to go through the archives to see the various proposals, but someone has to do it.

On the principle I am +1 for build dependencies and +1 for test dependencies. I don’t know what Provides-Extra is about, nor what the last thing you talk about is.

> And why is pkg-info called metadata now anyway?

I thought this was explained in PEP 376: pkg-info was renamed to dist-info to respect Python terminology (see glossary in the PEP or the distutils docs, and please follow it), and the file became a directory with metadata and other stuff in various files, hence the METADATA file name.

Regards
Fair enough. I should write up provides-extra as an actual patch to the PEP. It is just a way to express, together with requires-dist, what pip install pyramid[test] installs beyond plain pip install pyramid. It is needed for setuptools to generate dist-info without losing that information, and my distribute fork has an implementation. Are there really other proposals for including that package[extras] feature in the static metadata?

I think that is the only part of my packaging work that needs to amend an existing spec, unless you want to opine on putting a build serial number in the versioning PEP.
At Tue, 3 Jul 2012 10:32:43 -0400, PJ Eby wrote:
> On Tue, Jul 3, 2012 at 8:48 AM, Jeroen Dekkers <jeroen@dekkers.ch> wrote:
>> And yes, attacks on md5 will only get better, so we should migrate to better hashes in the future.
>
> No, because that's not what the RECORD hashes are for. It's not an intrusion detection system, it's an installer conflict and "oops I edited the wrong file" checker.

Sorry for not being clear, but I totally agree. I was replying to the "md5 on PyPI is embarrassing" part and meant that we should migrate to better hashes on PyPI in the future.

Jeroen Dekkers
On Tuesday, July 3, 2012 at 11:57 AM, Jeroen Dekkers wrote:
> Sorry for not being clear, but I totally agree. I was replying to the "md5 on PyPI is embarrassing" part and meant that we should migrate to better hashes on PyPI in the future.

Off topic for this list, but I would agree. I have attempted to make this happen, but was told that those hashes are not for security but only for a checksum, and will not be changed.
----- Original Message -----
> On 7/3/12 9:42 AM, Bohuslav Kabrda wrote:
>> Signing packages does sound interesting, but what authority would sign them? The authors of the packages themselves?
>
> Notice that there's already a --sign feature in Distutils, using gpg.

Ah, I didn't know about that.

> The hashes in the RECORD file have nothing to do with making sure the package originated from developer X. Their only purpose is to know if a file on the system was changed.

Well, since there is the --sign feature, I totally agree that md5 is sufficient for making checksums.

-- Regards, Bohuslav "Slavek" Kabrda.
participants (7)
- Bohuslav Kabrda
- Daniel Holth
- Donald Stufft
- Jeroen Dekkers
- PJ Eby
- Tarek Ziadé
- Éric Araujo