Publish better than md5sums of Python builds?
Hi,
Someone on Mastodon made me notice that:
=> https://www.python.org/downloads/release/python-392/
gives only the md5 sums of Python builds, and that we should probably do better.
What about sha256? Has it been discussed already?
Best,
This was raised in the python.org GitHub issues:
https://github.com/python/pythondotorg/issues/1227
https://github.com/python/pythondotorg/issues/1512
Regards, Karthikeyan S
On 16/03/2021 14.59, Julien Palard via python-committers wrote:
> Hi,
> Someone on Mastodon made me notice that:
> => https://www.python.org/downloads/release/python-392/
> gives only the md5 sums of Python builds, and that we should probably do better.
> What about sha256? Has it been discussed already?
Hi Julien,
could you please explain your use case? Which problem are you trying to solve? How would a sha256 checksum help you solve that problem?
Christian
On 2021-03-16 at 15:52, Christian Heimes wrote:
> could you please explain your use case? Which problem are you trying to solve? How would a sha256 checksum help you solve that problem?
No, I'm just forwarding the surprise of a user I saw on a random social network (I'm monitoring the python hashtag on Mastodon these days).
Feel free to follow-up with the original poster:
=> https://mastodon.technology/@musicmatze/105898597559877474
(Mastodon does not require an account on mastodon.technology in particular; any Mastodon account will do to interact with him. Or ask me to ask him privately for an email address, if you prefer.)
On 16/03/2021 16.54, Julien Palard wrote:
> On 2021-03-16 at 15:52, Christian Heimes wrote:
>> could you please explain your use case? Which problem are you trying to solve? How would a sha256 checksum help you solve that problem?
> No, I'm just forwarding the surprise of a user I saw on a random social network (I'm monitoring the python hashtag on Mastodon these days).
The MD5 fingerprint is really just a checksum to detect download issues. Any checksum would do the trick, even CRC-32. We could (and should) replace the MD5 fingerprint with SHA-256 or SHA-512 [1].
In our case SHA-256 checksums don't provide any real benefit over MD5. Security and data integrity are provided by TLS / HTTPS and optionally by GPG signatures. The Python source code and checksums are provided by the same server. If an attacker is able to modify the tarball, then it's likely they can replace the checksum information, too.
tl;dr If you want to check for partial / bad downloads, then MD5 is still OK. If you want to check for compromised files, then simple SHA-256 checksums provide no extra security. GPG signatures are problematic because GPG is awful. Sigstore [2] might become an alternative in the future.
Christian
[1] On modern hardware SHA-512 is up to 50% faster than SHA-256.
[2] https://sigstore.dev/
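Footnote [1] is easy to check on your own machine. The following is only a rough sketch, not a rigorous benchmark: the helper name `time_hash` and the 16 MiB payload are illustrative assumptions, and the outcome depends on the CPU and the OpenSSL build (processors with dedicated SHA-256 instructions may reverse the result).

```python
import hashlib
import time


def time_hash(name: str, data: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds to hash `data` once with `name`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zeros; the size is arbitrary
    for algo in ("sha256", "sha512"):
        seconds = time_hash(algo, payload)
        print(f"{algo:>7}: {len(payload) / seconds / 1e6:8.1f} MB/s")
```

Taking the best of several runs reduces noise from other processes, but the numbers are still only indicative.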
On Tue, Mar 16, 2021 at 9:42 AM Christian Heimes <christian@python.org> wrote:
> GPG signatures are problematic because GPG is awful.
What is the problem here? Most of the verification for external downloads, at the moment, seems to be via GPG.
> Sigstore [2] might become an alternative in the future.
TIL. Seems very recent - https://security.googleblog.com/2021/03/introducing-sigstore-easy-code-signi...
Thank you, Senthil
On Tue, Mar 16, 2021 at 9:42 AM Christian Heimes <christian@python.org> wrote:
> The MD5 fingerprint is really just a checksum to detect download issues. Any checksum would do the trick, even CRC-32. We could (and should) replace the MD5 fingerprint with SHA-256 or SHA-512 [1].
> In our case SHA-256 checksums don't provide any real benefit over MD5.
The benefit of listing the sha256 for files is that it prevents this question coming up again and again because md5 is old and rightfully on the "never use" list for many people. Even if there are situations where it is fine as an effective improvement over a CRC.
> Security and data integrity are provided by TLS / HTTPS and optionally by GPG signatures. The Python source code and checksums are provided by the same server. If an attacker is able to modify the tarball, then it's likely they can replace the checksum information, too.
People do look at https://python.org/ to get the official checksums of the downloads at a much different time than the tarball they have lying around was downloaded. Hosting them serves as an easy way to check the integrity of what they already got at some previous time.
Let's not let perfect be the enemy of the good here.
What do other sites hosting downloads do? I see some that list only sha256. I see others that list both. I don't really care which we do so long as we include something standard that is not red-flagged as broken due to collisions.
-gps
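The after-the-fact check described above (comparing a file you already have against a checksum copied from the download page) could be sketched as follows. This is a minimal illustration, not anything python.org provides: the function names `file_digest` and `matches_published` are hypothetical, and SHA-256 is assumed as the default only because it is the algorithm proposed in this thread.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the local file's digest with a checksum copied from a web page."""
    # Tolerate surrounding whitespace and upper-case hex from copy and paste.
    expected = published_hex.strip().lower()
    return file_digest(path, algorithm) == expected
```

For example, `matches_published(Path("Python-3.9.2.tgz"), "<checksum from the release page>")` would return True only if the local copy hashes to the published value. As noted earlier in the thread, this detects corruption or a mismatched file; it does not protect against an attacker who can alter both the file and the page.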
On Mar 16, 2021, at 16:16, Gregory P. Smith <greg@krypto.org> wrote:
> The benefit of listing the sha256 for files is that it prevents this question coming up again and again because md5 is old and rightfully on the "never use" list for many people. Even if there are situations where it is fine as an effective improvement over a CRC.
I agree that the primary reason for making the change is to eliminate these kinds of discussions :) In fact, there has been an open issue on the python.org tracker for some time to do just that; see https://github.com/python/pythondotorg/issues/1227. If anyone feels motivated to dig into the website code, note that the change will also require coordination with the release managers, as we are the ones who populate the data.
-- Ned Deily nad@python.org
On 16. 03. 21 21:16, Gregory P. Smith wrote:
> People do look at https://python.org/ to get the official checksums of the downloads at a much different time than the tarball they have lying around was downloaded. Hosting them serves as an easy way to check the integrity of what they already got at some previous time.
Exactly. I've offered a Flash drive with Python at install parties with slow wi-fi before. Allowing people to easily check against https://python.org would have been nice to have.
On Tue, Mar 16, 2021 at 9:16 PM Gregory P. Smith <greg@krypto.org> wrote:
> The benefit of listing the sha256 for files is that it prevents this question coming up again and again because md5 is old and rightfully on the "never use" list for many people. Even if there are situations where it is fine as an effective improvement over a CRC.
Would it be possible to provide multiple hashes, like MD5 *and* SHA256 (and maybe also SHA512)? Or is there a practical problem with listing multiple hashes on a web page?
Victor
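On the tooling side, producing several hashes costs little, since one read of the file can feed all of them. A minimal sketch, assuming a hypothetical helper named `multi_digest` (nothing here reflects how the release scripts actually work):

```python
import hashlib
from pathlib import Path


def multi_digest(
    path: Path,
    algorithms: tuple[str, ...] = ("md5", "sha256", "sha512"),
    chunk_size: int = 1 << 20,
) -> dict[str, str]:
    """Compute several digests of one file while reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            # Feed the same chunk to every hash object.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The returned mapping could then be rendered as extra columns on a download page; whether the page layout has room for them is a separate question.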
On Mar 17, 2021, at 10:29, Victor Stinner <vstinner@python.org> wrote:
> On Tue, Mar 16, 2021 at 9:16 PM Gregory P. Smith <greg@krypto.org> wrote:
>> The benefit of listing the sha256 for files is that it prevents this question coming up again and again because md5 is old and rightfully on the "never use" list for many people. Even if there are situations where it is fine as an effective improvement over a CRC.
> Would it be possible to provide multiple hashes, like MD5 *and* SHA256 (and maybe also SHA512)? Or is there a practical problem with listing multiple hashes on a web page?
Why would we need to have multiple hashes? One is sufficient. The only issue is that we are set up today to use md5 and changing to another hash takes some work, both to the web site and to how we do releases. It's not a huge amount of work but somebody(ies) need(s) to step up to do it and the only obvious reason for doing it is to stop these discussions. And that hasn't been enough motivation yet, given the list of higher priority items.
-- Ned Deily nad@python.org
On Wed, Mar 17, 2021, at 09:29, Victor Stinner wrote:
> Would it be possible to provide multiple hashes, like MD5 *and* SHA256 (and maybe also SHA512)? Or is there a practical problem with listing multiple hashes on a web page?
How about zero hashes?
On 17.03.2021 18:53, Benjamin Peterson wrote:
> How about zero hashes?
IMO, it would be better to put SHA256SUM files into the download folder of each release (these could be generated by a cron job, so the release process does not become more difficult), e.g.
https://www.python.org/ftp/python/3.9.2/
These files would then contain all hashes for all files in a directory and together with the sha256sum command provide a nice interface for checking any downloads.
https://linux.die.net/man/1/sha256sum
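As a rough illustration of the idea, a per-directory checksum file in the line format that `sha256sum` reads and writes (hex digest, two spaces, file name) could be generated and re-checked as below. The file name `SHA256SUMS` and the functions `write_manifest` and `check_manifest` are assumptions made for this sketch; no such files exist on python.org as far as this thread says.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "SHA256SUMS"  # assumed name; the proposal does not fix one


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory: Path) -> Path:
    """Write one '<hex digest>  <file name>' line per regular file in `directory`."""
    manifest = directory / MANIFEST_NAME
    lines = [
        f"{sha256_of(entry)}  {entry.name}"
        for entry in sorted(directory.iterdir())
        if entry.is_file() and entry.name != MANIFEST_NAME
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def check_manifest(manifest: Path) -> dict[str, bool]:
    """Re-hash every listed file and report whether it still matches."""
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode names with '*'
        target = manifest.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected
    return results
```

A cron job could call `write_manifest` on each release directory, and users could verify either with `check_manifest` or with `sha256sum -c SHA256SUMS` run from the same directory.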
That said, most of the file formats used for release files already include checks against file corruption. On the plus side, you don't have to run e.g. an .exe to find out.
-- Marc-Andre Lemburg eGenix.com
participants (11)
- Benjamin Peterson
- Christian Heimes
- Christian Heimes
- Gregory P. Smith
- Julien Palard
- Karthikeyan
- M.-A. Lemburg
- Ned Deily
- Petr Viktorin
- Senthil Kumaran
- Victor Stinner