add hash algorithm agility to RECORD
Proposed edits to https://bitbucket.org/dholth/python-peps/changeset/9c26fa50

In wheel I use urlsafe_b64encode_nopad(), which omits the trailing = characters; although it is very easy to implement, it isn't included in the stdlib. In this spec I use the stdlib urlsafe_b64encode().

diff -r 23f9640c2020 -r 9c26fa508424 pep-0376.txt
--- a/pep-0376.txt Thu Sep 06 12:09:58 2012 -0400
+++ b/pep-0376.txt Thu Sep 06 12:24:40 2012 -0400
@@ -218,11 +218,16 @@

 - an absolute path, using the local platform separator

-- the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
-  generated files don't have any hash because they are automatically produced
-  from `py` files. So checking the hash of the corresponding `py` file is
-  enough to decide if the file and its associated `pyc` or `pyo` files have
-  changed.
+- a hash of the file's contents.
+  Notice that `pyc` and `pyo` generated files don't have any hash because
+  they are automatically produced from `py` files. So checking the hash
+  of the corresponding `py` file is enough to decide if the file and
+  its associated `pyc` or `pyo` files have changed.
+
+  The hash is either the empty string, the **MD5** hash of
+  the file, encoded in hex, or the hash algorithm as named in
+  ``hashlib.algorithms``, followed by the equals character ``=``,
+  followed by the hash digest as encoded with ``urlsafe_b64encode``.

 - the file's size in bytes

@@ -391,9 +396,9 @@

 And following methods:

-- ``get_installed_files(local=False)`` -> iterator of (path, md5, size)
+- ``get_installed_files(local=False)`` -> iterator of (path, hash, size)

-  Iterates over the `RECORD` entries and return a tuple ``(path, md5, size)``
+  Iterates over the `RECORD` entries and return a tuple ``(path, hash, size)``
   for each line. If ``local`` is ``True``, the path is transformed into a
   local absolute path. Otherwise the raw value from `RECORD` is returned.
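The unpadded encoder mentioned in the message is not part of the stdlib. A minimal sketch of what such a helper could look like is shown here; the name urlsafe_b64encode_nopad is taken from the message, while the decoding counterpart is a hypothetical addition for illustration and is not necessarily how wheel implements it.

```python
import base64


def urlsafe_b64encode_nopad(data: bytes) -> bytes:
    """URL-safe base64 of data with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def urlsafe_b64decode_nopad(data: bytes) -> bytes:
    """Hypothetical inverse: restore the stripped padding, then decode."""
    # base64 text length must be a multiple of 4; add back what was removed.
    padding = b"=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(data + padding)
```

The stdlib urlsafe_b64encode() output differs from this only by the trailing = characters, so a reader can accept either form by stripping them before comparing.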
On 6 September 2012 17:34, Daniel Holth <dholth@gmail.com> wrote:
> Proposed edits to https://bitbucket.org/dholth/python-peps/changeset/9c26fa50
>
> In wheel I use urlsafe_b64encode_nopad(), which omits the trailing = characters; although it is very easy to implement, it isn't included in the stdlib. In this spec I use the stdlib urlsafe_b64encode().
Why urlsafe_b64encode, rather than just hexdigest which is what is used for md5?

Paul.
It's shorter, and it's used extensively in the digital signature format I'm using.

On Sep 7, 2012 6:58 AM, "Paul Moore" <p.f.moore@gmail.com> wrote:
> Why urlsafe_b64encode, rather than just hexdigest which is what is used for md5?
> Paul.
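To put a number on "shorter", the sketch below compares the text length of one SHA-256 digest in hex, in padded URL-safe base64, and in the unpadded form. The input bytes are an arbitrary placeholder; the lengths depend only on the digest size (32 bytes for SHA-256).

```python
import base64
import hashlib

# Placeholder input; any bytes give a 32-byte SHA-256 digest.
digest = hashlib.sha256(b"example file contents").digest()

hex_form = digest.hex()
b64_form = base64.urlsafe_b64encode(digest).decode("ascii")
b64_nopad = b64_form.rstrip("=")

# 32 bytes -> 64 hex characters, 44 base64 characters, 43 without padding.
print(len(hex_form), len(b64_form), len(b64_nopad))
```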
On Fri, Sep 7, 2012 at 6:59 AM, Daniel Holth <dholth@gmail.com> wrote:
> It's shorter, and it's used extensively in the digital signature format I'm using.
On the other hand, why encode it at all? CSV supports shoving the raw bytes in there, no problem. ;-)
On 7 September 2012 18:58, Daniel Holth <dholth@gmail.com> wrote:
> On the other hand, why encode it at all? CSV supports shoving the raw bytes in there, no problem. ;-)
CSV is a text format. Feed it binary at your own peril :-) I've just spent most of the day fighting a (non-Python) system that is going barmy because of line ending funnies in data within CSV fields. That was bad enough - I have no intention of ever having to deal with a CSV file with binary data in it...

Paul.
OK. Making MD5 optional in RECORD doesn't seem to be very controversial any more, and it will make life easier for an entire class of systems that compile md5() to produce a crash instead of a message digest. It will not lull anyone into a false sense of security or degrade performance.

The edit now suggests the installer pick one of hashlib.algorithms_guaranteed, currently:

{'sha1', 'sha224', 'sha384', 'sha256', 'sha512', 'md5'}

All are valid arguments to hashlib.new().

The hash value is now either empty, the md5 hexdigest, or the name of the hash, followed by =, followed by the urlsafe-b64encode-nopad (base64 with trailing = removed) of the digest.

Does an uninstaller exist that checks the hashes during uninstall?

Daniel Holth
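The three accepted forms of the hash value (empty, bare md5 hexdigest, or name=digest) can be sketched roughly as follows. The function names and the ALLOWED set are hypothetical, introduced only for illustration; the set is copied from the list in the message rather than read from hashlib.algorithms_guaranteed, and tolerating trailing = padding when checking is an assumption, not something the proposal states.

```python
import base64
import hashlib
from typing import Optional

# Assumption: algorithm names copied from the list in the message.
ALLOWED = {"sha1", "sha224", "sha384", "sha256", "sha512", "md5"}


def format_record_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Build '<name>=<urlsafe base64 digest without padding>'."""
    if algorithm not in ALLOWED:
        raise ValueError("unsupported hash algorithm: %r" % algorithm)
    digest = hashlib.new(algorithm, data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "%s=%s" % (algorithm, encoded)


def check_record_hash(field: str, data: bytes) -> Optional[bool]:
    """Return None if no hash was recorded, else whether data matches."""
    if field == "":
        # Empty field: nothing to verify (e.g. generated pyc/pyo files).
        return None
    if "=" not in field:
        # Legacy form: bare md5 hexdigest.
        return field.lower() == hashlib.md5(data).hexdigest()
    algorithm, _, encoded = field.partition("=")
    expected = format_record_hash(data, algorithm).partition("=")[2]
    # Assumption: accept padded and unpadded digests alike.
    return encoded.rstrip("=") == expected
```

A caller would treat None as "nothing to check" and False as "file changed since install".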
participants (2)
- Daniel Holth
- Paul Moore