[Distutils] a plea for backward-compatibility / smooth transitions (was: Re: Migrating Hashes from MD5 to SHA256)

holger krekel holger at merlinux.eu
Tue Jul 30 09:56:38 CEST 2013

On Mon, Jul 29, 2013 at 14:30 -0400, Donald Stufft wrote:
> On Jul 29, 2013, at 1:28 PM, holger krekel <holger at merlinux.eu> wrote:
> > On Mon, Jul 29, 2013 at 10:30 -0400, Donald Stufft wrote:
> >> On Jul 29, 2013, at 7:58 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >>>> 
> >>>> Actually, I strongly object to further backward-incompatible changes.
> >>>> 
> >>>> Please (generally) find a way to introduce improvements without breaking
> >>>> existing installation processes at the same time.
> >>>> 
> >>>> For example, in this case pip/easy_install could indicate to PYPI what
> >>>> kind of hashes it accepts (through a header or query param or whatever)
> >>>> and PyPI could serve it but we'd default to MD5 for now if nothing else
> >>>> was requested.  Please also consider the PEP438 vetted registration of
> >>>> externals+hashes in this context.  Once things and tools are working
> >>>> nicely we can switch to serving a non-MD5 hash as default after a
> >>>> sufficient grace period.
> >>> 
> >>> Having the improved hashes be opt-in (by the client) strikes me as a
> >>> reasonable request.
> >>> 
> >>> Yes, this means nothing will actually happen until easy_install/pip
> >>> are updated to request those improved hashes and those versions see
> >>> significant uptake, but as Holger says, we need to ensure we put
> >>> sufficient effort into smoothing out the roller coaster ride that has
> >>> been the recent experience of packaging system users.
> >> 
> >> There's basically zero way for this to fail closed in any of the
> >> current installers. The failure mode is unverified packages, not
> >> uninstallable packages. I am not aware of a single installer that
> >> mandates the use of a hash. Crate.io has never used md5 hashes and has
> >> always used sha256 and I've never received a single report of an
> >> installer being unable to install because of it, which is exactly what
> >> I expect.
> > 
> > So you think the worst case for forcing SHA256 hashes is that installers
> > who don't yet support sha256 hashes would just ignore it (and thus wouldn't
> > do hash verification)?
> Yes. I've been using sha256 on simple.crate.io for over a year and
> zero people have ever stated it didn't work for them. This also fits
> in with my knowledge of how setuptools and pip work. I know
> zc.buildout less well, but to my knowledge it simply allows setuptools
> to handle the downloading.
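
To make the failure mode described above concrete, here is a minimal Python
sketch of a client that fails open. It is not the code of setuptools or pip:
the function name `check_download`, the `SUPPORTED` set and the example URL
are assumptions made for illustration, and it assumes the index advertises
the hash as a `#<hashname>=<hexdigest>` URL fragment.

```python
import hashlib
from urllib.parse import urldefrag

# Assumption: the hash names this toy client understands.
SUPPORTED = {"md5", "sha256"}


def check_download(url, payload, supported=SUPPORTED):
    """Return "verified" or "unverified"; raise ValueError on a mismatch."""
    _, fragment = urldefrag(url)
    name, sep, expected = fragment.partition("=")
    if not sep or name not in supported:
        # Missing or unknown hash type: there is nothing to compare against,
        # so the download goes ahead without verification (fail open).
        return "unverified"
    actual = hashlib.new(name, payload).hexdigest()
    if actual != expected.lower():
        raise ValueError("hash mismatch for " + url)
    return "verified"
```

Under these assumptions a client that only knows MD5 would return
"unverified" for a `#sha256=...` link instead of refusing to install.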

Sounds good.  "Secondary" tools could run into problems, though.
I know for sure that devpi-server might stumble, but I can fix that.
I also remember that there were tools that recorded MD5 hashes in
requirements files etc.

> >> Indicating via Header or query param pretty much destroys the
> >> effectiveness of the CDN's cache in order to fix a problem with a
> >> theoretical (as far as I am aware) installer that requires a md5 hash
> >> (and thus has never worked for any of the externally hosted packages).
> >> Additionally it doesn't account for external urls which need to be
> >> registered *with* a hash.
> > 
> > Currently there is no hash-type enforcement on registered externals, is there?
> Registered externals must register with an MD5 hash; scraped links and
> download URLs etc. do not require one because they are added indirectly.
> There is no verification by PyPI that the given hash matches the
> package at the end of the url.

Hum, can we allow submitting multiple hashes?  Are there tools already 
that help with registering externals?
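
As a rough illustration of what submitting multiple hashes could involve on
the author's side, the sketch below computes several digests of a release
file in a single pass. The function name `release_digests` and its
parameters are hypothetical; nothing here is an existing PyPI or setuptools
interface.

```python
import hashlib


def release_digests(path, algorithms=("md5", "sha256"), chunk_size=65536):
    """Return {algorithm name: hex digest} for the file at `path`."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # Read in chunks so that large release files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The resulting mapping could then be registered next to the external URL, so
that older clients keep using the MD5 entry while newer ones pick SHA256.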

> >> As far as available attacks, *today* an author could upload a package
> >> that has been created so as to have a sister package with malicious
> >> code that has the same hash allowing them to have a malicious package
> >> they can substitute at will without the hashes changing at all. In the
> >> future it's possible that a pre-image attack on MD5 will be found and
> >> then we'll be dealing with this problem then when we've lost all
> >> verification on external urls instead of now when we have time to get
> >> external urls to switch.
> > 
> > So the attack is a malicious author or someone else modifying an external
> > release file (either directly on the server or via MITM) while maintaining
> > the pre-registered MD5 hash, right?
> > 
> > I am currently merely trying to understand more exactly what
> > you are worried about.
> (...)
> MD5 is currently broken for collision resistance. This means that an
> author can generate two packages that hash to the same thing. One
> package might be benign and one might be malicious. Given those two
> packages, people using the md5 hashes will not be able to differentiate
> between the benign and the malicious package.

I think we should not pretend that PyPI has (by itself) any safety belts 
against malicious authors.  There are numerous ways for malicious authors
to do evil if they choose to.  The potential ability to fake a package
using a collision attack merely adds another way.

Do you know, btw, if TUF is going to help with any of what we are discussing
here? (I am again a bit lost as to the roadmap wrt to TUF - is there


> MD5 is currently *not* broken for pre-image resistance. This means that as of
> right now someone cannot take an already existing package on PyPI and generate
> a second package that hashes to the same thing (besides via brute forcing).
> So right now, collision attacks possible == yes, pre-image attacks possible == no.
> However designing secure systems is a practice of building in safety margins. If
> someone, for instance, can break 5 rounds of a function you use 15 rounds. With
> cryptographic hashes collision attacks are easier than pre-image attacks, so if you
> have two functions, one that has a collision attack and one that doesn't you can
> generally assume that the one without a collision attack is stronger and has a
> longer shelf life.
> So the problem with MD5 (ignoring for a second the fact that a collision attack can
> be bad on its own) is that there are no more safety nets. If it gets broken for a pre-image
> then there's not likely to be any warning (we've already *had* the warning). It will
> just be broken and we will be scrambling to update things then (and hopefully nobody
> gets attacked in the meantime).
> And I do say *if* because as zooko pointed out, it's not a guarantee that MD5 will
> ever lose its pre-image resistance (which just means that brute forcing is the quickest
> way to generate a hash).
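
The safety-margin argument can be put into numbers with the generic
brute-force bounds for an ideal n-bit hash: roughly 2^(n/2) evaluations to
find a collision (the birthday bound) and roughly 2^n to find a pre-image.
The sketch below only computes these textbook figures; the function name
`generic_attack_costs` is made up for illustration. The known collision
attacks on MD5 need far less work than the generic collision figure shown
here, which is what "broken for collision resistance" means.

```python
import hashlib


def generic_attack_costs(digest_bits):
    """Generic brute-force work factors for an ideal hash of this size."""
    return {
        "collision": 2 ** (digest_bits // 2),  # birthday bound
        "preimage": 2 ** digest_bits,          # exhaustive search
    }


for name in ("md5", "sha256"):
    bits = hashlib.new(name).digest_size * 8
    costs = generic_attack_costs(bits)
    print(name, bits, "bits:",
          "collision ~2^%d," % (costs["collision"].bit_length() - 1),
          "pre-image ~2^%d" % (costs["preimage"].bit_length() - 1))
```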
> > 
> > best,
> > holger
> > 
> > 
> >> So by all means I will not migrate us if that's what you want. Old
> >> versions of the installation clients stick around far too long for the
> >> opt-in mechanism to be much use. The point of switching was to cover
> >> the existing clients as well to narrow the gap until a new API is
> >> developed.
> >> 
> >> Hopefully no one is relying on these hashes to prevent an
> >> author from maliciously injecting a sister package and hopefully the
> >> strength of MD5 holds and no new research is found that blows its
> >> pre-image resistance to pieces.
> >> 
> >> As far as not breaking things goes, backwards compatibility has been an
> >> important concern; however, progress forward *requires* breakage. It is
> >> required because there is a vast array of available ways to have your
> >> package and/or hosting configured, many of them horrible practices
> >> which need to be killed. Killing them requires breaking backwards
> >> compatibility. You cite SSL. Yes, SSL has caused a number of errors for
> >> people, mostly related to older versions of OpenSSL being unable to use
> >> an SSL certificate, but downloading code you're going to execute over
> >> plaintext isn't just bad, it's downright negligent on the part of the
> >> toolchain. So that was a required breakage.
> >> 
> >> You also mention pip 1.4 *not* installing pre-releases by default.
> >> Yes, that broke a handful of packages, Supervisor and pytz being the
> >> major ones that I've seen anyone complain about. It was also known
> >> ahead of time that this was a backwards incompatible change (and it
> >> was noted as such in the release notes). It wasn't a surprising
> >> outcome. The pip developers "drew a line in the sand", to quote Paul
> >> Moore, and I expect pip 1.5, where PEP438 becomes default, to break even
> >> more packages from people who just haven't bothered to change their
> >> practices until it's forced on them.
> >> 
> >> -----------------
> >> Donald Stufft
> >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> >> 
> > 
> > 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
