[Catalog-sig] hash tags

Donald Stufft donald at stufft.io
Sat Mar 9 00:15:13 CET 2013

On Mar 8, 2013, at 5:50 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft <donald at stufft.io> wrote:
>> Here's some more information pulled straight from Wikipedia:
> Trust me, I've read a LOT of Wikipedia (and even more from other
> sites, including at least the conclusions of a number of cryptography
> papers) about hashing attacks recently, because I was seeing
> inconsistencies in what people are saying about hashes and their
> weaknesses and so forth.  99.9% of the discussion about attacks on
> hashes has to do with collision attacks, prefix attacks, and length
> extension attacks, all of which are extremely relevant for
> *cryptographic* purposes.  Specifically, the use of hashes to verify
> identity, authority, repudiability, etc...  which emphatically do
> *not* apply to the use of an MD5 as a checksum to verify a correct
> download.
> All of these attacks depend on *something else* being at stake besides
> the integrity of the original message.  For example, length-extension
> attacks bypass the need to know a "secret" used in a naive hash-based
> signature scheme (which is why you're supposed to use HMAC for such
> things), while collision attacks let you trick a signer into signing
> something that you can later replace with something altered.
> The current use of #md5 tags isn't subject to either of these kinds of
> attack, because:
> 1. There is no "secret" to be revealed, and
> 2. The author and signer are the same person
> So the only type of attack I've found out about thus far, in my
> (admittedly few) hours of study on the subject, that is relevant to
> the way we use MD5 on PyPI at present is the so-called "second
> pre-image" attack, which is when you're given an existing message and
> a hash, and have to create a new message with the same hash...  while
> also incorporating something useful in the new message.
> The most recent report I saw on second pre-image attacks against full
> MD5 estimated a 2**127 strength, meaning that even if you could
> process a great many billion tries per second, it would take you
> thousands of years to come up with a file that could masquerade as an
> existing download.  (And most people's computers and/or internet
> connections would choke on the massive file sizes needed for the
> still-theoretical Kelsey-Schneier generalized preimage attack, which
> in any case would apply equally to just about any other hash we could
> currently put out in the field. i.e., it's not specific to a
> particular hash algorithm, it just relies on certain properties of the
> algorithm.)
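
To put the 2**127 figure above in perspective, here is a rough back-of-the-envelope sketch. The guess rates are illustrative assumptions, not measurements of any real hardware, and the helper name is made up for this example. It suggests that "thousands of years" is, if anything, a very conservative way to put it.

```python
# Rough estimate of how long an exhaustive second pre-image search would take.
# The guess rates below are illustrative assumptions, not benchmarks.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(work_bits: int, tries_per_second: float) -> float:
    """Years needed to perform 2**work_bits tries at the given rate."""
    return (2 ** work_bits) / tries_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical attacker speeds: a billion, a trillion, a quadrillion tries/s.
    for rate in (1e9, 1e12, 1e15):
        years = years_to_search(127, rate)
        print(f"{rate:.0e} tries/s -> about {years:.1e} years")
```

Even at the assumed rate of a trillion tries per second, the result is on the order of 10**18 years.
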
> So, yeah, MD5 is *cryptographically* broken, sure.  But it's not
> broken for *data integrity*.  And in the PyPI use case, the
> "cryptographic" part is all in the SSL being used to fetch the MD5
> link in the first place.
>> Here's the important highlights:
>>    - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum
> Right, that's what's called a "collision attack".  It means that you
> can go out *ahead of time*, and make two files with the same checksum,
> one good, one evil.  It does *not* mean you get to take an existing
> file, and then make a second file with the same checksum.  (The latter
> is a "second preimage" attack, which is *not* broken.)
> Hash collision attacks in PyPI would basically require an author to
> upload a special version of their package that looked innocent, and
> then they could later switch that version out with one that's harmful.
> And the *way* that this works is that you specially generate *both*
> files, in advance.  Which means that the author themselves is
> compromised, so the threat is moot.  The author can already upload
> compromised code (either through being evil or having their PC
> hijacked), and what #md5 it has is 100% irrelevant.
> That is, there's nothing stopping an evil author or an author with a
> compromised PC from simply uploading a new file with a new MD5,
> because PyPI will pass it along in exactly the same way.  Changing
> hash algorithms will not affect this threat vector in the slightest.
> Given these facts, it makes no sense to fuss over the hash algorithm
> in current use, since a concurrent goal here is to switch to file
> formats that can be directly signed using, you know, *actual*
> cryptography.  ;-)
> The new .wheel format makes provisions for modern signature
> techniques.  It'd be good if sdists also did.  Then the #md5 tag can
> die a natural death, hopefully within the year replaced by a hashtag
> that, say, fingerprints the author's public key as registered with
> PyPI, or something of that sort.  In the meantime, there's no actual
> threat here, so bikeshedding what to replace it with *while keeping
> the current system* is like rearranging office furniture in a building
> that's about to have demolition charges set underneath it.  ;-)


There's an old saying inside the NSA: "Attacks always get better; they never get worse." [1]

Even if you accept the premise that MD5 is still theoretically OK for this one tiny little segment, MD5 isn't going to get any better. The simple API is not going anywhere. Waving your hands and saying the new stuff will obsolete all of this is great, but it won't. This stuff is going to be around for a long time, and we need to look towards the future, not shove our heads in the sand and point towards a toolchain that may or may not happen in the near future.
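
As a minimal sketch of what that could look like on the client side, a download check can read the algorithm name out of the link fragment instead of hard-coding MD5. This assumes the fragment has the form "#<algorithm>=<hexdigest>", as with the current #md5 tags. The function name, the allow-list, and the example URL are all made up for illustration and are not how any existing tool does it.

```python
import hashlib
from urllib.parse import urlsplit

# Assumed allow-list for this sketch; a real client would choose its own.
ALLOWED_ALGORITHMS = {"md5", "sha256", "sha512"}


def verify_download(link: str, data: bytes) -> bool:
    """Check data against the '#<algorithm>=<hexdigest>' fragment of link."""
    fragment = urlsplit(link).fragment
    algorithm, sep, expected = fragment.partition("=")
    if not sep or algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported or missing hash fragment: {fragment!r}")
    actual = hashlib.new(algorithm, data).hexdigest()
    # Plain comparison is enough here: the digest is public, not a secret.
    return actual == expected.lower()


if __name__ == "__main__":
    payload = b"pretend this is an sdist"
    # Hypothetical link; the host and path are placeholders.
    link = ("https://pypi.example/packages/demo-1.0.tar.gz#md5="
            + hashlib.new("md5", payload).hexdigest())
    print(verify_download(link, payload))           # matches the fragment
    print(verify_download(link, payload + b"!"))    # corrupted download
```

With something along these lines, switching the index from #md5 to a stronger hash would only change the fragment, not the client logic.
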

[1] Stolen from Bruce Schneier

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
