BUG? sha-module returns same crc for different files

Erno Kuusela erno-news at erno.iki.fi
Mon Sep 18 11:43:10 EDT 2000


>>>>> "Treutwein" == Treutwein Guido <Guido.Treutwein at nbg.siemens.de> writes:

    Treutwein> c) While the probability of a file having a certain
    Treutwein> hash code is 2^(-hash_bitlength), the probability of
    Treutwein> finding two files with the same hash value is MUCH
    Treutwein> bigger; this is the so-called birthday paradox (due to
    Treutwein> the fact that with 23 persons in a room, the
    Treutwein> probability of two having the same birthday is
    Treutwein> better than 50%; for a 32-bit CRC the corresponding
    Treutwein> limit is about 77000 files for a 50% chance).

2^32 is much smaller than 2^160 (2^128 times smaller, in fact). how many
files would be needed for there to be a 50% chance of a sha-1 hash
collision? (how is it calculated?)  2^160 is
1461501637330902918203684832716283019655932542976...
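
as for how it might be calculated: a rough sketch, assuming hash values
behave like uniformly random numbers and using the usual birthday
approximation n ~ sqrt(2 * N * ln(1/(1-p))) with N = 2^bits. the function
name files_for_collision is made up for this illustration and is not
part of any module.

```python
import math

def files_for_collision(bits, p=0.5):
    """Approximate number of random hash values needed before the chance
    of at least one collision reaches p, for a hash of the given bit length.

    Assumes uniformly distributed hashes (birthday approximation)."""
    # n ~ sqrt(2 * ln(1 / (1 - p))) * 2**(bits / 2)
    return math.sqrt(2.0 * math.log(1.0 / (1.0 - p))) * 2.0 ** (bits / 2.0)

for bits in (32, 160):
    print(bits, "%.4g" % files_for_collision(bits))
```

under that assumption the 32-bit case comes out at roughly 77000, which
agrees with the figure quoted above, and the 160-bit case at something
on the order of 1.4e24 files.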


    Treutwein> For this reason, standardization bodies move towards
    Treutwein> larger hash sizes like 256 bit.

which standards/hashes?
(not that i disbelieve you; i don't know a lot about cryptography and
i'm curious.)

    Treutwein>  If you don't have the time to write a hash function as
    Treutwein> a C extension package, consider using a combination of
    Treutwein> sha-1, md-5, file size and crc32

since there are 4294967296 times more possible values for sha-1
than for md5, methinks this would not make much difference.
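
for reference, the suggested combination could be sketched like this.
it assumes a current python with the hashlib and zlib modules (the
standalone sha and md5 modules discussed in this thread are the older
interface), and the name fingerprint is hypothetical. whether the extra
fields add anything useful on top of the sha-1 digest is exactly the
open question.

```python
import hashlib
import zlib

def fingerprint(data):
    """Return (sha-1 hex digest, md5 hex digest, size, crc32) for bytes."""
    return (
        hashlib.sha1(data).hexdigest(),
        hashlib.md5(data).hexdigest(),
        len(data),
        zlib.crc32(data) & 0xFFFFFFFF,  # mask to an unsigned 32-bit value
    )

# two inputs differing in one byte should give different fingerprints
print(fingerprint(b"hello") != fingerprint(b"hellp"))
```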

  -- erno


