binary file compare...
Adam Olsen
rhamph at gmail.com
Thu Apr 16 21:03:43 CEST 2009
On Apr 16, 8:59 am, Grant Edwards <invalid at invalid> wrote:
> On 2009-04-16, Adam Olsen <rha... at gmail.com> wrote:
> > I'm afraid you will need to back up your claims with real files.
> > Although MD5 is a smaller, older hash (128 bits, so you only need
> > 2**64 files to find collisions),
>
> You don't need quite that many to have a significant chance of
> a collision. With "only" something on the order of 2**61
> files, you still have about a 1% chance of a collision.
Aye, 2**64 is closer to the middle of the curve (roughly a 39% chance of a collision). You can still go
either way. What's important is the order of magnitude required.
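To put numbers on that curve, here is a small sketch using the usual birthday-bound approximation p ≈ 1 - exp(-n(n-1)/(2d)), with d = 2**128 possible MD5 digests. The name `collision_probability` is just for illustration; it is the same formula as the `bp` function quoted below.

```python
import math

MD5_SPACE = 2 ** 128  # number of distinct 128-bit digests


def collision_probability(n, d):
    """Birthday-bound estimate of P(at least one collision) among n random values in a space of size d."""
    # Python's int / int is correctly rounded, so the huge integers are safe here.
    x = n * (n - 1) / (2 * d)
    return 1.0 - math.exp(-x)


for exponent in (61, 64, 66):
    p = collision_probability(2 ** exponent, MD5_SPACE)
    print(f"2**{exponent} files: p = {p:.4f}")
```

Going from 2**61 to 2**66 files moves the probability from under 1% to nearly certain, which is why the order of magnitude is what matters.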
> For "a few million files" (we'll say 4e6), the probability of a
> collision is so close to 0 that it can't be calculated using
> double-precision IEEE floats.
≈ 0.000000000000000000000000023509887 (2.3509887e-26)
Or 42535296000000000000000000 to 1.
Or 42 trillion trillion to 1.
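The figure can be reproduced in ordinary double precision by skipping the exponential: for tiny x, 1 - exp(-x) ≈ x (the error is about x**2/2), so p ≈ n(n-1)/(2d). A minimal sketch, assuming n = 4e6 files and d = 2**128 MD5 digests:

```python
n = 4 * 10**6   # "a few million files"
d = 2 ** 128    # number of possible MD5 digests

# First-order approximation: 1 - exp(-x) ~= x when x is tiny.
# int / int is correctly rounded, so nothing is lost before the final float.
x = n * (n - 1) / (2 * d)

print(f"p    ~= {x:.8e}")
print(f"odds ~= {1 / x:.8e} to 1")
```

This agrees with the numbers above to about six significant figures.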
> Here's the Python function I'm using:
>
> from math import exp
>
> def bp(n, d):
>     return 1.0 - exp(-n*(n-1.)/(2.*d))
>
> I haven't spent much time studying the numerical issues of the
> way that the exponent is calculated, so I'm not entirely
> confident in the results for "small" n values such that
> p(n) == 0.0.
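The 0.0 comes from cancellation in the formula, not from double precision being unable to hold the answer: when x is far below machine epsilon (about 2.2e-16), exp(-x) rounds to exactly 1.0, and 1.0 - 1.0 is 0.0. A minimal sketch of the standard fix uses `math.expm1`, which computes exp(y) - 1 accurately for small y; `bp_expm1` is a hypothetical name, and `bp` is the quoted function.

```python
from math import exp, expm1


def bp(n, d):
    # Quoted form: exp(-x) rounds to 1.0 for tiny x, so the difference is 0.0.
    return 1.0 - exp(-n * (n - 1.) / (2. * d))


def bp_expm1(n, d):
    # -expm1(-x) is mathematically 1 - exp(-x), but avoids the cancellation.
    return -expm1(-n * (n - 1.) / (2. * d))


n, d = 4e6, 2.0 ** 128
print(bp(n, d))        # 0.0
print(bp_expm1(n, d))  # about 2.35e-26
```

For large x the two functions agree, so the `expm1` form can simply replace the original.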
Try using Qalculate, an arbitrary-precision desktop calculator. I always resort to it for things like this.