Shorter checksum than MD5

Paul Rubin
Thu Sep 9 21:12:05 CEST 2004

Mercuro <this at is.invalid> writes:
> I have a proprietary system, which I can't modify. But, it uses Foxpro
> DBF files which I can read. I have found all the data I want to have
> in a MySQL table. (This table will be used to look up prices and to
> find other information about articles)
> Since I'm not able to put some timestamps on changed records, I got
> the idea to put a checksum on each record and save it in the MySQL
> table. Every night I would 'SELECT' all checksums together with the
> article numbers and then compare them one by one with newly calculated
> checksums from the DBF file.  Only the changed checksums shall be
> 'UPDATED' and missing numbers would be 'INSERTED'.

I'm a little confused.  Is only the DBF file getting updated?  If you
can put a checksum on each record, why can't you put a timestamp on
each record?  Or why can't you just migrate all the data from the DBF
into another file every night, and then just scan the file to find the
changes from the previous night's version?
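To make the snapshot idea concrete, here is a minimal sketch. It assumes each night's dump has already been loaded into a dict keyed by article number, with the whole record as the value; the function name diff_snapshots and the sample records are made up for illustration, and reading the DBF file itself is not shown.

```python
def diff_snapshots(old, new):
    """Compare two nightly snapshots keyed by article number.

    Both arguments are dicts mapping a key to the full record (any
    comparable value, such as a tuple of field values).  Returns the
    records to INSERT, the records to UPDATE, and the keys that
    disappeared since the previous snapshot.
    """
    inserted = {k: v for k, v in new.items() if k not in old}
    updated = {k: v for k, v in new.items() if k in old and old[k] != v}
    deleted = [k for k in old if k not in new]
    return inserted, updated, deleted


# Hypothetical data: last night's dump and tonight's dump.
last_night = {"A1": ("widget", "1.00"), "A2": ("gadget", "2.50")}
tonight = {"A1": ("widget", "11.73"), "A3": ("gizmo", "4.00")}

to_insert, to_update, gone = diff_snapshots(last_night, tonight)
```

Because the full records are compared rather than their checksums, a change can't be missed through a checksum collision; the cost is keeping one extra copy of the data.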

> This is the code I have for now:
> (I will probably change md5 with crc32)

Where are the updates coming from?  Note that if you use a 32-bit
checksum, with 100000 records you will probably have some records with
the same checksum by accident.  Is that a problem?  Also, with CRC32,
it's very easy to create a record on purpose that has any given
checksum.  Is THAT a problem?  For example, it means that if someone
can change the price of an article, he can choose a new price so that
the record will have the same checksum as the old price and the change
won't get noticed.  Could he buy something for $1.00, change the price
to $11.73 or something, then return the item and get an $11.73 refund
because you didn't notice the update?
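The "same checksum by accident" figure can be checked with the usual birthday approximation. This is only a rough sketch: it assumes the checksums behave like uniformly random values, which is an idealization for CRC32, and the function name collision_probability is made up for illustration.

```python
import math


def collision_probability(n_records, bits):
    """Approximate chance that at least two of n_records checksums
    of the given bit width coincide, assuming the checksums are
    uniformly random (birthday approximation)."""
    pairs = n_records * (n_records - 1) / 2
    # 1 - exp(-x), computed via expm1 so tiny probabilities keep precision
    return -math.expm1(-pairs / 2.0 ** bits)


p32 = collision_probability(100_000, 32)   # roughly 0.69
p64 = collision_probability(100_000, 64)   # roughly 2.7e-10
```

So with 100000 records and a 32-bit checksum, two records sharing a checksum is more likely than not, while at 64 bits it is negligible. Note that this only covers accidental collisions; it says nothing about someone deliberately choosing a record to match a given CRC32.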
