Shorter checksum than MD5
Mercuro
this at is.invalid
Thu Sep 9 08:13:01 EDT 2004
Paul Rubin wrote:
>
> How about putting a timestamp in each record, so you only have to
> compare the records that have been updated since the last period
> comparison.
>
OK, I will give some more information:
I have a proprietary system, which I can't modify.
But it uses FoxPro DBF files, which I can read.
I have located all the data that I want to copy
into a MySQL table. (This table will be used to look
up prices and to find other information about articles.)
Since I'm not able to put timestamps on
changed records, I got the idea to compute a checksum
for each record and save it in the MySQL table.
Every night I would 'SELECT' all checksums
together with the article numbers and then compare
them one by one with newly calculated checksums from
the DBF file. Only records whose checksums changed would
be 'UPDATED', and missing article numbers would be 'INSERTED'.
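The nightly comparison described above could be sketched roughly as follows. This is only an illustration in current Python 3, not the script from this post: `plan_sync`, `stored`, and `current` are hypothetical names, the sample IDs and checksums are made up, and the checksums are assumed to have been computed already. Keying the stored checksums by article number in a dict means each lookup no longer needs a scan through a list.

```python
def plan_sync(stored, current):
    """Split the current records into those to UPDATE and those to INSERT.

    stored  -- dict: article number -> checksum saved in the MySQL table
    current -- dict: article number -> checksum computed from the DBF file
    (both arguments are hypothetical stand-ins for the real data)
    """
    to_update = []
    to_insert = []
    for article_id, checksum in current.items():
        if article_id not in stored:
            to_insert.append(article_id)   # not yet in the MySQL table
        elif stored[article_id] != checksum:
            to_update.append(article_id)   # checksum differs: record changed
    return sorted(to_update), sorted(to_insert)


# Made-up sample data: 1002 changed, 1004 is new, 1001 is unchanged.
stored = {1001: "a3f9c", 1002: "0b77e", 1003: "c41d2"}
current = {1001: "a3f9c", 1002: "9e10f", 1004: "5d6a8"}
to_update, to_insert = plan_sync(stored, current)
print(to_update, to_insert)
```

Records that exist only in `stored` (1003 here) are left alone, since the plan above only mentions updates and inserts.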
This is the code I have for now:
(I will probably replace md5 with crc32)
import sys, os, string, dbfreader, md5
from string import strip

# import MySQL module
import MySQLdb

# connect
db = MySQLdb.connect( .... )

# create a cursor
cursor = db.cursor()
cursor.execute("SELECT ID, md5sum, 0 FROM ARTIKEL;")
resultaat = list(cursor.fetchall())

f = dbfreader.DBFFile("ARTIKEL.DBF")
f.open()

i = 0
while 1:
    i += 1
    updated = 0
    rec = f.get_next_record()
    if rec is None:
        break
    pr_kassa = str(rec["PR_KASSA"])
    ID = rec["ID"]
    IDs = str(ID)
    assortiment = strip(str(rec["ASSORTIMENT"]))[0:1]
    pr_tarief = str(rec["PR_TARIEF"])
    status = strip(str(rec["STATUS"]))[0:1]
    pr_aank = str(rec["PR_AANK"])
    # escape single quotes for the SQL statements below
    benaming = string.join(string.split(str(rec["BENAMING"]), "'"), "\\'")
    md5sum = md5.new(pr_kassa + IDs + assortiment + pr_tarief +
                     status + pr_aank + benaming).hexdigest()[3:8]
    if (i % 100) == 0:
        print "record %i: ID %s" % (i, IDs)
        # rotate the list so that it is faster to search through
        tmp = resultaat[:90]
        resultaat = resultaat[90:]
        resultaat.extend(tmp)
    if resultaat is not None:
        for record in resultaat:
            if record[0] == ID:
                #record[2] = 1
                if record[1] != md5sum:
                    print "update record (ID: %s)" % IDs
                    # update existing record, md5 sum does not match
                    cursor.execute(
                        "UPDATE ARTIKEL SET benaming='%s', status=%s, "
                        "assortiment='%s', pr_aank=%s, pr_tarief=%s, "
                        "pr_kassa=%s, md5sum='%s' WHERE ID=%s ;" %
                        (benaming, status, assortiment, pr_aank,
                         pr_tarief, pr_kassa, md5sum, IDs))
                # the ID exists in MySQL, so it must not be inserted again
                updated = 1
                break
    if (updated == 0) and (ID < 8000000):
        # new record
        print "new record (ID: %s)" % IDs
        # eight columns, so eight placeholders
        cursor.execute(
            "INSERT INTO ARTIKEL (ID, benaming, status, assortiment, "
            "pr_aank, pr_tarief, pr_kassa, md5sum) "
            "VALUES ( %s, '%s', %s, '%s', %s, %s, %s, '%s' );" %
            (IDs, benaming, status, assortiment, pr_aank,
             pr_tarief, pr_kassa, md5sum))

f.close()
#############################################
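For the switch to crc32 mentioned above, a per-record checksum might look something like the sketch below. It is written for current Python 3 and assumes each record is available as a dict of field values; `record_crc32`, `FIELDS`, and the sample record are hypothetical, while `zlib.crc32` is the standard-library CRC-32. The fields are joined with a separator so that neighbouring values such as "12", "3" and "1", "23" do not collapse into the same input string, which plain concatenation would allow.

```python
import zlib

# Field names taken from the script above; the order only needs to stay fixed.
FIELDS = ("PR_KASSA", "ID", "ASSORTIMENT", "PR_TARIEF",
          "STATUS", "PR_AANK", "BENAMING")


def record_crc32(rec, fields=FIELDS):
    """Return the CRC-32 of the selected fields as 8 hex digits."""
    # Join with a control character that is unlikely to occur in the data,
    # so that adjacent fields cannot blur into each other.
    joined = "\x1f".join(str(rec[name]) for name in fields)
    # The mask keeps the value an unsigned 32-bit number.
    return "%08x" % (zlib.crc32(joined.encode("utf-8")) & 0xFFFFFFFF)


# Made-up sample record, purely for illustration.
rec = {"PR_KASSA": "12.50", "ID": 1001, "ASSORTIMENT": "A",
       "PR_TARIEF": "10.00", "STATUS": "1", "PR_AANK": "8.75",
       "BENAMING": "example article"}
changed = dict(rec, PR_KASSA="12.90")

print(record_crc32(rec), record_crc32(changed))
```

The result is 8 hex digits (32 bits), compared with the 5 hex digits (20 bits) that `hexdigest()[3:8]` keeps, so accidental matches between a changed record and its old checksum are less likely, though still possible.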
If anybody has any better ideas, I'm happy to hear
them!