three column dataset - additions and deletions

draeath draeath.spamtrap at gmail.com
Thu Dec 2 17:44:01 EST 2010


I'm going to be writing a utility that will be pulling three fields from 
a MySQL table. I've already got a sample dataset - there's a long int 
(which is a db key), a short string, and a looong string. Many rows.

As it is, I receive this data from the DB interface as a rather large tuple 
of tuples. I plan on hashing the long string field (both for convenience 
and security) and storing the set in a pickle.
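
As a minimal sketch of that hashing step: the code below assumes each row
is a (key, short string, long string) tuple and that the long string is
text. The function name rows_to_snapshot and the sample rows are
hypothetical, and SHA-256 is an assumption here; any hashlib digest would
slot in the same way.

```python
import hashlib

def rows_to_snapshot(rows):
    """Map each db key to (short_string, hex digest of the long string)."""
    snapshot = {}
    for key, short, long_text in rows:
        # Assumption: the long field is text, so encode it before hashing.
        digest = hashlib.sha256(long_text.encode("utf-8")).hexdigest()
        snapshot[key] = (short, digest)
    return snapshot

# Hypothetical sample shaped like the tuple of tuples from the DB interface.
rows = (
    (1001, "alpha", "first long string"),
    (1002, "beta", "second long string"),
)
snapshot = rows_to_snapshot(rows)
```

Keying a dict on the db key makes per-row lookups cheap, and the result
contains only plain tuples and strings, so it pickles without extra work.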

The idea is that this script will run periodically, pulling the table, 
comparing the data gathered on that run with the data stored by the 
previous run, acting on any changes, and storing the current data back 
(to be compared against on the next invocation).
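
The store-and-reload cycle between runs could look roughly like this. It
is a sketch under assumptions: the stored data is taken to be a dict, the
helper names load_snapshot and save_snapshot are made up, and the
temporary directory only stands in for wherever the real file would live.

```python
import os
import pickle
import tempfile

def load_snapshot(path):
    """Return the data stored by the previous run, or {} on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as fh:
        return pickle.load(fh)

def save_snapshot(path, snapshot):
    """Store the current data for the next invocation to compare against."""
    with open(path, "wb") as fh:
        pickle.dump(snapshot, fh)

# Demonstration in a throwaway directory (hypothetical file name and data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.pickle")
    first_run = load_snapshot(path)   # no file yet, so an empty dict
    save_snapshot(path, {1001: ("alpha", "abc123")})
    second_run = load_snapshot(path)  # what the next invocation would see
```

Returning an empty dict when no file exists means the very first run
simply treats every row as new.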

I figure it will be easy enough to determine changed hashes for a given 
key. What I'm unclear on is the best type of structure to keep this 
data in, given that I need to modify the data after it comes in 
(replacing that long string with, say, an MD5 from hashlib) and need to 
act on both "new" rows (rows that don't exist in the 'old' data) and 
deleted rows (rows that only exist in the 'old' data).
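
One structure that fits these requirements is a dict keyed on the db key,
since its keys can be compared with set operations. The sketch below is
only an illustration: diff_snapshots and the sample data are hypothetical,
and each value is assumed to be a (short string, hash) tuple.

```python
def diff_snapshots(old, new):
    """Return (added, deleted, changed) sets of keys between two dicts."""
    old_keys = set(old)
    new_keys = set(new)
    added = new_keys - old_keys      # rows that don't exist in the old data
    deleted = old_keys - new_keys    # rows that only exist in the old data
    # Keys present in both runs whose stored values differ.
    changed = {k for k in old_keys & new_keys if old[k] != new[k]}
    return added, deleted, changed

# Hypothetical data: key -> (short string, hash of the long string).
old = {1: ("a", "h1"), 2: ("b", "h2"), 3: ("c", "h3")}
new = {2: ("b", "h2"), 3: ("c", "h3x"), 4: ("d", "h4")}
added, deleted, changed = diff_snapshots(old, new)
```

With this sample, key 4 is new, key 1 is deleted, and key 3 has a changed
hash, while key 2 is untouched.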

Keep in mind that I'm a newbie here, and I'm probably not aware of 
most of the different ways to store such things. I shouldn't have any 
problems with the logic itself - I just know enough to know I don't know 
the best ways of doing things :)

Any suggestions? I'm not asking for code or handholding, but some objects 
or datatypes to look into would be very helpful at this early stage.

Thanks!


