three column dataset - additions and deletions

John Nagle nagle at animats.com
Fri Dec 3 00:02:29 EST 2010


On 12/2/2010 5:06 PM, draeath wrote:
> On Thu, 02 Dec 2010 22:55:53 +0000, Tim Harig wrote:
>
> Thanks for taking the time to check in on this, Tim!

> I realize this could likely all be done from inside the database itself -
> but altering the DB itself is not an option (as the product vendor is
> very touchy about that, and altering it can void our support agreement)

    A local database is probably the way to go.  You're already using
MySQL, so you know how to do that.  You can use MySQL or SQLite on
a local machine for your local database, while also talking
to the remote MySQL database.

    Locally, you probably want to store the key, the short string,
the MD5 of the long string, and the long string.  When you get an
update, put it in a temporary table, then compare that table with
your permanent table.  (The comparison is one line of SQL.)
What you do with the differences is your problem.
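
    As a rough sketch of that temporary-table comparison, here is one
way it might look with Python's built-in sqlite3 module.  The table
names (items, incoming), the column names, and the helper functions
are made up for illustration, and the sketch assumes every column is
non-NULL.  MD5 is used only as a cheap change fingerprint for the
long string, not for security.

```python
import hashlib
import sqlite3

# Hypothetical schema: key, short string, MD5 of long string, long string.
COLUMNS = "(item_key TEXT PRIMARY KEY, short_text TEXT, long_md5 TEXT, long_text TEXT)"


def md5_hex(text):
    """Hex MD5 digest of a string, used as a change fingerprint."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def open_local_db(path=":memory:"):
    """Open the local database and make sure the permanent table exists."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS items " + COLUMNS)
    return conn


def load_rows(conn, table, rows):
    """Insert (key, short, long) tuples; `table` must be a trusted name."""
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
        [(key, short, md5_hex(long), long) for key, short, long in rows],
    )


def diff_update(conn, rows):
    """Stage a fresh snapshot in a temp table and compare it with `items`.

    Returns three sorted lists of keys: (added, deleted, changed).
    """
    conn.execute("DROP TABLE IF EXISTS temp.incoming")
    conn.execute("CREATE TEMP TABLE incoming " + COLUMNS)
    load_rows(conn, "incoming", rows)

    # Keys present in the update but not in the permanent table.
    added = [r[0] for r in conn.execute(
        "SELECT i.item_key FROM incoming i"
        " LEFT JOIN items p ON p.item_key = i.item_key"
        " WHERE p.item_key IS NULL ORDER BY i.item_key")]

    # Keys present in the permanent table but missing from the update.
    deleted = [r[0] for r in conn.execute(
        "SELECT p.item_key FROM items p"
        " LEFT JOIN incoming i ON i.item_key = p.item_key"
        " WHERE i.item_key IS NULL ORDER BY p.item_key")]

    # Keys in both tables whose short string or long-string hash differs.
    changed = [r[0] for r in conn.execute(
        "SELECT i.item_key FROM incoming i"
        " JOIN items p ON p.item_key = i.item_key"
        " WHERE i.short_text <> p.short_text OR i.long_md5 <> p.long_md5"
        " ORDER BY i.item_key")]

    return added, deleted, changed
```

    You would call load_rows(conn, "items", ...) once to seed the
permanent table, then diff_update(conn, fresh_rows) each time an
update arrives.  Comparing the stored MD5 rather than the long string
itself keeps the comparison cheap when the long strings are large.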

    I have a system running which does something like this. Every
three hours, it fetches PhishTank's database of a few hundred
thousand phishing sites, and compares it to my local copy.
Another system of mine reads the daily updates to SEC filings
and updates my local database.  This is all routine database
stuff.

    If you have to work with big, persistent data sets, use a
real database. That's what they are for, and they already have
good algorithms for the hard stuff.  Storing some local data
structure with "pickle" is probably not the right approach.

    					John Nagle


