sync databse table based on current directory data without losign previous values
Νίκος Γκρ33κ
nikos.gr33k at gmail.com
Wed Mar 6 03:57:26 EST 2013
Τη Τετάρτη, 6 Μαρτίου 2013 10:19:06 π.μ. UTC+2, ο χρήστης Lele Gaifax έγραψε:
> Νίκος Γκρ33κ <nikos.gr33k at gmail.com> writes:
>
>
>
> > How can i update the databse to only contain the existing filenames without losing the previous stored data?
>
>
>
> Basically you need to keep a list (or better, a set) containing all
>
> current filenames that you are going to insert, and finally do another
>
> "inverse" loop where you scan all the records and delete those that are
>
> not present anymore.
>
>
>
> Of course, this assume you have a "bidirectional" identity between the
>
> filenames you are loading and the records you are inserting, which is
>
> not the case in the code you show:
>
>
>
> > #read the containing folder and insert new filenames
>
> > for result in os.walk(path):
>
> > for filename in result[2]:
>
>
>
> Here "filename" is just that, not the full path: this could result in
>
> collisions, if your are actually loading a *tree* instead of a flat
>
> directory, that is multiple source files are squeezed into a single
>
> record in your database (imagine "/foo/index.html" and
>
> "/foo/subdir/index.html").
>
>
>
> With that in mind, I would do something like the following:
>
>
>
> # Compute a set of current fullpaths
>
> current_fullpaths = set()
>
> for root, dirs, files in os.walk(path):
>
> for fullpath in files:
>
> current_fullpaths.add(os.path.join(root, file))
>
>
>
> # Load'em
>
> for fullpath in current_fullpaths:
>
>
>
> try:
>
> #find the needed counter for the page URL
>
> cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) )
>
> data = cur.fetchone() #URL is unique, so should only be one
>
>
>
> if not data:
>
> #first time for file; primary key is automatic, hit is defaulted
>
> cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
>
> except MySQLdb.Error, e:
>
> print ( "Query Error: ", sys.exc_info()[1].excepinfo()[2] )
>
>
>
> # Delete spurious
>
> cur.execute('''SELECT url FROM files''')
>
> for rec in cur:
>
> fullpath = rec[0]
>
> if fullpath not in current_fullpaths:
>
> other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))
>
>
>
> Of course here I am assuming a lot (a typical thing we do to answer your
>
> questions :-), in particular that the "url" field content matches the
>
> filesystem layout, which may not be the case. Adapt it to your usecase.
>
>
>
> hope this helps,
>
> ciao, lele.
>
> --
>
> nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
>
> real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
>
> lele at metapensiero.it | -- Fortunato Depero, 1929.
You are fantastic! Your straightforward logic amazes me!
Thank you very much for making things clear to me!!
But there is a slight problem when iam trying to run the code iam presenting this error ehre you can see its output here:
http://superhost.gr/cgi-bin/files.py
More information about the Python-list
mailing list