interacting with an updatedb-generated data file within python
nick at craig-wood.com
Thu Apr 16 09:14:10 CEST 2009
birdsong <david.birdsong at gmail.com> wrote:
> Does anybody have any recommendations on how to interact with the data
> file that updatedb generates? I'm running through a file list in
> sqlite that I want to check against the file system. updatedb is
> pretty optimized for building an index and storing it, but I see no
> way to query the db file other than calling locate itself. This would
> require me to fork and exec for every single file I want to verify -
> I'd be better off doing the stat myself in that case, but I'd really
> rather let updatedb build the index for me.
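For comparison, the "do the stat myself" route from the question needs no fork and exec at all, because each check stays inside the Python process. Below is a minimal sketch using a hypothetical helper name, split_by_existence. Note that it checks the live filesystem, whereas locate only knows what the last updatedb run saw.

```python
import os

def split_by_existence(paths):
    """Check each path directly; return (present, missing) lists.

    os.path.lexists uses lstat, so a dangling symlink still counts as present.
    """
    present, missing = [], []
    for path in paths:
        (present if os.path.lexists(path) else missing).append(path)
    return present, missing
```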
There are several different implementations of locate, and I'm not sure
they all use the same database format. On this Ubuntu machine the
database is also readable only by root and the mlocate group:
$ ls -l /var/lib/mlocate/mlocate.db
-rw-r----- 1 root mlocate 7013090 2009-04-14 07:54 /var/lib/mlocate/mlocate.db
> I searched high and low for any sort of library that is well suited
> for reading these data files, but I've found nothing for any language
> other than the source for locate and updatedb itself.
You can use this to extract the list of files from the locate database
with a single locate run, instead of one fork and exec per file:
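from subprocess import Popen, PIPE
from time import time

start = time()
all_files = set()
# One locate run lists every path in its database, one per line
p = Popen(["locate", "*"], stdout=PIPE)
for line in p.stdout:
    path = line[:-1]  # strip the trailing newline (path is bytes here)
    all_files.add(path)
p.wait()
print("Found", len(all_files), "files in", time() - start, "seconds")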
This builds a set of all the files recorded in the locate database
(i.e. the filesystem as of the last updatedb run) and prints
Found 314492 files in 1.152987957 seconds
on my laptop, using about 19 MB of memory in total.
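One caveat with the line-based loop is that a filename may legally contain a newline, which would be split into two bogus entries. The sketch below assumes your locate accepts -0 (print a NUL after each name instead of a newline). mlocate and GNU findutils' locate both have it, but check your man page. The helper names are hypothetical.

```python
import subprocess

def parse_null_separated(data):
    """Split NUL-terminated locate output into a set of byte-string paths."""
    # Every record ends in b"\0", so the last split element is empty; drop empties.
    return {p for p in data.split(b"\0") if p}

def all_located_paths():
    # Assumption: this locate supports -0 and treats "*" as a glob matching
    # every entry. check=True raises if locate fails (e.g. unreadable database).
    out = subprocess.run(["locate", "-0", "*"], stdout=subprocess.PIPE,
                         check=True).stdout
    return parse_null_separated(out)
```

This reads the whole output into memory before splitting, which is fine at a few hundred thousand paths but is a trade-off against the streaming loop.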
You could easily enough put that into an sqlite table instead of a set().
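A minimal sketch of the sqlite variant follows. It assumes your existing file list lives in a table called `files` with a `path` column. That table name, the `located` table and both helper functions are hypothetical. It also assumes the paths are already str, so decode the bytes from locate first (e.g. with path.decode()). A LEFT JOIN then finds the listed files that locate did not report.

```python
import sqlite3

def load_located(conn, paths):
    """Store locate's paths in a hypothetical `located` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS located (path TEXT PRIMARY KEY)")
    # INSERT OR IGNORE silently skips duplicate paths (primary key conflicts)
    conn.executemany(
        "INSERT OR IGNORE INTO located (path) VALUES (?)",
        ((p,) for p in paths),
    )
    conn.commit()

def missing_from_index(conn):
    """Paths in the (hypothetical) `files` table that locate did not report."""
    rows = conn.execute(
        "SELECT f.path FROM files AS f "
        "LEFT JOIN located AS l ON l.path = f.path "
        "WHERE l.path IS NULL ORDER BY f.path"
    )
    return [row[0] for row in rows]
```

Letting sqlite do the join keeps the comparison in one query instead of a Python loop over both lists.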
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick