interacting with an updatedb generated data file within python

Nick Craig-Wood nick at craig-wood.com
Thu Apr 16 09:14:10 CEST 2009


birdsong <david.birdsong at gmail.com> wrote:
>  Does anybody have any recommendations on how to interact with the data
>  file that updatedb generates?  I'm running through a file list in
>  sqlite that I want to check against the file system. updatedb is
>  pretty optimized for building an index and storing it, but I see no
>  way to query the db file other than calling locate itself.  This would
>  require me to fork and exec for every single file I want to verify -
>  I'd be better off doing the stat myself in that case, but I'd really
>  rather let updatedb build the index for me.

Hmm..

There are several different implementations of locate and I'm not sure
they all use the same database format.  On this Ubuntu machine the
database is also readable only by root (and the mlocate group):

$ ls -l /var/lib/mlocate/mlocate.db
-rw-r----- 1 root mlocate 7013090 2009-04-14 07:54 /var/lib/mlocate/mlocate.db
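Because of those permissions it may be worth checking whether the
current user can read the database before trying to parse it directly.
This is a minimal sketch, assuming the mlocate path from the listing
above; other locate implementations may keep their database elsewhere,
and `can_read_db` is a hypothetical helper name.

```python
import os

# Path used by mlocate on this Ubuntu machine (see the listing above).
# Assumption: other locate implementations may store it elsewhere.
MLOCATE_DB = "/var/lib/mlocate/mlocate.db"


def can_read_db(path=MLOCATE_DB):
    """Return True if the file exists and the current user may read it."""
    return os.path.exists(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    if can_read_db():
        print("Database is readable")
    else:
        print("Database missing or not readable by this user")
```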

>  I searched high and low for any sort of library that is well suited
>  for reading these data files, but I've found nothing for any language
>  other than the source for locate and updatedb itself.

You can use this to extract the list of files from the locate database:

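from subprocess import Popen, PIPE
from time import time

start = time()

# Ask locate for every entry in its database; the pattern "*"
# matches every path.
all_files = set()
p = Popen(["locate", "*"], stdout=PIPE)
for line in p.stdout:
    # Each line is one path as bytes; drop the trailing newline.
    path = line.rstrip(b"\n")
    all_files.add(path)
p.wait()

print("Found", len(all_files), "files in", time() - start, "seconds")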

This builds a set of all the files on the filesystem and prints

Found 314492 files in 1.152987957 seconds

on my laptop, using about 19 MB of memory in total.
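With the set in memory, verifying each file from the sqlite list becomes
a hash lookup rather than a fork and exec of locate (or a stat) per
file.  A minimal sketch; `missing_files` and `paths_to_check` are
hypothetical names, with `paths_to_check` standing in for the rows read
from sqlite.  The paths are bytes to match what locate emits.

```python
def missing_files(all_files, paths_to_check):
    """Return the paths that locate's database does not contain.

    all_files      -- set of bytes paths, e.g. built from locate's output
    paths_to_check -- iterable of bytes paths (hypothetical: rows from sqlite)
    """
    return [p for p in paths_to_check if p not in all_files]


all_files = {b"/etc/passwd", b"/etc/hosts"}
print(missing_files(all_files, [b"/etc/hosts", b"/etc/nonexistent"]))
# prints [b'/etc/nonexistent']
```

Bear in mind that the answer reflects the filesystem as of the last
updatedb run, not its live state.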

You could easily enough put that into an sqlite table instead of a set().
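Here is one way that might look: stream locate's output into a table
and let a LEFT JOIN find the entries of your own list that locate does
not know about.  This is a sketch under stated assumptions: the table
names `files` and `wanted` are hypothetical, and `wanted` is assumed to
store paths as BLOBs (bytes), since sqlite never considers a TEXT value
equal to a BLOB.

```python
import sqlite3
from subprocess import Popen, PIPE


def load_paths(conn, lines):
    """Insert newline-terminated byte strings into the (hypothetical) files table."""
    conn.execute("CREATE TABLE IF NOT EXISTS files (path BLOB PRIMARY KEY)")
    # INSERT OR IGNORE on the primary key removes duplicates, like the set().
    conn.executemany(
        "INSERT OR IGNORE INTO files (path) VALUES (?)",
        ((line.rstrip(b"\n"),) for line in lines),
    )
    conn.commit()


def load_from_locate(conn, pattern="*"):
    """Run locate and stream every path it prints into the files table."""
    p = Popen(["locate", pattern], stdout=PIPE)
    try:
        load_paths(conn, p.stdout)
    finally:
        p.stdout.close()
        p.wait()


def missing_paths(conn):
    """Paths in the hypothetical `wanted` table that are absent from `files`."""
    rows = conn.execute(
        "SELECT w.path FROM wanted AS w "
        "LEFT JOIN files AS f ON f.path = w.path "
        "WHERE f.path IS NULL"
    )
    return [row[0] for row in rows]
```

Splitting the loading from the call to locate keeps `load_paths` usable
with any iterable of lines, which also makes it easy to try out without
a locate database.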

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick