Shelve operations are very slow and create huge files
pyth at devel.trillke.net
Sun Nov 2 22:44:43 CET 2003
Eric Wichterich wrote:
> Hello Pythonistas,
> I use Python shelves to store results from MySQL-Queries (using Python
> for web scripting).
> One script searches the MySQL-database and stores the result, the next
> script reads the shelve again and processes the result. But there is a
> problem: if the second script is called too early, the error "(11,
> 'Resource temporarily unavailable') " occurs.
> So I took a closer look at the file that is generated by the shelf: The
> result-list from MySQL-Query contains 14.600 rows with 7 columns. But,
> the saved file is over 3 MB large and contains over 230.000 lines (!),
> which seems way too much!
> Following statements are used:
> dbase = shelve.open(filename)
> if dbase.has_key(key): # overwrite objects stored with same key
>     del dbase[key]
> dbase[key] = object
> Any ideas?
Have you thought of simply using the 'keys' as filenames (perhaps with
some canonical name mangling) and storing the object content as a
pickle? These days filesystems tend to behave a lot like databases and
it might prove to be the fastest solution.
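To make the idea concrete, here is a minimal sketch in present-day Python 3. The function names (key_to_path, store, load) and the ".pickle" suffix are made up for illustration, and percent-encoding via urllib.parse.quote is just one possible choice of name mangling, not the only one.

```python
import os
import pickle
from urllib.parse import quote


def key_to_path(directory, key):
    """Map a key to a file path inside `directory` (hypothetical helper).

    Percent-encoding with safe="" also escapes "/", so a key cannot
    point outside the directory. Very long keys may still exceed the
    filesystem's filename length limit.
    """
    return os.path.join(directory, quote(key, safe="") + ".pickle")


def store(directory, key, obj):
    """Pickle `obj` into its own file; an existing file is overwritten."""
    with open(key_to_path(directory, key), "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(directory, key):
    """Read back the object stored under `key`."""
    with open(key_to_path(directory, key), "rb") as f:
        return pickle.load(f)
```

With this layout, store(directory, key, rows) in the first script and load(directory, key) in the second would take the place of the shelve calls. Overwriting needs no explicit delete, because opening the file in "wb" mode truncates it.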
I once did a check with ReiserFS on Linux: I created about one million
directory entries and then read random entries. I was able to read a
couple of hundred files a second (each contained only a small number).
Ah yes, don't try to run os.listdir on those directories :-)
Another thing: renaming a file is *atomic* across all processes (at
least in POSIX land, as long as source and target are on the same
filesystem). This means you can create a file under a temporary name,
fill it, close it and then 'rename' it to the real filename; all
other processes will either see no file or the complete file.
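A rough sketch of that write-then-rename pattern in present-day Python 3 follows. The name atomic_pickle_dump is hypothetical. It uses os.replace (available since Python 3.3), which overwrites an existing target; os.rename behaves the same way on POSIX. The temporary file is created in the target's own directory so that both names are on the same filesystem.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path):
    """Write `obj` to `path` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, hence on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        # Atomic switch: the old content or the new, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the real filename too early then gets either "file not found" or the previous complete result, which it can handle by retrying.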