Shelve operations are very slow and create huge files

Eric Wichterich eric.wichterich at gmx.de
Sun Nov 2 12:28:35 EST 2003


Hello Tim,

thank you for your thoughts. I tried cPickle and gzip instead of
shelve, but it ran even slower than before.
So I used the profiler to check where most of the time is spent.
Reading the data and converting it back to a dictionary took around
9 seconds with shelve.
With cPickle, it took 11 seconds.
With gzip & cPickle, it also took 11 seconds (the file was now around
250 kB instead of 1.7 MB).
With plain pickle instead of cPickle, it took over 45 seconds.
So the file size doesn't seem to matter at all. But 9 seconds just to
convert a pickle back into a Python dictionary???
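
For reference, what I timed looks roughly like this minimal sketch
(the dummy dictionary and the filenames just stand in for my actual
query result):

    import time, cPickle, gzip

    # dummy stand-in for the 14,600-row x 7-column query result
    result = {}
    for i in range(14600):
        result[str(i)] = ('a', 'b', 'c', 'd', 'e', 'f', 'g')

    # plain cPickle to an ordinary file
    t0 = time.time()
    f = open('result.pickle', 'wb')
    cPickle.dump(result, f, -1)          # -1 = highest pickle protocol
    f.close()
    f = open('result.pickle', 'rb')
    data = cPickle.load(f)
    f.close()
    print 'cPickle: %.1f seconds' % (time.time() - t0)

    # the same through gzip: much smaller file, similar total time
    t0 = time.time()
    f = gzip.open('result.pickle.gz', 'wb')
    cPickle.dump(result, f, -1)
    f.close()
    f = gzip.open('result.pickle.gz', 'rb')
    data = cPickle.load(f)
    f.close()
    print 'gzip + cPickle: %.1f seconds' % (time.time() - t0)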

Thanks also for mentioning DMTools. Do you know whether it is useful
for (fast) conversion of a pickle (or some other file stored on the
server) into a Python dictionary? I didn't find many real-life
examples or further descriptions of these tools.

Greetings,
Eric

On Sunday, 02.11.03 at 08:36, Tim Churches wrote:

> On Sun, 2003-11-02 at 03:38, Eric Wichterich wrote:
>> One script searches the MySQL database and stores the result; the next
>> script reads the shelf again and processes the result. But there is a
>> problem: if the second script is called too early, the error "(11,
>> 'Resource temporarily unavailable')" occurs.
>
> The only reason to use shelves is if your query results are too large
> (in total) to fit in memory, and thus have to be retrieved, stored and
> processed row-by-row.
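
(If I understand the row-by-row idea correctly, it would look something
like this sketch; the connection details, table name and key scheme are
just placeholders:)

    import shelve, MySQLdb

    conn = MySQLdb.connect(host='localhost', db='test')  # placeholder
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM some_table")           # placeholder

    db = shelve.open('rows.db', 'c')
    i = 0
    row = cursor.fetchone()
    while row is not None:
        db[str(i)] = row   # one key per row, so only one row is in memory
        i = i + 1
        row = cursor.fetchone()
    db.close()
    conn.close()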
>
>> So I took a closer look at the file that is generated by the shelf: the
>> result list from the MySQL query contains 14,600 rows with 7 columns.
>> But the saved file is over 3 MB and contains over 230,000 lines (!),
>> which seems way too much!
>
> But that doesn't seem to be the case - your query results can easily
> fit in memory. However, the query may still take a long time to
> execute, so it may be reasonable to store or cache the results for
> further processing later. Even so, it is much quicker to just pickle
> (cPickle) the results to a gzipped file than to use shelve. The gzip
> step actually speeds things up, provided that your CPU is reasonably
> fast and your disc storage system is mundane (any CPU faster than
> about 500 MHz sees gains on most result sets). It also saves disc
> space.
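
(So, if I follow, the caching approach would be something like this
sketch; the cache filename and the run_query hook are names I made up:)

    import os, cPickle, gzip

    CACHE = 'query_cache.pickle.gz'      # arbitrary cache filename

    def load_results(run_query):
        # run_query() should return the full query result (e.g. a dict);
        # it is only called when no cached copy exists yet.
        if os.path.exists(CACHE):
            f = gzip.open(CACHE, 'rb')
            results = cPickle.load(f)
            f.close()
        else:
            results = run_query()
            f = gzip.open(CACHE, 'wb')
            cPickle.dump(results, f, -1) # -1 = highest pickle protocol
            f.close()
        return results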
>
> Ole Nielsen and Peter Christen have written a neat set of Python
> functions which will automatically handle the caching of query results
> from a MySQL database in gzipped pickles - see
> http://csl.anu.edu.au/ml/dm/dm_software.html - except the files don't
> seem to be available from that page - Ole and Peter, please fix!
>
> -- 
>
> Tim C
>
> PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere
> or at http://members.optushome.com.au/tchur/pubkey.asc
> Key fingerprint = 8C22 BF76 33BA B3B5 1D5B  EB37 7891 46A9 EAF9 93D0





