Shelve problem: why so big?

Geert-Jan Giezeman geert at cs.uu.nl
Tue Nov 13 04:49:32 EST 2001


In <9sl98i$bjs$1 at nntp9.atl.mindspring.net> "Matt Gerrans" <mgerrans at ix.netcom.com> writes:

>> I have an idea, but considering the fact that I just learned python
>> yesterday, it probably won't help...
>
>That comment made me chuckle.  (Actually, it is pretty impressive that you
>learned Python yesterday and are already answering questions -- you must be
>a quick learner!)

Well, this is my first serious Python program, so all help is appreciated.

>As for the original post, my personal opinion is that with this much data, it
>is time to consider using a database.  Or at the very least, you should come
>up with a scheme to manage and partition it into multiple smaller chunks.
>Even if it worked with the shelve module, wouldn't the performance be pretty
>awful?
>
>The PyWin database modules are pretty handy and the dbm module can be used on
>Unix platforms.   I don't know about other platforms, though.   There might be
>even better database access available...
>
>Even if you don't use a database, if all you are storing is a bunch of floats,
>you don't even need shelve.   It is easy enough to write them to (and read
>from) a file on your own with a loop, isn't it?


The performance of shelves was pretty good. In fact, that was a reason why
I used them. I use the shelve in a CGI script in which I need just one of
the 1100 lists, depending on a parameter. I would rather not read all the
data in. Of course, I could have solved this by having 1100 files, one for
each list, but I did not take that road initially (and it worked for
smaller data sets).

What also works, I discovered, is using a btree (bsddb.btopen) instead of
a shelve (which uses a hash table). I used marshal (instead of pickle) to
get smaller strings. This leads to a file of only 47 MB (even with two
extra strings stored alongside every float).
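In code, that combination looks roughly like this (a sketch only; the file
name, key, and data are invented). Note that bsddb, which was part of the
standard library at the time, stores plain strings as values, which is
exactly what marshal.dumps produces:

import bsddb, marshal

values = [0.5, 1.25, 2.0]   # example data: one of the ~1100 lists

# Write: a btree-backed file instead of shelve's hash table.
db = bsddb.btopen('data.db', 'c')
db['some-key'] = marshal.dumps(values)   # marshal gives compact strings
db.close()

# Read back a single entry.
db = bsddb.btopen('data.db', 'r')
values = marshal.loads(db['some-key'])
db.close()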


Thanks to everyone who provided answers.


