my computer is allergic to pickles

Bob Fnord bob at example.com
Mon Mar 7 04:50:26 EST 2011


MRAB <python at mrabarnett.plus.com> wrote:

> On 05/03/2011 01:56, Bob Fnord wrote:
> > I'm using python to do some log file analysis and I need to store
> > on disk a very large dict with tuples of strings as keys and
> > lists of strings and numbers as values.
> >
> > I started by using cPickle to save the instance of the class that
> > contained this dict, but the pickling process started to write
> > the file but ate so much memory that my computer (4 GB RAM)
> > crashed so badly that I had to press the reset button. I've never
> > seen out-of-memory errors do this before. Is this normal?
> >
> > (I know from the output that got written before the crash that my
> > program had finished building the dict and started the
> > pickle. When I tried running the other program that reads the
> > pickle and analyzes the data in it, it gave an error because the
> > file was incomplete. So I know where in my code the crash
> > happened.)
> >
> > From searching the web, I get the impression that pickle uses a
> > lot of memory because it checks for recursion and other things
> > that could break other serialization methods. So I've switched to
> > using marshal to save the dict itself (the only persistent thing
> > in the class, which just has convenience methods for adding data
> > to the dict and searching it for the second stage of analysis).
> >
> > I found some references to h5 tables for getting around the
> > pickling memory problem, but I got the impression they only work
> > with fixed columns, not a somewhat complex data structure like
> > mine.
> >
> > Any comments, suggestions?
> >
> Would a database work?

I want a portable data file (can be moved around the filesystem
or copied to another machine and used), so I don't want to use
mysql or postgres. I guess the "sqlite" approach would work, but
I think it would be difficult to turn the tuples of strings and
lists of strings and numbers into database table rows.
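
For illustration, here is a minimal sketch of one way the mapping might be
done, assuming Python 3 and that both the tuple keys and the list values
can be encoded as JSON text. The table name "entries" and the helpers
open_store, put and get are made up for this example, not part of any
library; only the sqlite3 and json calls are standard.

```python
import json
import sqlite3


def open_store(path):
    """Open (or create) a single-file SQLite store with one key/value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(conn, key, value):
    """Store one entry; key is a tuple of strings, value a list of strings/numbers."""
    # Assumption: JSON is an adequate encoding for both the key and the value.
    conn.execute(
        "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
        (json.dumps(list(key)), json.dumps(value)),
    )


def get(conn, key):
    """Return the list stored under key, or None if the key is absent."""
    row = conn.execute(
        "SELECT value FROM entries WHERE key = ?",
        (json.dumps(list(key)),),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Entries are written one at a time, so the whole dict never has to be
serialized in one go. Call conn.commit() after a batch of put() calls;
the result is a single ordinary file that can be copied around.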

Would a database in a file have any advantages over a file made
by marshal or shelve?
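
For comparison, a rough sketch of the shelve route, again assuming
Python 3. shelve keys have to be strings, so the tuple keys are encoded
as JSON here; that encoding and the helper names save_dict and
load_entry are assumptions made for this example.

```python
import json
import shelve


def save_dict(path, data):
    """Write each dict entry to a shelf separately instead of in one big pickle."""
    with shelve.open(path) as db:
        for key, value in data.items():
            # shelve keys must be str, so encode the tuple of strings as JSON.
            db[json.dumps(list(key))] = value


def load_entry(path, key):
    """Look up one entry by its tuple key; returns None if it is absent."""
    with shelve.open(path) as db:
        return db.get(json.dumps(list(key)))
```

Each value is pickled on its own, so memory use should track the largest
entry rather than the whole dict. One caveat for portability: the on-disk
format depends on which dbm backend the Python installation provides, so
a shelf written on one machine may not open on another.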

I'm more worried about the fact that a python program in user
space can bring down the computer!



