How to Buffer Serialized Objects to Disk

Chris Rebert clp2 at rebertia.com
Wed Jan 12 16:41:32 EST 2011


On Wed, Jan 12, 2011 at 1:05 PM, Scott McCarty <scott.mccarty at gmail.com> wrote:
> Sorry to ask this question. I have searched the list archives and googled, but
> I don't even know what words to use to find what I am looking for. I am just
> looking for a little kick in the right direction.
> I have a Python based log analysis program called petit
> (http://crunchtools.com/petit). I am trying to modify it to manage the main
> object types to and from disk.
> Essentially, I have one object which is a list of a bunch of "Entry"
> objects. The Entry objects have date, time, etc. fields which I use for
> analysis techniques. At the very beginning I build up the list of objects
> then would like to start pickling it while building to save memory. I want
> to be able to process more entries than I have memory for. With a straight
> list it looks like I could build from xreadlines(), but once you turn it into
> a more complex object, I don't quite know where to go.
> I understand how to pickle the entire data structure, but I need something
> that will manage the memory/disk allocation. Any thoughts?

You could subclass `list` and use sys.getsizeof()
[http://docs.python.org/library/sys.html#sys.getsizeof ] to keep track
of the size of the elements, and then start pickling them to disk once
the total size reaches some preset limit.
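
A rough sketch of that idea, assuming Python 3. To keep it short it
wraps a list instead of truly subclassing `list`, and the names
`SpillingList` and `limit_bytes` are made up for illustration. Note
that sys.getsizeof() is shallow: it counts the object itself, not the
things it refers to, so treat the total as an estimate only.

```python
import os
import pickle
import sys
import tempfile


class SpillingList:
    """Append-only buffer that pickles entries to disk past a size limit.

    Hypothetical helper, not part of petit or the standard library.
    """

    def __init__(self, limit_bytes=1000000):
        self.limit_bytes = limit_bytes
        self._items = []          # entries still held in memory
        self._size = 0            # rough size estimate of self._items
        self._spill = tempfile.TemporaryFile()  # binary mode by default

    def append(self, item):
        self._items.append(item)
        # Shallow size only; nested attributes are not counted.
        self._size += sys.getsizeof(item)
        if self._size >= self.limit_bytes:
            self.flush()

    def flush(self):
        """Write the in-memory entries to the end of the spill file."""
        self._spill.seek(0, os.SEEK_END)
        for item in self._items:
            pickle.dump(item, self._spill, pickle.HIGHEST_PROTOCOL)
        self._items = []
        self._size = 0

    def __iter__(self):
        """Yield spilled entries first, then those still in memory.

        Appending while iterating is not supported in this sketch.
        """
        self._spill.seek(0)
        while True:
            try:
                yield pickle.load(self._spill)
            except EOFError:
                break
        yield from self._items
```

You would then append Entry objects as you parse each log line and
loop over the buffer afterwards; only one spilled entry at a time is
loaded back into memory during iteration.
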
But like MRAB said, using a proper database, e.g. SQLite
(http://docs.python.org/library/sqlite3.html ), wouldn't be a bad idea
either.
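
For the SQLite route, something along these lines might do, again
only as a sketch. The table name `entries`, its columns (date, time,
message), and the helper functions are assumptions for illustration;
your real Entry fields will differ.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists.

    Pass a file name instead of ":memory:" to keep the data on disk.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "id INTEGER PRIMARY KEY, date TEXT, time TEXT, message TEXT)"
    )
    return conn


def add_entries(conn, rows):
    """Insert an iterable of (date, time, message) tuples.

    `rows` may be a generator, so the whole log never has to sit
    in memory at once. The `with` block commits on success.
    """
    with conn:
        conn.executemany(
            "INSERT INTO entries (date, time, message) VALUES (?, ?, ?)",
            rows,
        )


def iter_entries(conn):
    """Yield stored rows one at a time, in insertion order."""
    cur = conn.execute(
        "SELECT date, time, message FROM entries ORDER BY id"
    )
    for row in cur:
        yield row
```

With this layout SQLite handles the memory/disk juggling, and you can
push some of the analysis (filtering by date, counting, and so on)
into SQL queries.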

Cheers,
Chris
--
http://blog.rebertia.com


