How to Buffer Serialized Objects to Disk

Peter Otten __peter__ at web.de
Wed Jan 12 17:04:52 EST 2011


Scott McCarty wrote:

> Sorry to ask this question. I have searched the list archives and googled,
> but I don't even know what words to use to find what I am looking for. I am
> just looking for a little kick in the right direction.
> 
> I have a Python based log analysis program called petit (
> http://crunchtools.com/petit). I am trying to modify it to manage the main
> object types to and from disk.
> 
> Essentially, I have one object which is a list of a bunch of "Entry"
> objects. The Entry objects have date, time, date, etc fields which I use
> for analysis techniques. At the very beginning I build up the list of
> objects then would like to start pickling it while building to save
> memory. I want to be able to process more entries than I have memory. With
> a straight list it looks like I could build from xreadlines(), but once you
> turn it into a more complex object, I don't quite know where to go.
> 
> I understand how to pickle the entire data structure, but I need something
> that will manage the memory/disk allocation?  Any thoughts?

You can write multiple pickled objects into a single file:

import cPickle as pickle

def dump(filename, items):
    # Pickle the items one after another into a single file.
    with open(filename, "wb") as out:
        dump = pickle.Pickler(out).dump
        for item in items:
            dump(item)

def load(filename):
    # Generator: yield the stored items one at a time until the file ends.
    with open(filename, "rb") as instream:
        load = pickle.Unpickler(instream).load
        while True:
            try:
                item = load()
            except EOFError:
                break
            yield item

if __name__ == "__main__":
    filename = "tmp.pickle"
    from collections import namedtuple
    T = namedtuple("T", "alpha beta")
    dump(filename, (T(a, b) for a, b in zip("abc", [1,2,3])))
    for item in load(filename):
        print item

To get random access you'd have to maintain a list containing the offsets of 
the entries in the file.
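
A rough sketch of that offset list, under a few assumptions: the names
dump_indexed and load_at are made up for illustration, and the plain pickle
module stands in for cPickle. Each item is written with a separate
pickle.dump() call so that every record can be read back without the ones
before it.

```python
import pickle

def dump_indexed(filename, items):
    # Hypothetical helper: write the items and return the byte offset
    # at which each item's pickle starts.
    offsets = []
    with open(filename, "wb") as out:
        for item in items:
            offsets.append(out.tell())
            # A fresh pickle.dump() per item keeps each record self-contained.
            pickle.dump(item, out, pickle.HIGHEST_PROTOCOL)
    return offsets

def load_at(filename, offset):
    # Hypothetical helper: seek to a recorded offset and unpickle one item.
    with open(filename, "rb") as instream:
        instream.seek(offset)
        return pickle.load(instream)
```

With offsets = dump_indexed("tmp.pickle", entries), the call
load_at("tmp.pickle", offsets[n]) fetches entry n without reading the entries
in front of it. The offset list itself is small and could be pickled to a
second file.
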
However, a simple database like SQLite is probably sufficient for the kind 
of entries you have in mind, and it allows operations like aggregation, 
sorting and grouping out of the box.
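
A minimal sketch of the SQLite route, using the sqlite3 module from the
standard library. The table layout (date, time, host, message) and the function
names are assumptions for illustration only; the real Entry fields in petit may
differ.

```python
import sqlite3

def store_entries(db_path, entries):
    # entries: any iterable of (date, time, host, message) tuples.
    # This column layout is assumed, not taken from petit.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry "
        "(date TEXT, time TEXT, host TEXT, message TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        # executemany() consumes the iterable row by row, so a generator
        # can feed it without building the whole list in memory.
        conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)", entries)
    return conn

def count_by_host(conn):
    # Grouping, aggregation and sorting are done by the database.
    query = (
        "SELECT host, COUNT(*) FROM entry "
        "GROUP BY host ORDER BY COUNT(*) DESC, host"
    )
    return conn.execute(query).fetchall()
```

Passing a file name as db_path keeps the entries on disk, so the analysis
queries only pull the rows they need into memory.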

Peter



