How to Buffer Serialized Objects to Disk

Scott McCarty scott.mccarty at gmail.com
Wed Jan 12 17:29:19 EST 2011


I've been digging ever since I posted this. I suspected that the response might
be "use a database." I am worried that I am trying to reinvent the wheel. The
problem is I don't want any dependencies, and I also don't need persistence
between program runs. I kind of wanted to keep the use of petit very similar to
cat, head, awk, etc. But, that said, I have realized that if I provide the
analysis features as an API, you very well might want persistence between
runs.

What about using a list inside a shelve?

Just got done messing with this in a Python shell:

import shelve

# writeback=True is needed here; without it, d["log"].append(...) mutates a
# temporary copy that never gets written back to the shelf.
d = shelve.open("/root/test.shelf", protocol=-1, writeback=True)

d["log"] = []          # a list, not a tuple -- a tuple has no append()
d["log"].append("test1")
d["log"].append("test2")
d["log"].append("test3")
d.sync()

Then I would always interact with d["log"], for example:

for i in d["log"]:
    print i

Thoughts?


I know this won't manage memory, but it will keep the footprint down, right?
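One wrinkle I noticed while playing with it: with everything under a single
"log" key, the whole list gets loaded (and, with writeback=True, cached) every
time it is touched, so it does not really cap memory. A variant that only keeps
one entry in memory at a time would be one key per entry -- a rough sketch,
with the key scheme made up on the spot:

import shelve

db = shelve.open("/root/test.shelf", protocol=-1)

# One key per entry, so only the entry currently being touched is in memory.
entries = ["test1", "test2", "test3"]
for n, line in enumerate(entries):
    db["entry-%d" % n] = line
db["count"] = len(entries)

# Iterate without ever holding the whole log as one object.
for n in range(db["count"]):
    print db["entry-%d" % n]

db.close()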
On Wed, Jan 12, 2011 at 5:04 PM, Peter Otten <__peter__ at web.de> wrote:

> Scott McCarty wrote:
>
> > Sorry to ask this question. I have searched the list archives and googled,
> > but I don't even know what words to use to find what I am looking for; I am
> > just looking for a little kick in the right direction.
> >
> > I have a Python based log analysis program called petit (
> > http://crunchtools.com/petit). I am trying to modify it to manage the main
> > object types to and from disk.
> >
> > Essentially, I have one object which is a list of a bunch of "Entry"
> > objects. The Entry objects have date, time, etc. fields which I use for
> > analysis techniques. At the very beginning I build up the list of objects,
> > then I would like to start pickling it while building, to save memory. I
> > want to be able to process more entries than I have memory. With a
> > straight list it looks like I could build from xreadlines(), but once you
> > turn it into a more complex object, I don't quite know where to go.
> >
> > I understand how to pickle the entire data structure, but I need something
> > that will manage the memory/disk allocation. Any thoughts?
>
> You can write multiple pickled objects into a single file:
>
> import cPickle as pickle
>
> def dump(filename, items):
>    with open(filename, "wb") as out:
>        dump = pickle.Pickler(out).dump
>        for item in items:
>            dump(item)
>
> def load(filename):
>    with open(filename, "rb") as instream:
>        load = pickle.Unpickler(instream).load
>        while True:
>            try:
>                item = load()
>            except EOFError:
>                break
>            yield item
>
> if __name__ == "__main__":
>    filename = "tmp.pickle"
>    from collections import namedtuple
>    T = namedtuple("T", "alpha beta")
>    dump(filename, (T(a, b) for a, b in zip("abc", [1,2,3])))
>    for item in load(filename):
>        print item
>
> To get random access you'd have to maintain a list containing the offsets of
> the entries in the file.
> However, a simple database like SQLite is probably sufficient for the kind
> of entries you have in mind, and it allows operations like aggregation,
> sorting and grouping out of the box.
>
> Peter
>
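For my own notes, the offset index Peter mentions could look roughly like this
(just a sketch; the helper names dump_indexed and load_at are mine, not from
his example):

import cPickle as pickle

def dump_indexed(filename, items):
    # Pickle each item as an independent record and remember where it starts,
    # so any single record can be unpickled later on its own.
    offsets = []
    with open(filename, "wb") as out:
        for item in items:
            offsets.append(out.tell())          # byte offset of this record
            pickle.dump(item, out, -1)          # fresh pickle per record
    return offsets

def load_at(filename, offset):
    # Seek straight to a recorded offset and unpickle just that record.
    with open(filename, "rb") as instream:
        instream.seek(offset)
        return pickle.load(instream)

if __name__ == "__main__":
    offsets = dump_indexed("tmp.pickle", ["entry-%d" % n for n in range(5)])
    print load_at("tmp.pickle", offsets[3])     # random access -> "entry-3"

The offsets list itself would still have to be kept (or pickled) somewhere, but
it stays tiny compared to the entries.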
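And on the SQLite suggestion: sqlite3 is in the standard library, so it would
not add a dependency for petit. A quick sketch of the sort of grouping and
aggregation Peter is talking about (the table layout and column names are just
invented for the example):

import sqlite3

conn = sqlite3.connect("petit.db")      # or ":memory:" to avoid a file
conn.execute("""CREATE TABLE IF NOT EXISTS entries
                (date TEXT, time TEXT, host TEXT, message TEXT)""")

rows = [("2011-01-12", "17:04:01", "web1", "connection reset"),
        ("2011-01-12", "17:04:05", "web2", "connection reset"),
        ("2011-01-12", "17:05:10", "web1", "disk full")]
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Aggregation, grouping and sorting come for free.
for host, count in conn.execute(
        "SELECT host, COUNT(*) FROM entries GROUP BY host ORDER BY COUNT(*) DESC"):
    print host, count

conn.close()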

