[SciPy-user] Fast saving/loading of huge matrices

Pauli Virtanen pav at iki.fi
Fri Apr 20 12:43:36 EDT 2007


Fri, 20 Apr 2007 08:24:20 +0200, Gael Varoquaux wrote:
>
> I agree that PyTables lacks a really simple interface. Say, something that
> dumps a dict to an HDF5 file, and vice versa (although HDF5 -> dict is a
> bit harder, as not all HDF5 types convert nicely to Python types).

In a different attempt to make storing stuff in PyTables easier, I wrote
a library that dumps arbitrary Python objects directly to HDF5 files and
loads them back:

	http://www.iki.fi/pav/software/hdf5pickle/index.html

It uses the pickle protocol to interface with Python, but unrolls
objects so that, where possible, they are stored in PyTables' "native"
formats instead of as pickled strings.
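To make the unrolling idea concrete, here is a minimal sketch in plain Python. It is not hdf5pickle's actual code: a dict mapping node paths to values stands in for the HDF5 file, the helpers `unroll`, `roll` and `class_key` are hypothetical names, and only plain `__dict__`-based objects are handled. `object.__reduce_ex__(2)` reports the class and the attribute dict; each attribute then becomes its own node instead of part of one pickled string, and the class is recorded under a `__/cls` node, mirroring the naming in the h5ls listing of this post.

```python
import copyreg

# Sketch only: a dict stands in for the HDF5 file; not hdf5pickle's API.
# Values of these types are stored as-is, standing in for the "native"
# PyTables leaf formats (arrays, scalars, strings).
LEAF_TYPES = (bool, int, float, complex, str, bytes, list, tuple, type(None))


def class_key(cls):
    """Hypothetical helper: encode a class as 'module\\nname'."""
    return cls.__module__ + "\n" + cls.__qualname__


def unroll(obj, path, store):
    """Flatten obj into store, a dict mapping node paths to leaf values."""
    if isinstance(obj, LEAF_TYPES):
        store[path] = obj
        return
    # Ask the pickle protocol how this object would be rebuilt.
    rv = obj.__reduce_ex__(2)
    if not isinstance(rv, tuple):
        raise TypeError("sketch does not handle reduce-to-global objects")
    func, args, state, listitems, dictitems = (rv + (None,) * 5)[:5]
    if state is None:  # newer Pythons report an empty __dict__ as None
        state = {}
    if (func is not copyreg.__newobj__ or len(args) != 1
            or not isinstance(state, dict)
            or listitems is not None or dictitems is not None):
        raise TypeError("sketch only handles plain __dict__-based objects")
    # args is (cls,): record the class, then unroll each attribute.
    store[path + "/__/cls"] = class_key(args[0])
    for key, value in state.items():
        unroll(value, path + "/" + key, store)


def roll(path, store, classes):
    """Rebuild the object at path; classes maps class keys to classes."""
    if path in store:  # a leaf value
        return store[path]
    cls = classes[store[path + "/__/cls"]]  # KeyError for unknown classes
    obj = cls.__new__(cls)  # what copyreg.__newobj__ does; skips __init__
    prefix = path + "/"
    children = {node[len(prefix):].split("/")[0]
                for node in store if node.startswith(prefix)}
    children.discard("__")  # bookkeeping node, not an attribute
    for name in children:
        setattr(obj, name, roll(prefix + name, store, classes))
    return obj
```

Loading here takes an explicit class map rather than importing whatever module the stored name points to, which also sidesteps part of the pickling security problem.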

It's a bit rough around the edges and a bit slow, but it works. (Also,
all the security issues associated with pickling should be kept in mind...)
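As a general illustration of that caveat (not something hdf5pickle itself does), the Python standard library lets you override `pickle.Unpickler.find_class` so that a pickle can only resolve globals on an explicit allowlist. A minimal sketch; `RestrictedUnpickler`, `restricted_loads` and the `allowed` set are hypothetical names. It narrows what a malicious pickle can reach, but it is not a full sandbox.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that resolves only allowlisted globals."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self.allowed = allowed  # set of (module, name) pairs

    def find_class(self, module, name):
        # Every class or function a pickle references is looked up here.
        if (module, name) in self.allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data, allowed=frozenset()):
    """Like pickle.loads, but refuses any global not in allowed."""
    return RestrictedUnpickler(io.BytesIO(data), allowed).load()
```

Plain containers of numbers need no globals at all, while anything that names a class (even `complex`) must be allowed explicitly.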

For example:

	import numpy as N
	import hdf5pickle, tables

	class Foo(object):
	    def __init__(self, c):
	        self.a = N.array([1,2,3,4,5], float)
	        self.b = 12345
	        self.c = c

	foo = Foo(N.array([[1+2j, 3+4j]]))

	# Dump the object under the node /foo.
	# (tables.openFile is the PyTables 2.x name; PyTables 3 calls it open_file.)
	f = tables.openFile('test.h5', 'w')
	hdf5pickle.dump(foo, f, '/foo')
	f.close()

	# Load it back from the same node.
	f = tables.openFile('test.h5', 'r')
	foo2 = hdf5pickle.load(f, '/foo')
	f.close()

	assert N.all(foo.a == foo2.a)
	assert N.all(foo.b == foo2.b)
	assert N.all(foo.c == foo2.c)

... meanwhile, in the shell ...

	$ h5ls -dvr test.h5
	/foo                     Group
	/foo/__                  Group
	/foo/__/args             Dataset {1}
	    Data:
	        (0) 0
	/foo/__/cls              Dataset {12}
	    Data:
	        (0) 95, 95, 109, 97, 105, 110, 95, 95, 10, 70, 111, 111
	/foo/a                   Dataset {5}
	    Data:
	        (0) 1, 2, 3, 4, 5
	/foo/b                   Dataset {SCALAR}
	    Data:
	        (0) 12345
	/foo/c                   Dataset {1, 2}
	    Data:
	        (0,0) {1, 2}, {3, 4}
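The numeric dumps above are easier to read once decoded. A small stdlib-only sketch, assuming the `cls` values are ASCII codes: they appear to spell the module name and class name separated by a newline (resembling the `module\nname` layout of pickle's GLOBAL opcode), and the `{1, 2}, {3, 4}` entries of `/foo/c` read as (real, imaginary) pairs for `1+2j` and `3+4j`.

```python
# Byte values copied from the /foo/__/cls dataset in the h5ls output.
cls_bytes = [95, 95, 109, 97, 105, 110, 95, 95, 10, 70, 111, 111]
module, name = bytes(cls_bytes).decode("ascii").split("\n")
print(module, name)  # prints: __main__ Foo

# /foo/c holds compound (real, imag) pairs; rebuild the complex values.
pairs = [(1, 2), (3, 4)]
values = [complex(re, im) for re, im in pairs]
print(values)  # prints: [(1+2j), (3+4j)]
```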

-- 
Pauli Virtanen