pytables - best practices / mem leaks

py_genetic conor.robinson at gmail.com
Tue Jul 18 16:45:27 EDT 2006


py_genetic wrote:
> I have an H5 file with one group (off the root) and two large main
> tables and I'm attempting to aggregate my data into 50+ new groups (off
> the root) with two tables per sub group.
>
> sys info:
> PyTables version:  1.3.2
> HDF5 version:      1.6.5
> numarray version:  1.5.0
> Zlib version:      1.2.3
> BZIP2 version:     1.0.3 (15-Feb-2005)
> Python version:    2.4.2 (#1, Jul 13 2006, 20:16:08)
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)]
> Platform:          darwin-Power Macintosh (v10.4.7)
> Byte-ordering:     big
>
> Ran all pytables tests included with package and received an OK.
>
>
> Using the following code I get one of three errors:
>
> 1. Illegal Instruction
>
> 2. Malloc(): trying to call free() twice
>
> 3. Bus Error
>
> I believe all three stem from the same issue, involving a malloc()
> memory problem in the pytable c libraries.  I also believe this may be
> due to how I'm attempting to write my sorting script.
>
> The script executes fine and all goes well until I'm sorting about
> group 20 to 30, when I get one of the three errors above depending on
> how and when I flush() and close() the file.  When I open the file after
> the error using h5ls, all tables are in perfect order up to the crash,
> and if I continue from that point everything runs fine until Python
> throws the same error again after another 10 sorts or so.  The somewhat
> random crashing is what leads me to believe I have a memory leak or my
> method of doing this is incorrect.
>
> Is there a better way to aggregate data using pytables/python? Is there
> a better way to be doing this?  This seems straightforward enough.
>
> Thanks,
> Conor
>
> # function to agg state data from main neg/pos tables into neg/pos state tables
>
> import string
> import tables
>
>
> def aggstate(state, h5file):
>
> 	print state
>
> 	class PosRecords(tables.IsDescription):
> 		sic = tables.IntCol(0, 1, 4, 0, None, 0)
> 		numsic = tables.IntCol(0, 1, 4, 0, None, 0)
> 		empsiz = tables.StringCol(1, '?', 1, None, 0)
> 		salvol = tables.StringCol(1, '?', 1, None, 0)
> 		popcod = tables.StringCol(1, '?', 1, None, 0)
> 		state = tables.StringCol(2, '?', 1, None, 0)
> 		zip = tables.IntCol(0, 1, 4, 0, None, 1)
>
> 	class NegRecords(tables.IsDescription):
> 		sic = tables.IntCol(0, 1, 4, 0, None, 0)
> 		numsic = tables.IntCol(0, 1, 4, 0, None, 0)
> 		empsiz = tables.StringCol(1, '?', 1, None, 0)
> 		salvol = tables.StringCol(1, '?', 1, None, 0)
> 		popcod = tables.StringCol(1, '?', 1, None, 0)
> 		state = tables.StringCol(2, '?', 1, None, 0)
> 		zip = tables.IntCol(0, 1, 4, 0, None, 1)
>
>
>
> 	group1 = h5file.createGroup("/", state+"_raw_records", state+" raw records")
>
> 	table1 = h5file.createTable(group1, "pos_records", PosRecords, state+" raw pos record table")
> 	table2 = h5file.createTable(group1, "neg_records", NegRecords, state+" raw neg record table")
>
> 	table = h5file.root.raw_records.pos_records
> 	point = table1.row
> 	for x in table.iterrows():
> 		if x['state'] == state:
> 				point['sic'] = x['sic']
> 				point['numsic'] = x['numsic']
> 				point['empsiz'] = x['empsiz']
> 				point['salvol'] = x['salvol']
> 				point['popcod'] = x['popcod']
> 				point['state'] = x['state']
> 				point['zip'] = x['zip']
>
> 				point.append()
>
> 	h5file.flush()
>
> 	table = h5file.root.raw_records.neg_records
> 	point = table2.row
> 	for x in table.iterrows():
> 		if x['state'] == state:
> 				point['sic'] = x['sic']
> 				point['numsic'] = x['numsic']
> 				point['empsiz'] = x['empsiz']
> 				point['salvol'] = x['salvol']
> 				point['popcod'] = x['popcod']
> 				point['state'] = x['state']
> 				point['zip'] = x['zip']
>
> 				point.append()
>
>
> 	h5file.flush()
>
>
>
> states = ['AL','AK','AZ','AR','CA','CO','CT','DC','DE','FL','GA','HI','ID','IL','IN','IA','KS','KY','LA','ME','MD','MA','MI','MN','MS','MO','MT','NE','NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY']
>
> h5file = tables.openFile("200309_data.h5", mode = 'a')
>
> for i in xrange(len(states)):
> 	aggstate(states[i], h5file)
>
> h5file.close()

The problem with my posting above is that h5file.flush() should have
been table.flush(): flush the table, not the whole file object.
Although h5file.flush() is a real method, I don't believe it writes the
tables correctly; it caused all sorts of issues as the run went on, and
I think it overlapped with .close(), causing more.  I now also flush
table1 and table2 right after creating the new group and its two tables
in each iteration.  Things are stable now.  PyTables is great.
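
A minimal sketch of the per-table flush, assuming the same row-by-row
copy as in the script above.  The helper name copy_state_rows and its
fields argument are hypothetical, made up for illustration; the only
table calls it relies on are the ones already used in the script
(.row, .iterrows(), row.append()) plus flush() on the destination
table.

```python
def copy_state_rows(src_table, dest_table, state, fields):
    """Copy rows whose 'state' column equals `state` into dest_table.

    Hypothetical helper: src_table and dest_table are expected to behave
    like the tables in the script above.
    """
    new_row = dest_table.row
    copied = 0
    for rec in src_table.iterrows():
        if rec['state'] == state:
            # Copy the listed columns into the destination row buffer.
            for name in fields:
                new_row[name] = rec[name]
            new_row.append()
            copied += 1
    # Flush the destination table itself rather than the file handle.
    dest_table.flush()
    return copied
```

Inside aggstate() this would replace each of the two copy loops, for
example copy_state_rows(h5file.root.raw_records.pos_records, table1,
state, fields) with fields listing the seven column names, and the
same again for neg_records and table2.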



